Journal of Applied Statistics. 2022 Sep 5;51(1):87–113. doi: 10.1080/02664763.2022.2116409

A test for comparing conditional ROC curves with multidimensional covariates

A Fanjul-Hevia a, J C Pardo-Fernández b, I Van Keilegom c, W González-Manteiga d
PMCID: PMC10763921  PMID: 38179166

Abstract

The comparison of Receiver Operating Characteristic (ROC) curves is frequently used in the literature to compare the discriminatory capability of different classification procedures based on diagnostic variables. The performance of these variables can sometimes be influenced by the presence of other covariates, which should therefore be taken into account when making the comparison. A new non-parametric test is proposed here for testing the equality of two or more dependent ROC curves conditional on the value of a multidimensional covariate. Projections are used to transform the problem into a one-dimensional one that is easier to handle. Simulations are carried out to study the practical performance of the new methodology. The procedure is then used to analyse a real data set of patients with pleural effusion to compare the diagnostic capability of different markers.

KEYWORDS: Bootstrap, covariates, hypothesis testing, projections, ROC curves

1. Introduction

In any classification problem, such as a diagnostic method in which the aim is to discriminate between two populations (usually identified as the healthy population and the diseased population), the main concern is to minimize the number of misclassified subjects. Receiver Operating Characteristic (ROC) curves are commonly used in this context to study the behaviour of the classification variables [see, for example, the monograph 19, for an introduction to the topic]. They combine the notions of sensitivity (the ability to classify a diseased patient as diseased) and specificity (the ability to classify a healthy individual as healthy), two measures that can be expressed in terms of the cumulative distribution functions of the diagnostic variable in the diseased and the healthy populations.

When more than one variable is available for diagnosing a certain disease, one can compare their respective ROC curves to decide whether their discriminatory capabilities differ. This is the case in the medical example analysed in this paper, a real data set containing information on patients with pleural effusion. In this data set there are two variables (the carbohydrate antigen 152 and the cytokeratin fragment 21-1) that can be used to decide whether the pleural effusion is due to the presence of a malignant tumour. The objective of the analysis is to compare the diagnostic capability of these markers.

Several methodologies for making this sort of comparison are discussed in the literature [for a review, see 6], although most of them do not consider the possible effect of covariates on the performance of the test. In the example provided, apart from the diagnostic variables there are other covariates, such as the age or the neuron-specific enolase of the patients. It is important to take this information into account, because the diagnostic capability of a marker may change with the value of a covariate [17].

In this paper the aim is to propose a test to compare ROC curves that includes the presence of a multidimensional covariate in the analysis. With this methodology, given a new patient waiting for a diagnosis on his or her pleural effusion, we could compare the different diagnosis mechanisms taking into account the covariate values of this particular individual and see which one would be the most appropriate.

One way of introducing the effect of the covariates into the study is by using the conditional ROC curve. Let $Y^F$ and $Y^G$ be the continuous diagnostic markers in the diseased and healthy populations, respectively, $X^F=(X_1^F,\ldots,X_d^F)$ the continuous $d$-dimensional covariate of the diseased population and $X^G=(X_1^G,\ldots,X_d^G)$ the continuous $d$-dimensional covariate of the healthy population. Then, given a fixed value $x=(x_1,\ldots,x_d)\in R_X$ (where $R_X$ is the intersection of $R_{X^F}$ and $R_{X^G}$, the supports of $X^F$ and $X^G$, and is assumed to be non-empty), the conditional ROC curve is defined as

$$\mathrm{ROC}_x(p)=1-F\big(G^{-1}(1-p\mid x)\mid x\big),\quad p\in(0,1), \qquad (1)$$

where $F(y\mid x)=P(Y^F\le y\mid X^F=x)$ and $G(y\mid x)=P(Y^G\le y\mid X^G=x)$.
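As a minimal numerical illustration of definition (1), the Python sketch below evaluates $\mathrm{ROC}_x(p)$ under the assumption that, at a fixed covariate value $x$, both conditional distributions are normal. The function name `conditional_roc` and all parameter values are hypothetical; only the formula comes from (1).

```python
from statistics import NormalDist


def conditional_roc(p, mu_f, sd_f, mu_g, sd_g):
    """Evaluate ROC_x(p) = 1 - F(G^{-1}(1 - p | x) | x), as in (1).

    Assumption for illustration: at the fixed covariate value x the diseased
    marker is N(mu_f, sd_f^2) and the healthy marker is N(mu_g, sd_g^2).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    F = NormalDist(mu_f, sd_f)  # plays the role of F(. | x)
    G = NormalDist(mu_g, sd_g)  # plays the role of G(. | x)
    cutoff = G.inv_cdf(1.0 - p)  # threshold giving false-positive rate p
    return 1.0 - F.cdf(cutoff)  # true-positive rate at that threshold


# Hypothetical covariate effect: the diseased mean increases with x.
for x in (0.2, 0.8):
    print(x, round(conditional_roc(0.1, mu_f=2.0 * x, sd_f=1.0, mu_g=0.0, sd_g=1.0), 3))
```

Under normality the result reduces to the binormal form $\Phi(a+b\,\Phi^{-1}(p))$ with $a=(\mu_F-\mu_G)/\sigma_F$ and $b=\sigma_G/\sigma_F$, which gives a simple check of the code.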

By comparing these conditional ROC curves instead of the standard ROC curves, it is possible to incorporate the potential effect of the covariates into the analysis of the equivalence of two or more diagnostic methods. A test for performing this comparison is proposed in [7] for the case of a continuous one-dimensional covariate. The objective here is to extend that methodology to the case of a multidimensional covariate. Thus, the aim is to test, for a given $x\in R_X$,

$$H_0:\ \mathrm{ROC}^1_x(p)=\cdots=\mathrm{ROC}^K_x(p)\quad\text{for all } p\in(0,1), \qquad (2)$$

where $K$ is the number of diagnostic markers (and thus of ROC curves) being compared. In this context we have $K$ diagnostic variables and one $d$-dimensional covariate in the diseased population, $(X^F,Y_1^F,\ldots,Y_K^F)$, and similar variables in the healthy population, $(X^G,Y_1^G,\ldots,Y_K^G)$. In practice, this kind of test could help to design a more personalized diagnostic method based on the covariate values of each patient. In the medical example at hand, this methodology allows us to determine whether the carbohydrate antigen 152 and the cytokeratin fragment 21-1 are equally suitable for the diagnosis of a patient with a certain age and a certain enolase value.

In order to be able to make this comparison, we are going to rely on the estimation of the corresponding conditional ROC curves. There is a wide range of estimation methods in the literature: some of them estimate the conditional distribution functions involved in the definition of the conditional ROC curve, others use regression functions to include the effect of the covariates (following direct or indirect approaches). See [17] for a further review of this topic.

In [7] the conditional ROC curve is estimated with the indirect (or induced) regression methodology, which incorporates the covariate information through regression models that describe the effect of the covariates on the diagnostic marker separately in the healthy and in the diseased population. However, this method was originally designed for a single covariate. One could think of extending that methodology by replacing the estimator of the conditional ROC curve with another one capable of handling multidimensional covariates. Nevertheless, there are not many methods in the literature capable of considering more than one covariate when estimating the conditional ROC curve, and most of them rely on parametric assumptions that we would like to avoid. See [11] for an example of a non-parametric Bayesian model to estimate the conditional distribution functions involved in the ROC curves, [13], [21] or [22] for examples of direct ROC regression models (where generalized additive models are used to regress the ROC curve directly), or [20] for an example of the induced methodology (framed in a Bayesian setting). Here we follow a frequentist approach.

Tests involving multidimensional data tend to lose power as the dimension of the problem increases. This is why, in this paper, the problem of comparing conditional ROC curves is first transformed using projections, so that the multidimensional problem becomes a unidimensional problem that is easier to handle. This idea has been applied several times in the literature to reduce the dimension in goodness-of-fit problems [see, for example, 5,8,18] but, to the best of our knowledge, this is the first time it has been applied in an ROC curve setting. In recent years, random projections have increasingly been used as a way to overcome the curse of dimensionality. The dimension reduction is possible because the distribution of the original multidimensional data is characterized by the distributions of its randomly projected unidimensional versions.

To that end, Section 2, which is dedicated to the methodology introduced in this paper, begins by showing how (2) can be transformed into a test with one-dimensional covariates using projections. In Section 2.2 we show how to compare ROC curves in that simplified setting with a unidimensional covariate, and in Section 2.3, which contains the major contribution of this paper, we propose a methodology for testing the equivalent hypothesis with a multidimensional covariate. This includes a bootstrap algorithm to approximate the distribution of the statistic under the null hypothesis. In Section 3 the results of a simulation study show the practical performance of the test in terms of level approximation and power. The procedure is illustrated in Section 4 by analysing the real data set containing information on patients with pleural effusion.

2. Methodology

This section is divided into three subsections. In the first one, Section 2.1, we present a result that allows us to transform the problem stated in (2) into an equivalent, easier one, by using projections to reduce the multidimensional role of the covariate to a unidimensional one.

In Section 2.2 we show a methodology to test the equality of conditional ROC curves on a unidimensional problem [based on the one proposed in 7]. Finally, in Section  2.3, we combine that methodology with the result obtained in Section  2.1 to solve our original problem with multidimensional covariates. Both Sections 2.2 and 2.3 include the statistic proposed to perform the test and a bootstrap algorithm to approximate its distribution.

2.1. An equivalent problem

In order to present the transformation of the problem, we first need to introduce the definition of the ROC curve conditional on a pair $(x^F,x^G)\in R_{X^F}\times R_{X^G}$:

$$\mathrm{ROC}_{x^F,x^G}(p)=1-F\big(G^{-1}(1-p\mid x^G)\mid x^F\big),\quad p\in(0,1). \qquad (3)$$

This concept is very similar to the conditional ROC curve (1): the only difference is that this new definition allows us to condition on different values for the diseased and healthy populations. Here $x^F$ and $x^G$ are unidimensional, but the definition could also be applied in a multidimensional case. Although this new ROC curve is not easy to interpret in practice, it poses no theoretical problems (and neither does its estimation), since the healthy and diseased populations are always assumed to be independent.

The following result is the basis for developing the test for comparing ROC curves with multidimensional covariates. It borrows from [5] the idea of using projections to reduce the dimension of the covariate in a regression context. Since here we are dealing with ROC curves, the dimension reduction is less straightforward and some adjustments are required, as each ROC curve depends on two cumulative distribution functions. To the best of our knowledge, the idea of using projections has not been considered in the context of ROC curves.

Given $x,\beta\in\mathbb{R}^d$, $\beta^\top x$ denotes the scalar product of the vectors $x$ and $\beta$. From now on, all the vectors representing projections are taken to lie on the unit sphere of $\mathbb{R}^d$, $S^{d-1}=\{\beta\in\mathbb{R}^d:\|\beta\|=1\}$. This way, projections differ only in their direction, and all possible directions are treated as equally important.

Lemma 2.1

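Assume $E|Y_k^F|<\infty$ and $E|Y_k^G|<\infty$ for every $k\in\{1,\ldots,K\}$. Then, given a certain $x\in R_X$, and assuming dependence among the ROC curves (meaning that the covariate is common to all the $K$ curves considered),

$$\mathrm{ROC}^1_x(p)=\cdots=\mathrm{ROC}^K_x(p)\quad\text{for all } p\in(0,1)\ \text{a.s.}$$

if and only if

$$\mathrm{ROC}^1_{(\beta^F)^\top x,(\beta^G)^\top x}(p)=\cdots=\mathrm{ROC}^K_{(\beta^F)^\top x,(\beta^G)^\top x}(p)\quad\text{for all } p\in(0,1)\ \text{a.s., for any } \beta^F,\beta^G,$$

where $\beta^F$ and $\beta^G$ are $d$-dimensional vectors in $S^{d-1}$ that represent the directions of the projections.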

The proof of this Lemma can be found in the Appendix. Note that $(\beta^F)^\top x$ and $(\beta^G)^\top x$ are one-dimensional values. By using these ROC curves conditional on a pair of projected covariates (as defined in (3)), the problem is reduced to a comparison of conditional ROC curves with a one-dimensional covariate for each possible pair of directions $\beta^F$ and $\beta^G$.

Thus, taking advantage of Lemma 2.1, instead of testing the null hypothesis (2), we may use this equivalent formulation to develop a methodology that, given a certain $x\in R_X$, tests

$$H_0:\ \mathrm{ROC}^1_{(\beta^F)^\top x,(\beta^G)^\top x}(p)=\cdots=\mathrm{ROC}^K_{(\beta^F)^\top x,(\beta^G)^\top x}(p)\quad\text{for all } p\in(0,1),\ \forall\,\beta^F,\beta^G, \qquad (4)$$

against the general alternative $H_1$: $H_0$ is not true. The notation $\forall$ is used instead of 'for any' to shorten the expression (mainly in the proofs found in the Appendix).

Note that the values of the projections have no meaning on their own: there is no optimal direction to be found, and all directions are equally important. This methodology should not be confused with the search for the best linear combination of markers for developing new diagnostic tests, such as those proposed in [23] or [12]. Here we combine the components of a multidimensional covariate to perform a test about the performance of two or more markers.

In a first step, a statistic for testing the equivalence of these ROC curves is presented for a certain pair of fixed projections, and then that statistic is adapted to include all possible directions.

2.2. Test for a one-dimensional covariate

The objective in this section is to develop a test for the equivalent problem presented in Lemma 2.1 for a fixed pair of projections $\beta^F$ and $\beta^G$. Here a test is presented for comparing two or more dependent ROC curves conditional on two one-dimensional values. Given the pair $(x^F,x^G)\in R_{X^F}\times R_{X^G}$, the aim is to test

$$H_0:\ \mathrm{ROC}^1_{x^F,x^G}(p)=\cdots=\mathrm{ROC}^K_{x^F,x^G}(p)\quad\text{for all } p\in(0,1) \qquad (5)$$

against the general alternative $H_1$: $H_0$ is not true.

The samples available in this context are:

  • $\{(X_i^F,Y_{1,i}^F,\ldots,Y_{K,i}^F)\}_{i=1}^{n^F}$, an i.i.d. sample from the distribution of $(X^F,Y_1^F,\ldots,Y_K^F)$,

  • $\{(X_i^G,Y_{1,i}^G,\ldots,Y_{K,i}^G)\}_{i=1}^{n^G}$, an i.i.d. sample from the distribution of $(X^G,Y_1^G,\ldots,Y_K^G)$,

with $n^F$ and $n^G$ the sample sizes of the diseased and healthy populations, respectively. Define $n=n^F+n^G$ as the total sample size used for the estimation of each conditional ROC curve (which is the same for all $k\in\{1,\ldots,K\}$). Note that both $X^F$ and $X^G$ are one-dimensional covariates here.

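The method used for the estimation of the conditional ROC curves is based on the one proposed in [9], which relies on non-parametric location-scale regression models. To be more precise, for each $k=1,\ldots,K$, assume that

$$Y_k^F=\mu_k^F(X^F)+\sigma_k^F(X^F)\,\varepsilon_k^F \qquad (6)$$
$$Y_k^G=\mu_k^G(X^G)+\sigma_k^G(X^G)\,\varepsilon_k^G \qquad (7)$$

where, for $D\in\{F,G\}$, $\mu_k^D(\cdot)=E(Y_k^D\mid X^D=\cdot)$ and $(\sigma_k^D)^2(\cdot)=\mathrm{Var}(Y_k^D\mid X^D=\cdot)$ are the conditional mean and conditional variance functions (both unknown smooth functions), and the error $\varepsilon_k^D$ is independent of $X^D$. The dependence between the $K$ diagnostic variables is modelled through the errors: $(\varepsilon_1^D,\ldots,\varepsilon_K^D)$ follows a multivariate distribution with zero mean and a covariance matrix with ones on the diagonal.

Given this location-scale regression structure for the diagnostic variables, the $k$-th ROC curve conditional on a pair of values $(x^F,x^G)\in R_{X^F}\times R_{X^G}$ can be expressed in terms of the marginal cumulative distribution functions of the errors, $H_k^F$ and $H_k^G$:

$$\mathrm{ROC}^k_{x^F,x^G}(p)=1-H_k^F\big((H_k^G)^{-1}(1-p)\,b_k(x^F,x^G)-a_k(x^F,x^G)\big), \qquad (8)$$

where

$$a_k(x^F,x^G)=\frac{\mu_k^F(x^F)-\mu_k^G(x^G)}{\sigma_k^F(x^F)}\quad\text{and}\quad b_k(x^F,x^G)=\frac{\sigma_k^G(x^G)}{\sigma_k^F(x^F)}.$$

Indeed, under (6) and (7) the $(1-p)$-quantile of $Y_k^G$ given $X^G=x^G$ is $\mu_k^G(x^G)+\sigma_k^G(x^G)(H_k^G)^{-1}(1-p)$, and the conditional distribution function of $Y_k^F$ given $X^F=x^F$ is $y\mapsto H_k^F\{(y-\mu_k^F(x^F))/\sigma_k^F(x^F)\}$; substituting the first into the second in (3) gives (8).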
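The identity (8) can be checked numerically. The sketch below assumes standard normal errors for $H_k^F$ and $H_k^G$ (any other error law could be plugged in through the two keyword arguments) and evaluates both (8) and the direct definition (3) written out under models (6) and (7). The function names are illustrative only.

```python
from statistics import NormalDist

_STD = NormalDist()  # assumed error law: standard normal for both H^F and H^G


def roc_via_errors(p, mu_f, sd_f, mu_g, sd_g, h_f=_STD.cdf, h_g_inv=_STD.inv_cdf):
    """Equation (8): 1 - H^F((H^G)^{-1}(1 - p) * b - a).

    mu_f, sd_f are mu_k^F(x^F), sigma_k^F(x^F); mu_g, sd_g are mu_k^G(x^G), sigma_k^G(x^G).
    """
    a = (mu_f - mu_g) / sd_f  # a_k(x^F, x^G)
    b = sd_g / sd_f           # b_k(x^F, x^G)
    return 1.0 - h_f(h_g_inv(1.0 - p) * b - a)


def roc_direct(p, mu_f, sd_f, mu_g, sd_g, h_f=_STD.cdf, h_g_inv=_STD.inv_cdf):
    """Equation (3) under models (6)-(7).

    G^{-1}(1 - p | x^G) = mu^G + sigma^G (H^G)^{-1}(1 - p) and
    F(y | x^F) = H^F((y - mu^F) / sigma^F).
    """
    cutoff = mu_g + sd_g * h_g_inv(1.0 - p)
    return 1.0 - h_f((cutoff - mu_f) / sd_f)


# Hypothetical values of the mean and scale functions at (x^F, x^G).
print(round(roc_via_errors(0.2, 1.2, 0.8, 0.3, 1.1), 4), round(roc_direct(0.2, 1.2, 0.8, 0.3, 1.1), 4))
```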

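Thus, this $k$-th conditional ROC curve can be estimated by

$$\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)=\int\Big\{1-\hat H_k^F\big((\hat H_k^G)^{-1}(1-p+h_k u)\,\hat b_k(x^F,x^G)-\hat a_k(x^F,x^G)\big)\Big\}\,\kappa(u)\,du, \qquad (9)$$

where, for $D\in\{F,G\}$,

  • $\hat H_k^D(y)=(n^D)^{-1}\sum_{i=1}^{n^D} I(\hat\varepsilon_{k,i}^D\le y)$,

  • $\hat\varepsilon_{k,i}^D=\dfrac{Y_{k,i}^D-\hat\mu_k^D(X_i^D)}{\hat\sigma_k^D(X_i^D)}$, with $i\in\{1,\ldots,n^D\}$,

  • $\hat\mu_k^D(x)=\sum_{i=1}^{n^D}W_{k,i}^D(x,g_k^D)\,Y_{k,i}^D$ is a non-parametric estimator of $\mu_k^D(x)$ based on local weights $W_{k,i}^D(x,g_k^D)$ depending on a bandwidth parameter $g_k^D$,

  • $(\hat\sigma_k^D)^2(x)=\sum_{i=1}^{n^D}W_{k,i}^D(x,g_k^D)\,[Y_{k,i}^D-\hat\mu_k^D(X_i^D)]^2$ is a non-parametric estimator of $(\sigma_k^D)^2(x)$. For simplicity, we take the same bandwidth parameter $g_k^D$ as for the estimation of the regression function $\hat\mu_k^D(x)$,

  • $W_{k,i}^D(x,g_k^D)=\dfrac{\kappa_{g_k^D}(x-X_i^D)}{\sum_{l=1}^{n^D}\kappa_{g_k^D}(x-X_l^D)}$ are Nadaraya–Watson-type weights, where $\kappa_{g_k^D}(\cdot)=\kappa(\cdot/g_k^D)/g_k^D$ and $\kappa$ is a probability density function symmetric around zero,

  • $\hat a_k(x^F,x^G)=(\hat\mu_k^F(x^F)-\hat\mu_k^G(x^G))/\hat\sigma_k^F(x^F)$ and $\hat b_k(x^F,x^G)=\hat\sigma_k^G(x^G)/\hat\sigma_k^F(x^F)$,

  • $h_k$ is a bandwidth parameter that controls the smoothness of the estimator. Its value does not seem to have a significant effect on the estimation of the conditional ROC curve.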
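To make the steps of estimator (9) concrete, here is a small pure-Python sketch for one marker. It assumes a Gaussian kernel for $\kappa$, approximates the $u$-integral by a Riemann sum over $[-4,4]$, and clips the quantile argument $1-p+h_k u$ to $[0,1]$, an implementation choice that the text does not specify. The function names are hypothetical and no attempt is made at efficiency.

```python
import math
from bisect import bisect_right


def kappa(u):
    """Gaussian kernel: an assumed choice for the symmetric density kappa."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_weights(x, xs, g):
    """Nadaraya-Watson weights W_i(x, g) = kappa_g(x - X_i) / sum_l kappa_g(x - X_l)."""
    k = [kappa((x - xi) / g) / g for xi in xs]
    total = sum(k)
    return [ki / total for ki in k]


def fit_location_scale(xs, ys, g):
    """Return (mu_hat, sigma_hat, residuals) for one marker in one population."""
    def mu_hat(x):
        return sum(w * y for w, y in zip(nw_weights(x, xs, g), ys))

    mu_obs = [mu_hat(xi) for xi in xs]  # mu_hat(X_i), reused by sigma_hat

    def sigma_hat(x):
        w = nw_weights(x, xs, g)
        return math.sqrt(sum(wi * (y - m) ** 2 for wi, y, m in zip(w, ys, mu_obs)))

    residuals = [(y - m) / sigma_hat(xi) for xi, y, m in zip(xs, ys, mu_obs)]
    return mu_hat, sigma_hat, residuals


def empirical_quantile(sorted_sample, q):
    """Left-continuous inverse inf{y : H_hat(y) >= q}, with q clipped to [0, 1]."""
    n = len(sorted_sample)
    idx = math.ceil(min(max(q, 0.0), 1.0) * n - 1e-9) - 1  # guard against round-off
    return sorted_sample[min(max(idx, 0), n - 1)]


def roc_hat(p, x_f, x_g, fit_f, fit_g, h, n_grid=161):
    """Estimator (9) at p, for the conditioning pair (x_f, x_g)."""
    mu_f, sd_f, res_f = fit_f
    mu_g, sd_g, res_g = fit_g
    a = (mu_f(x_f) - mu_g(x_g)) / sd_f(x_f)  # a_hat_k(x^F, x^G)
    b = sd_g(x_g) / sd_f(x_f)                # b_hat_k(x^F, x^G)
    res_f, res_g = sorted(res_f), sorted(res_g)
    num = den = 0.0
    for j in range(n_grid):  # Riemann sum for the u-integral over [-4, 4]
        u = -4.0 + 8.0 * j / (n_grid - 1)
        y = empirical_quantile(res_g, 1.0 - p + h * u) * b - a
        w = kappa(u)
        num += w * (1.0 - bisect_right(res_f, y) / len(res_f))  # 1 - H_hat^F(y)
        den += w
    return num / den  # renormalize the truncated kernel weights
```

When the diseased and healthy samples come from the same model, `roc_hat(p, ...)` should stay close to the diagonal value `p`, which is a convenient sanity check.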

This way of estimating the conditional ROC curve is similar to the one proposed in [9], the difference being that [9] conditions the ROC curve on a single value $x$, whereas here we have a pair of values $x^F$ and $x^G$, related to the diseased and the healthy population, respectively. As both populations are independent, the adaptation of the methodology of [9] to this case is straightforward.

Once we know how to estimate this doubly conditional ROC curve, we can propose a test statistic for the test (5):

$$S^x=\sum_{k=1}^K\psi\Big(\sqrt{n g_k}\,\big\{\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)-\overline{\widehat{\mathrm{ROC}}}_{x^F,x^G}(p)\big\}\Big), \qquad (10)$$

where:

  • for $k\in\{1,\ldots,K\}$, $g_k=(n^F g_k^F+n^G g_k^G)/n$, where $g_k^F$ and $g_k^G$ are the bandwidth parameters involved in the estimation of the $k$-th conditional ROC curve,

  • for $k\in\{1,\ldots,K\}$, $\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)$ is the estimated conditional ROC curve given $(x^F,x^G)$, as in (9),

  • $\overline{\widehat{\mathrm{ROC}}}_{x^F,x^G}(p)=\big(\sum_{k=1}^K g_k\big)^{-1}\sum_{k=1}^K g_k\,\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)$ is a weighted average of the $K$ estimated conditional ROC curves,

  • $\psi$ is a real-valued function that measures the difference between each estimated conditional ROC curve and the weighted average of all of them. This function may be similar to those used for comparing cumulative distribution functions (after all, an ROC curve can be viewed as a cumulative distribution function). For example, with the $L_2$ measure the resulting test statistic is
    $$S^x_{L_2}=\sum_{k=1}^K n g_k\int_0^1\big(\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)-\overline{\widehat{\mathrm{ROC}}}_{x^F,x^G}(p)\big)^2\,dp.$$
    With the Kolmogorov–Smirnov criterion, the resulting test statistic is
    $$S^x_{KS}=\sum_{k=1}^K\sqrt{n g_k}\,\sup_{p\in(0,1)}\big|\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)-\overline{\widehat{\mathrm{ROC}}}_{x^F,x^G}(p)\big|.$$
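Once the $K$ estimated curves are available on a grid of $p$ values, the two statistics above take only a few lines. In this sketch the integral is approximated by the trapezoidal rule and the supremum by a maximum over the grid; both are discretization assumptions. The inputs (the curves, the weights $g_k$ and the total sample size $n$) are taken as given, and the function names are hypothetical.

```python
import math


def weighted_average(curves, g):
    """Pointwise average of the K curves with weights g_k / sum(g)."""
    total = sum(g)
    return [sum(gk * c[i] for gk, c in zip(g, curves)) / total
            for i in range(len(curves[0]))]


def stat_l2(curves, g, n, p_grid):
    """S_L2 = sum_k n g_k * integral of (ROC_k - ROC_bar)^2 dp (trapezoidal rule)."""
    avg = weighted_average(curves, g)
    stat = 0.0
    for gk, c in zip(g, curves):
        sq = [(ci - ai) ** 2 for ci, ai in zip(c, avg)]
        integral = sum((p_grid[i + 1] - p_grid[i]) * (sq[i] + sq[i + 1]) / 2.0
                       for i in range(len(p_grid) - 1))
        stat += n * gk * integral
    return stat


def stat_ks(curves, g, n):
    """S_KS = sum_k sqrt(n g_k) * sup_p |ROC_k - ROC_bar| (sup taken over the grid)."""
    avg = weighted_average(curves, g)
    return sum(math.sqrt(n * gk) * max(abs(ci - ai) for ci, ai in zip(c, avg))
               for gk, c in zip(g, curves))
```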

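The null hypothesis is rejected for large values of $S^x$. To approximate the distribution of this statistic, a bootstrap algorithm is proposed. This algorithm is adapted from the procedure proposed in [14] and has already been used by Martínez-Camblor et al. [15] and by Fanjul-Hevia et al. [7] in the context of ROC curves. The key to this algorithm is that

$$T^x=\sum_{k=1}^K\psi\Big(\sqrt{n g_k}\,\Big\{\big(\widehat{\mathrm{ROC}}^k_{x^F,x^G}(p)-\overline{\widehat{\mathrm{ROC}}}_{x^F,x^G}(p)\big)-\big(\mathrm{ROC}^k_{x^F,x^G}(p)-\overline{\mathrm{ROC}}_{x^F,x^G}(p)\big)\Big\}\Big)$$

coincides with the statistic $S^x$ whenever the null hypothesis holds, where

$$\overline{\mathrm{ROC}}_{x^F,x^G}(p)=\Big(\sum_{k=1}^K g_k\Big)^{-1}\sum_{k=1}^K g_k\,\mathrm{ROC}^k_{x^F,x^G}(p),\quad 0<p<1.$$

(Under $H_0$ all the curves $\mathrm{ROC}^k_{x^F,x^G}$ are equal, so each difference $\mathrm{ROC}^k_{x^F,x^G}-\overline{\mathrm{ROC}}_{x^F,x^G}$ vanishes.) The quantity $T^x$ can be rewritten as

$$T^x=\sum_{k=1}^K\psi\Big(\sum_{j=1}^K\sqrt{n g_j}\,\alpha_{kj}\big\{\widehat{\mathrm{ROC}}^j_{x^F,x^G}(p)-\mathrm{ROC}^j_{x^F,x^G}(p)\big\}\Big), \qquad (11)$$

where $\alpha_{kj}=I(k=j)-\sqrt{g_k g_j}\,\big(\sum_{i=1}^K g_i\big)^{-1}$. Note that, in general, $T^x$ cannot be computed from the data, as it depends on the unknown theoretical conditional ROC curves, but it is useful when applying the bootstrap algorithm.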
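The rewriting (11) follows from expanding the weighted averages: with $D_j=\widehat{\mathrm{ROC}}^j-\mathrm{ROC}^j$, one gets $\sqrt{n g_k}\{D_k-(\sum_i g_i)^{-1}\sum_j g_j D_j\}=\sum_j\sqrt{n g_j}\,\alpha_{kj}D_j$. The short check below verifies this identity at a single value of $p$ with arbitrary illustrative numbers; the helper names are hypothetical.

```python
import math


def alpha_matrix(g):
    """alpha_kj = I(k = j) - sqrt(g_k g_j) / sum_i g_i."""
    total = sum(g)
    K = len(g)
    return [[(1.0 if k == j else 0.0) - math.sqrt(g[k] * g[j]) / total
             for j in range(K)] for k in range(K)]


def centred_difference(k, est, true, g, n):
    """sqrt(n g_k) {(est_k - est_bar) - (true_k - true_bar)} at a single p."""
    total = sum(g)
    est_bar = sum(gi * e for gi, e in zip(g, est)) / total
    true_bar = sum(gi * t for gi, t in zip(g, true)) / total
    return math.sqrt(n * g[k]) * ((est[k] - est_bar) - (true[k] - true_bar))


def alpha_form(k, est, true, g, n):
    """sum_j sqrt(n g_j) alpha_kj (est_j - true_j): the argument of psi in (11)."""
    a = alpha_matrix(g)
    return sum(math.sqrt(n * g[j]) * a[k][j] * (est[j] - true[j])
               for j in range(len(g)))
```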

The bootstrap algorithm proposed to approximate the p-value of this test is the following:

  • A.1

    From the original samples, $\{(X_i^F,Y_{1,i}^F,\ldots,Y_{K,i}^F)\}_{i=1}^{n^F}$ and $\{(X_i^G,Y_{1,i}^G,\ldots,Y_{K,i}^G)\}_{i=1}^{n^G}$, compute the value of the test statistic (10), denoted by $s^x$.

  • A.2
    For $b=1,\ldots,B$, generate the bootstrap samples $\{(X_i^F,Y_{1,i}^{F,b},\ldots,Y_{K,i}^{F,b})\}_{i=1}^{n^F}$ and $\{(X_i^G,Y_{1,i}^{G,b},\ldots,Y_{K,i}^{G,b})\}_{i=1}^{n^G}$ as follows:
    1. For each $D\in\{F,G\}$, let $\{(\varepsilon_{1,i}^{D,b},\ldots,\varepsilon_{K,i}^{D,b})\}_{i=1}^{n^D}$ be an i.i.d. sample from the empirical multivariate cumulative distribution function of the original residuals $\{(\hat\varepsilon_{1,i}^D,\ldots,\hat\varepsilon_{K,i}^D)\}_{i=1}^{n^D}$.
    2. Reconstruct the bootstrap samples $\{(X_i^D,Y_{1,i}^{D,b},\ldots,Y_{K,i}^{D,b})\}_{i=1}^{n^D}$ for each $D\in\{F,G\}$, where $Y_{k,i}^{D,b}=\hat\mu_k^D(X_i^D)+\hat\sigma_k^D(X_i^D)\,\varepsilon_{k,i}^{D,b}$.
  • A.3
    Compute the test statistic based on the bootstrap samples, for $b=1,\ldots,B$, using (11) as
    $$t^{x,b}=\sum_{k=1}^K\psi\Big(\sum_{j=1}^K\sqrt{n g_j}\,\alpha_{kj}\big\{\widehat{\mathrm{ROC}}^{j,b}_{x^F,x^G}(p)-\widehat{\mathrm{ROC}}^j_{x^F,x^G}(p)\big\}\Big),$$
    where $\widehat{\mathrm{ROC}}^{j,b}_{x^F,x^G}$ is the estimated $j$-th conditional ROC curve of the $b$-th bootstrap sample.
  • A.4
    The distribution of $S^x$ under the null hypothesis (and thus the distribution of $T^x$) is approximated by the empirical distribution of $\{t^{x,1},\ldots,t^{x,B}\}$, and the p-value is approximated by
    $$\text{p-value}=\frac{1}{B}\sum_{b=1}^B I\big(s^x\le t^{x,b}\big).$$
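Steps A.2 and A.4 are the parts of the algorithm that do not depend on how the ROC curves are estimated. The sketch below shows them under the assumption that the fitted functions $\hat\mu_k^D$, $\hat\sigma_k^D$ and the residual vectors are already available (for instance from an implementation of estimator (9)); the function names are hypothetical. The key point is that whole residual vectors are resampled, which keeps the dependence between the $K$ markers.

```python
import random


def bootstrap_markers(xs, residual_rows, mu_hats, sigma_hats, rng):
    """Step A.2 for one population D in {F, G}.

    xs: covariate values X_i^D, kept fixed.
    residual_rows: vectors (eps_1i, ..., eps_Ki) of standardized residuals.
    mu_hats, sigma_hats: fitted functions, one per marker k (assumed given).
    Whole residual vectors are drawn with replacement, i.e. from the empirical
    multivariate distribution, so the dependence between markers is kept.
    """
    sample = []
    for x in xs:
        eps = rng.choice(residual_rows)
        sample.append([mu(x) + sd(x) * e for mu, sd, e in zip(mu_hats, sigma_hats, eps)])
    return sample


def bootstrap_p_value(s_obs, t_boot):
    """Step A.4: (1/B) * #{b : s^x <= t^{x,b}}."""
    return sum(1 for t in t_boot if s_obs <= t) / len(t_boot)


# Example with K = 2 markers, identity mean and unit scale (hypothetical fits).
rng = random.Random(0)
boot = bootstrap_markers([0.1, 0.5, 0.9], [(0.0, 0.0), (1.0, 1.0)],
                         [lambda x: x] * 2, [lambda x: 1.0] * 2, rng)
```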

Unlike the usual bootstrap algorithms in testing setups, here the null hypothesis is not used when generating the bootstrap samples (Step A.2), because reproducing the null hypothesis of equal ROC curves is not straightforward. Instead, it is used in the computation of the bootstrap statistic (Step A.3), by using $T^x$ instead of $S^x$, which coincide under the null hypothesis. This particularity also appears in the bootstrap algorithm of the next section.

There are two kinds of bandwidth parameters in the estimation of the $k$-th conditional ROC curve (9), with $k\in\{1,\ldots,K\}$. The first one, $h_k$, is taken as 1/n, and the second ones, $g_k^F$ and $g_k^G$, are selected by least-squares cross-validation. Note that the bandwidth parameters could change in each bootstrap iteration, as their selection depends on the sample. However, $h_k$ remains constant, since it is chosen in terms of the sample size, which is the same in every bootstrap iteration. As for $g_k^F$ and $g_k^G$, for computational reasons we compute them in step A.1 using the original sample and then apply the same bandwidths in all the bootstrap estimations. Cross-validation can be very time-consuming, and this simplification keeps the simulations computationally feasible.
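A minimal sketch of least-squares (leave-one-out) cross-validation for the bandwidth of a Nadaraya–Watson fit is given below. It assumes a Gaussian kernel and a user-supplied grid of candidate bandwidths; the text does not specify how the criterion is minimized, so the grid search is an illustrative choice, and the function names are hypothetical.

```python
import math


def loo_cv_score(xs, ys, g):
    """Least-squares leave-one-out criterion for a Nadaraya-Watson fit.

    A Gaussian kernel is assumed; its normalizing constant cancels in the ratio.
    """
    n = len(xs)
    err = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j != i:  # leave observation i out of its own fit
                w = math.exp(-0.5 * ((xs[i] - xs[j]) / g) ** 2)
                num += w * ys[j]
                den += w
        if den == 0.0:  # bandwidth too small: no neighbour receives weight
            return math.inf
        err += (ys[i] - num / den) ** 2
    return err / n


def select_bandwidth(xs, ys, grid):
    """Pick the g on the candidate grid that minimizes the CV criterion."""
    return min(grid, key=lambda g: loo_cv_score(xs, ys, g))
```

In the scheme described above, this selection would be run once on the original sample for each $g_k^F$ and $g_k^G$, and the chosen values reused in every bootstrap iteration.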

2.3. Test for a multidimensional covariate

In the previous subsections we have shown how to transform our original multidimensional problem into a one-dimensional one by using projections, and how to test the equality of ROC curves conditional on a given pair $(x^F,x^G)$ (and, more particularly, for a fixed pair of directions). We now return to our main objective, which is to compare ROC curves conditional on a multidimensional covariate.

Having seen a strategy for testing (4) for a single pair of fixed directions, the idea now is to modify the previous procedure so that the new statistic takes into account all the possible directions of $\beta^F$ and $\beta^G$. For that purpose, consider the test statistic

$$D_S^x=\int_{S^{d-1}}\int_{S^{d-1}} S^{(\beta^F)^\top x,(\beta^G)^\top x}\,d\beta^F\,d\beta^G, \qquad (12)$$

where $d\beta^F$ and $d\beta^G$ denote the uniform distribution on the unit sphere $S^{d-1}$ of $\mathbb{R}^d$. This ensures that all directions are equally important.

The expression $S^{(\beta^F)^\top x,(\beta^G)^\top x}$ is the statistic (10) for testing the equality of $K$ ROC curves conditional on the pair $((\beta^F)^\top x,(\beta^G)^\top x)$, that is,

$$S^{(\beta^F)^\top x,(\beta^G)^\top x}=\sum_{k=1}^K\psi\Big(\sqrt{n g_k}\,\big\{\widehat{\mathrm{ROC}}^k_{(\beta^F)^\top x,(\beta^G)^\top x}(p)-\overline{\widehat{\mathrm{ROC}}}_{(\beta^F)^\top x,(\beta^G)^\top x}(p)\big\}\Big).$$

Note that, in this context with $d$-dimensional covariates, the samples are $\{(X_i^F,Y_{1,i}^F,\ldots,Y_{K,i}^F)\}_{i=1}^{n^F}$ and $\{(X_i^G,Y_{1,i}^G,\ldots,Y_{K,i}^G)\}_{i=1}^{n^G}$, with $X_i^F=(X_{1,i}^F,\ldots,X_{d,i}^F)$ and $X_i^G=(X_{1,i}^G,\ldots,X_{d,i}^G)$.

In practice, as in [1], to compute the test statistic $D_S^x$ we draw random directions $\beta_1^F,\ldots,\beta_{n_\beta}^F$ and $\beta_1^G,\ldots,\beta_{n_\beta}^G$ uniformly from $S^{d-1}$, where $n_\beta$ is the number of random directions considered (the same number is taken for $\beta^F$ and for $\beta^G$). With them, the approximated statistic is

$$\tilde D_S^x=\frac{1}{n_\beta^2}\sum_{r=1}^{n_\beta}\sum_{l=1}^{n_\beta} S^{(\beta_r^F)^\top x,(\beta_l^G)^\top x}. \qquad (13)$$
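The approximation (13) only requires projecting the covariates and the conditioning value onto each direction and averaging the one-dimensional statistic over all $n_\beta^2$ pairs. In the sketch below, `stat_1d` is a hypothetical callable standing in for the statistic (10) (it is assumed to capture the marker values itself), and the directions are passed in already drawn.

```python
def project(beta, v):
    """Scalar product beta^T v."""
    return sum(b * vi for b, vi in zip(beta, v))


def approx_statistic(stat_1d, XF, XG, x, betas_f, betas_g):
    """Approximation (13): average of the one-dimensional statistic over all
    n_beta^2 pairs (beta_r^F, beta_l^G).

    stat_1d(z_f, z_g, x_f, x_g) is a hypothetical callable returning the
    statistic (10) for projected covariates z_f, z_g and projected
    conditioning values x_f, x_g; the markers are assumed to be captured by it.
    """
    # Project the healthy sample once per direction and reuse it for every beta^F.
    proj_g = [([project(b_g, xi) for xi in XG], project(b_g, x)) for b_g in betas_g]
    total = 0.0
    for b_f in betas_f:
        z_f = [project(b_f, xi) for xi in XF]  # (beta_r^F)^T X_i^F
        x_f = project(b_f, x)                  # (beta_r^F)^T x
        for z_g, x_g in proj_g:
            total += stat_1d(z_f, z_g, x_f, x_g)
    return total / (len(betas_f) * len(betas_g))
```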

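To approximate the distribution of the statistic, a bootstrap algorithm similar to the one described in the previous section is proposed. To do so, the following expression is introduced:

$$D_T^x=\int_{S^{d-1}}\int_{S^{d-1}} T^{(\beta^F)^\top x,(\beta^G)^\top x}\,d\beta^F\,d\beta^G, \qquad (14)$$

where $T^{(\beta^F)^\top x,(\beta^G)^\top x}$ is defined as in (11), but for the conditioning values $((\beta^F)^\top x,(\beta^G)^\top x)$:

$$T^{(\beta^F)^\top x,(\beta^G)^\top x}=\sum_{k=1}^K\psi\Big(\sum_{j=1}^K\sqrt{n g_j}\,\alpha_{kj}\big\{\widehat{\mathrm{ROC}}^j_{(\beta^F)^\top x,(\beta^G)^\top x}(p)-\mathrm{ROC}^j_{(\beta^F)^\top x,(\beta^G)^\top x}(p)\big\}\Big).$$

As with (11), $T^{(\beta^F)^\top x,(\beta^G)^\top x}$ cannot be computed without knowing the true distribution of the diagnostic markers. However, it can be computed within the bootstrap algorithm below, where $D_T^x$ is approximated by

$$\tilde D_T^x=\frac{1}{n_\beta^2}\sum_{r=1}^{n_\beta}\sum_{l=1}^{n_\beta} T^{(\beta_r^F)^\top x,(\beta_l^G)^\top x}. \qquad (15)$$

As before, for two given projections $\beta^F$ and $\beta^G$, $S^{(\beta^F)^\top x,(\beta^G)^\top x}$ and $T^{(\beta^F)^\top x,(\beta^G)^\top x}$ coincide whenever the null hypothesis holds, and thus so do $D_S^x$ and $D_T^x$.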

Taking into account these approximations, the resulting bootstrap algorithm goes as follows:

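  • B.1
    Draw $n_\beta$ random directions $\beta_1^F,\ldots,\beta_{n_\beta}^F$ and $\beta_1^G,\ldots,\beta_{n_\beta}^G$ uniformly from $S^{d-1}$. This can be done with the method proposed by Muller [16] to generate points on a multidimensional sphere. For each random direction $\beta_s^D$, with $D\in\{F,G\}$ and $s\in\{1,\ldots,n_\beta\}$:
    1. Generate $d$ values independently from a standard normal distribution: $u_1,\ldots,u_d$.
    2. Take the projection obtained by normalizing the vector $(u_1,\ldots,u_d)$:
      $$\beta_s^D=\Big(\frac{u_1}{\sqrt{u_1^2+\cdots+u_d^2}},\ldots,\frac{u_d}{\sqrt{u_1^2+\cdots+u_d^2}}\Big).$$
  • B.2

    For each pair of random directions $\beta_r^F$ and $\beta_l^G$ (with $r,l\in\{1,\ldots,n_\beta\}$), consider the samples $\{((\beta_r^F)^\top X_i^F,Y_{1,i}^F,\ldots,Y_{K,i}^F)\}_{i=1}^{n^F}$ and $\{((\beta_l^G)^\top X_i^G,Y_{1,i}^G,\ldots,Y_{K,i}^G)\}_{i=1}^{n^G}$ and the conditioning values $((\beta_r^F)^\top x,(\beta_l^G)^\top x)$. With them, following steps A.1–A.3 of the bootstrap algorithm of the previous subsection, compute $s^{(\beta_r^F)^\top x,(\beta_l^G)^\top x}$ and the $B$ corresponding values $t^{(\beta_r^F)^\top x,(\beta_l^G)^\top x,b}$.

  • B.3

    Compute $\tilde d_S^x=\frac{1}{n_\beta^2}\sum_{r=1}^{n_\beta}\sum_{l=1}^{n_\beta}s^{(\beta_r^F)^\top x,(\beta_l^G)^\top x}$ and $\tilde d_T^{x,b}=\frac{1}{n_\beta^2}\sum_{r=1}^{n_\beta}\sum_{l=1}^{n_\beta}t^{(\beta_r^F)^\top x,(\beta_l^G)^\top x,b}$, as in (13) and (15).

  • B.4
    Approximate the p-value of the test by
    $$\text{p-value}=\frac{1}{B}\sum_{b=1}^B I\big(\tilde d_S^x\le\tilde d_T^{x,b}\big).$$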
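The skeleton of steps B.1 to B.4 can be sketched as follows. Muller's method is implemented as described in B.1, while `one_dim_test` is a hypothetical callable assumed to run steps A.1 to A.3 on the projected data and to return $s$ together with the $B$ bootstrap values. As in B.3, the bootstrap values are averaged over direction pairs for each fixed index $b$. This is an illustrative outline, not the authors' code.

```python
import math


def muller_direction(d, rng):
    """B.1: a uniform point on S^{d-1}, obtained by normalizing d standard normals."""
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u]


def algorithm_b(one_dim_test, XF, XG, x, n_beta, B, rng):
    """Steps B.1-B.4, returning (d_S, [d_T^b for b = 1..B], p_value).

    one_dim_test(z_f, z_g, x_f, x_g, B) is a hypothetical callable that runs
    steps A.1-A.3 on the projected data and returns (s, [t_1, ..., t_B]).
    """
    d = len(x)
    betas_f = [muller_direction(d, rng) for _ in range(n_beta)]  # B.1
    betas_g = [muller_direction(d, rng) for _ in range(n_beta)]

    def proj(beta, v):
        return sum(bi * vi for bi, vi in zip(beta, v))

    d_s, d_t = 0.0, [0.0] * B
    for b_f in betas_f:  # B.2: every pair (beta_r^F, beta_l^G)
        z_f = [proj(b_f, xi) for xi in XF]
        for b_g in betas_g:
            z_g = [proj(b_g, xi) for xi in XG]
            s, t = one_dim_test(z_f, z_g, proj(b_f, x), proj(b_g, x), B)
            d_s += s
            d_t = [acc + tb for acc, tb in zip(d_t, t)]
    pairs = n_beta * n_beta  # B.3: average over the n_beta^2 pairs
    d_s /= pairs
    d_t = [v / pairs for v in d_t]
    p_value = sum(1 for v in d_t if d_s <= v) / B  # B.4
    return d_s, d_t, p_value
```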

Remark 2.1

Note that $n_\beta$ represents the number of random directions drawn from $S^{d-1}$ for the approximations (13) and (15), but that we are in fact using $n_\beta^2$ different pairs $(\beta^F,\beta^G)\in S^{d-1}\times S^{d-1}$ to make those approximations. This could become a problem from the computational point of view, as the number of pairs, and hence the computational cost, grows quadratically with $n_\beta$.

As an alternative, we could consider using

$$D_S^x=\int_{S^{d-1}\times S^{d-1}} S^{(\beta^F)^\top x,(\beta^G)^\top x}\,d(\beta^F,\beta^G)$$

instead of statistic (12), where $d(\beta^F,\beta^G)$ denotes the uniform distribution on the product $S^{d-1}\times S^{d-1}$ (a torus when $d=2$). This ensures, as before, that all pairs of directions are equally important. Thus, in practice, instead of the approximation (13) we could consider

$$\hat D_S^x=\frac{1}{m_\beta}\sum_{r=1}^{m_\beta} S^{(\beta_r^F)^\top x,(\beta_r^G)^\top x},$$

where $(\beta_1^F,\beta_1^G),\ldots,(\beta_{m_\beta}^F,\beta_{m_\beta}^G)$ are pairs of random directions drawn uniformly from $S^{d-1}\times S^{d-1}$, and $m_\beta$ plays the role that $n_\beta^2$ played before, with the advantage of greater flexibility, since $m_\beta$ need not be a perfect square. A similar adaptation could be applied to the approximation of $D_T^x$ in (14).
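The paired alternative in this remark changes only how directions are drawn: $m_\beta$ independent pairs instead of all $n_\beta^2$ combinations. Drawing $\beta^F$ and $\beta^G$ independently and uniformly on $S^{d-1}$ gives a uniform draw on the product $S^{d-1}\times S^{d-1}$. In the sketch below, `stat_pair` is a hypothetical callable returning the one-dimensional statistic for a given pair of directions.

```python
import math


def unit_vector(d, rng):
    """Uniform direction on S^{d-1} (normalized standard normal vector)."""
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u]


def paired_directions(d, m_beta, rng):
    """m_beta independent pairs (beta^F, beta^G), uniform on S^{d-1} x S^{d-1}."""
    return [(unit_vector(d, rng), unit_vector(d, rng)) for _ in range(m_beta)]


def approx_statistic_paired(stat_pair, pairs):
    """D_hat_S^x = (1 / m_beta) * sum_r S^{(beta_r^F)^T x, (beta_r^G)^T x}.

    stat_pair(beta_f, beta_g) is a hypothetical callable returning the
    one-dimensional statistic for one pair of directions.
    """
    return sum(stat_pair(b_f, b_g) for b_f, b_g in pairs) / len(pairs)
```

For example, $m_\beta=20$ requires 20 evaluations of the one-dimensional test, whereas the square design with $n_\beta=5$ requires 25.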

Remark 2.2

In the literature there are papers, such as [2] or [3], that use only one random projection. The main idea is to perform the test at hand for a randomly selected projection instead of for all possible projections. The use of projections results in a dimension reduction (as desired) and, although the resulting test may be less powerful, using a single projection reduces the computational cost.

Following that idea, instead of testing the equality of covariate-projected ROC curves for all possible projections, we could test the equality of covariate-projected ROC curves for some random pair of projections, given a certain $x\in R_X$, that is,

$$H_0:\ \mathrm{ROC}^1_{(\beta^F)^\top x,(\beta^G)^\top x}=\cdots=\mathrm{ROC}^K_{(\beta^F)^\top x,(\beta^G)^\top x}\quad\text{for some } \beta^F,\beta^G. \qquad (16)$$

The equivalence between this hypothesis and the hypothesis of interest (2) still requires theoretical justification. However, it is a possibility worth studying, if only for computational reasons. One way of implementing this approach would be to apply the proposed methodology with $n_\beta=1$.

Additionally, in a practical situation we should take into account that the different magnitudes of the covariates that we are projecting may obscure the effect of some of them. In order to prevent this from happening we suggest standardizing the multidimensional covariate X before we start the analysis. Note that an ROC curve conditioned to a certain value x is the same as the ROC curve in which the covariate is modified by a one-to-one transformation and the ROC curve is conditioned to the corresponding transformed x value.

Given a non-degenerate multidimensional covariate $X$, the standardization proposed here is to consider the multidimensional covariate $X^s=B^{-1}(X-a)$, with $B$ a diagonal matrix with $(\sqrt{\mathrm{Var}(X_1)},\ldots,\sqrt{\mathrm{Var}(X_d)})$ on the diagonal and $a=(E(X_1),\ldots,E(X_d))^\top$. Then, for a given variable $Y$, a given $y\in\mathbb{R}$ and a certain value $x$ of the covariate,

$$P(Y\le y\mid X=x)=P\big(Y\le y\mid B^{-1}(X-a)=B^{-1}(x-a)\big)=P(Y\le y\mid X^s=x^s),$$

with $x^s=B^{-1}(x-a)$, and thus

$$\mathrm{ROC}_x(p)=1-F\big(G^{-1}(1-p\mid x)\mid x\big)=1-F\big(G^{-1}(1-p\mid x^s)\mid x^s\big)=\mathrm{ROC}_{x^s}(p).$$

The standardization used here ignores the covariances between the components of $X$, as we are only interested in obtaining covariates of similar magnitude. In practice, the standardization is carried out with the sample mean and the sample standard deviation of each covariate.
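The standardization can be sketched as below, using the sample mean and sample standard deviation of each covariate component, as described above. One assumption is made that the text does not state explicitly: the means and standard deviations are computed on the pooled diseased and healthy covariates, so that the same transformation, and hence the same $x^s$, is used for both populations. The function name is hypothetical.

```python
import statistics


def standardize(rows, x):
    """X^s = B^{-1}(X - a) with sample means (a) and sample SDs (diagonal of B).

    rows: covariate vectors, assumed here to pool the diseased and healthy
    samples so that one transformation is shared by both populations.
    Returns the standardized rows and the transformed conditioning value x^s.
    """
    cols = list(zip(*rows))
    a = [statistics.mean(c) for c in cols]
    s = [statistics.stdev(c) for c in cols]  # sample SD (n - 1 denominator)
    z = [[(v - m) / sd for v, m, sd in zip(r, a, s)] for r in rows]
    x_s = [(v - m) / sd for v, m, sd in zip(x, a, s)]
    return z, x_s
```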

3. Simulations

To analyse the performance of the proposed methodology, simulations were run for the comparison of several dependent conditional ROC curves. In a first stage, these simulations focused on the behaviour of the unidimensional test described in Section 2.2, but they are not displayed here, as they are very similar to those in [7]. Instead, we show the results for several scenarios (first under the null hypothesis and then under the alternative) in which we compare $K$ ROC curves ($K\in\{2,3\}$) conditional on a $d$-dimensional covariate ($d\in\{2,3\}$).

All the curves used in the simulation study were generated from location-scale regression models similar to (6) and (7), except that here the regression and conditional standard deviation functions take $d$-dimensional covariates. The construction of these curves is summarized in Table 1, where all the conditional mean and conditional standard deviation functions are displayed.

Table 1.

Conditional mean and conditional standard deviation functions of the conditional ROC curves considered in the simulation study.

Covariate ROC curves Regression functions Conditional standard deviation functions
  ROC1x μ1F(x)=sin(0.5πx1)+0.1x2 μ1G(x)=0.5x1x2 σ1F(x)=0.5+0.5x1 σ1G(x)=0.5+0.5x1
x=(x1x2) ROC2x μ2F(x)=0.3+sin(0.5πx1)+0.1x2 μ2G(x)=0.5x1x2 σ2F(x)=0.5+0.5x1 σ2G(x)=0.5+0.5x1
  ROC3x μ3F(x)=sin(0.5πx1)+0.1x2 μ3G(x)=0.3+0.4x2+0.5x1x2 σ3F(x)=0.5+0.5x1 σ3G(x)=0.5+0.5x1
  ROC4x μ4F(x)=sin(0.5πx1)+0.1x2+0.5x3, μ4G(x)=0.5x1x2+x3 σ4F(x)=0.5+0.1x3, σ4G(x)=0.5+0.1x3
x=(x1x2x3) ROC5x μ5F(x)=sin(0.5πx1)+0.1x2+0.5x3, μ5G(x)=x1x2+x3 σ5F(x)=0.5+0.1x3 σ5G(x)=0.5+0.1x3
  ROC6x μ6F(x)=sin(0.5πx1)+0.1x2+0.5x3, μ6G(x)=0.3+0.5x1x2+x3 σ6F(x)=0.5+0.2x2+0.3x3 σ6G(x)=0.5+0.1x3

The regression errors were considered to have multivariate normal distribution with zero mean, variance one and correlation ρ for all the models.

In all scenarios the covariates $X_1^F$, $X_1^G$, $X_2^F$, $X_2^G$, $X_3^F$ and $X_3^G$ are uniformly distributed on the unit interval. Thus, the value of the multidimensional covariate $x$ at which the conditional ROC curves are compared lies in $[0,1]^d$. In particular, the comparisons are made at $x=(0.5,0.6)$ for $d=2$ and at $x=(0.5,0.6,0.5)$ for $d=3$.

The study contains simulations for different sample sizes $(n^F,n^G)\in\{(100,100),(250,150),(250,350)\}$ and different values of $\rho$ representing different degrees of correlation between the diagnostic variables under comparison ($\rho\in\{-0.5,0,0.5\}$).
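For readers who want to generate data of this kind, the sketch below draws one population from a location-scale model with $d$ covariates uniform on $[0,1]$ and normal errors with unit variances and a common correlation $\rho$, built through a Cholesky factor. The example plugs in the diseased-population functions of $\mathrm{ROC}^1_x$ and $\mathrm{ROC}^2_x$ from Table 1; the healthy population would be generated in the same way with its own functions. The function names are hypothetical and this is not the authors' simulation code.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S; tolerates singular PSD matrices."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(s, 0.0))  # clip tiny negative round-off
            else:
                L[i][j] = s / L[j][j] if L[j][j] > 0.0 else 0.0
    return L


def simulate_population(n, d, means, sds, rho, rng):
    """Draw n rows (X, [Y_1, ..., Y_K]) with X ~ U[0, 1]^d and
    Y_k = mu_k(X) + sigma_k(X) * eps_k, the errors being normal with zero
    mean, unit variance and common correlation rho."""
    K = len(means)
    L = cholesky([[1.0 if i == j else rho for j in range(K)] for i in range(K)])
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        z = [rng.gauss(0.0, 1.0) for _ in range(K)]
        eps = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(K)]
        rows.append((x, [m(x) + s(x) * e for m, s, e in zip(means, sds, eps)]))
    return rows


# Diseased-population functions of ROC^1_x and ROC^2_x in Table 1 (d = 2).
def mu_1f(x):
    return math.sin(0.5 * math.pi * x[0]) + 0.1 * x[1]


def mu_2f(x):
    return 0.3 + mu_1f(x)


def sd_f(x):
    return 0.5 + 0.5 * x[0]


diseased = simulate_population(100, 2, [mu_1f, mu_2f], [sd_f, sd_f], 0.5, random.Random(2022))
```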

Moreover, two different functions $\psi$ were considered for the construction of $S^{(\beta^F)^\top x,(\beta^G)^\top x}$: one based on the $L_2$ measure and the other based on the Kolmogorov–Smirnov criterion (denoted from now on by L2 and KS, respectively). The number of bootstrap iterations was $B=200$, and 500 data sets were simulated to compute the proportion of rejections in each scenario.

Furthermore, the number of directions used to approximate the test statistic $D_S^x$ was $n_\beta=5$ (as mentioned in Remark 2.1, this means that $n_\beta^2=25$ different pairs of directions were considered).

3.1. Level of the test

The scenarios considered for calibrating the level of the test (by comparing equal conditional ROC curves) are represented in Table 2. The results of the simulations obtained for $n_\beta=5$ are summarized in Figures 1 (for $d=2$) and 2 (for $d=3$). Each subfigure represents the test of one scenario for a particular sample size. The nominal level considered is 0.05. The estimated proportion of rejections over 500 simulated data sets is shown together with the limits of the corresponding critical region for that nominal level: for the test to be well calibrated, the estimated proportions should fall between the gray lines.

Table 2.

Scenarios under the null hypothesis considered for calibrating the level of the test.


Figure 1.


Estimated proportion of rejection under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) with d = 2 and nβ=5 for different sample sizes and different ρ.

Figure 2.


Estimated proportion of rejection under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) with d = 3 and nβ=5 for different sample sizes and different ρ.

In general, the nominal level is attained: most of the estimated proportions are close to 0.05. The $L_2$ statistic slightly exceeds the nominal level in a few scenarios, but its behaviour improves as the sample size increases. The KS statistic is somewhat more conservative.

3.2. Power of the test

The scenarios considered for studying the power of the test (comparing different conditional ROC curves) are listed in Table 3. The results of the simulations for $n_\beta=5$ are summarized in Figure 3, where the first and second rows correspond to the scenarios with K = 2 and K = 3, respectively, and the first and second columns correspond to d = 2 and d = 3, respectively. In this case only α = 0.05 was considered.

Table 3.

Scenarios under the alternative hypothesis considered for studying the power of the test.


Note: $ROC_1^x$ and $ROC_4^x$ are represented in purple, $ROC_2^x$ and $ROC_5^x$ in green, and $ROC_3^x$ and $ROC_6^x$ in yellow.

Figure 3.


Estimated proportion of rejections under the alternative hypothesis for different sample sizes and different ρ, for $n_\beta=5$ ($\alpha=0.05$).

The power of the test grows with the sample size. The $L_2$ statistic yields higher power than the KS statistic, which is consistent with KS being more conservative. Moreover, the difference between the conditional ROC curves compared for d = 2 is larger than in the scenarios with d = 3, which translates into higher power when d = 2.

We also observe that, in each scenario, the highest power is always obtained when the correlation between the diagnostic variables is ρ = 0.5, and the lowest when ρ = −0.5.

Note that for the scenario with d = 3 and ρ = 0.5 the power of the test does not increase significantly from the first sample size to the second (for K = 3 it even decreases slightly). This may be because the smallest sample size is balanced, $(n_F,n_G)=(100,100)$, whereas the second one, $(n_F,n_G)=(250,150)$, is not. The largest sample size, (250,350), is also unbalanced, but to a lesser degree.

Remark 3.1

To evaluate the modifications of the method proposed in Remarks 2.1 and 2.2, we ran simulations for the same scenarios described above. We show here the results for the scenarios with K = 2 and d = 2 under the null and the alternative hypotheses, which assess the level and the power of the test, respectively; similar conclusions were obtained for the remaining scenarios. The parameters are the same as before, except that 1000 data sets were simulated instead of 500.

Figure 4 shows the results of the simulations for the modification of Remark 2.1 with $m_\beta=50$ (first row) and $m_\beta=25$ (second row), and for a single random projection (Remark 2.2), i.e. $m_\beta=n_\beta=1$ (third row). Taking $m_\beta=25$ is comparable with the $n_\beta=5$ used in the previous simulations (see the first row of Figure 1), and the results are very similar: the estimated proportion of rejections is slightly above the nominal level for the $L_2$ statistic at the smallest sample size and otherwise close to it, and the KS statistic is always more conservative. Increasing $m_\beta$ from 25 to 50 does not seem to affect the results significantly, and neither does reducing it to a single random projection ($m_\beta=1$).
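One plausible reading of Remark 2.1, suggested by the fact that $m_\beta=25$ is compared with $n_\beta=5$, is that $m_\beta$ pairs $(\beta^F,\beta^G)$ are drawn directly instead of forming the full $n_\beta\times n_\beta$ grid. The sketch below implements that reading only; it is an assumption, not a transcription of the remark, and the names are hypothetical.

```python
import math
import random


def random_direction(d: int, rng: random.Random) -> list[float]:
    """Uniform draw on S^{d-1}: normalise a standard Gaussian vector."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:
            return [v / norm for v in z]


def random_pairs(d: int, m_beta: int, seed: int = 0):
    """HYPOTHETICAL reading of Remark 2.1: draw m_beta independent pairs
    (beta_F, beta_G) instead of the full n_beta x n_beta grid."""
    rng = random.Random(seed)
    return [(random_direction(d, rng), random_direction(d, rng))
            for _ in range(m_beta)]


# m_beta = 1 corresponds to a single random projection (Remark 2.2).
for m_beta in (50, 25, 1):
    print(m_beta, len(random_pairs(d=2, m_beta=m_beta)))
```

Under this reading the number of projected comparisons grows linearly in $m_\beta$ rather than quadratically in $n_\beta$, which is what would make $m_\beta=25$ directly comparable with $n_\beta=5$.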

Figure 5 shows the results of the simulations under the alternative hypothesis, again for $m_\beta=50$, $m_\beta=25$ and $m_\beta=1$. The first two panels are very similar to the one obtained for $n_\beta=5$ (see the first panel of Figure 3), but the last panel shows that, as expected, using only one random projection considerably reduces the power of the test.

In light of these results, the alternative methodology proposed in Remark 2.1 seems to yield conclusions similar to those of the first proposal, with no noticeable gain when the number $m_\beta$ used to approximate the value of the statistic is increased from 25 to 50. Determining an optimal value for this parameter remains an open problem.

As for the idea mentioned in Remark 2.2, using only one random projection seems to produce a well-calibrated test, albeit with considerably lower power.

Figure 4.


Estimated proportion of rejection under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) with K = 2, d = 2 and mβ=50,25,1 for different sample sizes and different ρ.

Figure 5.


Estimated proportion of rejections under the alternative hypothesis for different sample sizes and different ρ, for $m_\beta=50,25,1$ and the scenarios with K = 2 and d = 2.

3.3. Some extra simulations: changing the distribution of the covariates

The previous simulations were all obtained for scenarios in which the covariates are uniformly distributed on the unit interval, and the point x at which the comparisons were made was chosen so that there were enough data around it. In practice, however, the covariates may not follow a uniform distribution, and they may even behave differently in the diseased and the healthy populations. This may result in scenarios where the point x is close to the boundary of the support of the covariates, or where the covariates have different magnitudes and it is advisable to standardize them.

In this section we repeat the simulations for two of the previously considered scenarios, changing only the distribution of the covariates. The first is the scenario under the null hypothesis in which two ROC curves equal to $ROC_1^x$ (defined in Table 1) are compared, for d = 2, ρ = 0 and $(n_F,n_G)=(250,150)$, at x = (0.5, 0.6). The second is the scenario under the alternative hypothesis in which the two curves $ROC_1^x$ and $ROC_2^x$ are compared, also for d = 2, ρ = 0 and $(n_F,n_G)=(250,150)$, at x = (0.5, 0.6). Both scenarios are included in Tables 2 and 3. Note that changing the distribution of the covariates does not change the conditional ROC curves. The simulations were run under similar conditions as before, with $m_\beta=25$, 200 bootstrap resamples and 500 replications of the data sets.

We considered a total of 16 new models for the covariates, with different combinations of distributions for $X_1^F$, $X_1^G$, $X_2^F$ and $X_2^G$. In models A.1–A.4 the covariates are uniformly distributed; the point x is almost at the centre of the support in model A.1 and moves closer to the boundary of the support from A.1 to A.4. The same applies to models B.1–B.4, but with normal distributions. In models C.1–C.4, in addition, the four variables have different distributions, and the healthy and the diseased populations move further apart from one model to the next. The distributions of the covariates in each case are described in Table 4. We also consider models D.1–D.4, which use the same distributions as C.1–C.4 but apply the standardization recommended at the end of Section 2.

Table 4.

Distributions of the covariates considered for the simulations.

Model $X_1^F$ $X_1^G$ $X_2^F$ $X_2^G$
A.1 U(0,1) U(0,1) U(0,1) U(0,1)
A.2 U(0.1,1.1) U(0.1,1.1) U(0.1,1.1) U(0.1,1.1)
A.3 U(0.2,1.2) U(0.2,1.2) U(0.2,1.2) U(0.2,1.2)
A.4 U(0.3,1.3) U(0.3,1.3) U(0.3,1.3) U(0.3,1.3)
B.1 N(0.5,1/12) N(0.5,1/12) N(0.5,1/12) N(0.5,1/12)
B.2 N(0.6,1/12) N(0.6,1/12) N(0.6,1/12) N(0.6,1/12)
B.3 N(0.7,1/12) N(0.7,1/12) N(0.7,1/12) N(0.7,1/12)
B.4 N(0.8,1/12) N(0.8,1/12) N(0.8,1/12) N(0.8,1/12)
C.1 N(0.55,2/12) N(0.45,2/12) N(0.55,1/12) N(0.45,1/12)
C.2 N(0.6,2/12) N(0.4,2/12) N(0.6,1/12) N(0.4,1/12)
C.3 N(0.7,2/12) N(0.3,2/12) N(0.7,1/12) N(0.3,1/12)
C.4 N(0.8,2/12) N(0.2,2/12) N(0.8,1/12) N(0.2,1/12)

Note: Models D.1–D.4 are not included as they are a standardization of models C.1–C.4.
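To see why the models ending in .4 leave less data near x, the normal families B and C of Table 4 can be simulated and the points falling near x = (0.5, 0.6) counted. This sketch assumes that the second parameter of N(·,·) is a variance (1/12 being the variance of U(0,1)); family A and the standardized family D are omitted for brevity, the window half-width 0.1 is an arbitrary choice, and all names are hypothetical.

```python
import math
import random

# Families B and C of Table 4, as (mean, variance) for X1F, X1G, X2F, X2G.
# ASSUMPTION: the second parameter of N(., .) is a variance.
MODELS = {
    "B.1": ((0.5, 1 / 12),) * 4,
    "B.2": ((0.6, 1 / 12),) * 4,
    "B.3": ((0.7, 1 / 12),) * 4,
    "B.4": ((0.8, 1 / 12),) * 4,
    "C.1": ((0.55, 2 / 12), (0.45, 2 / 12), (0.55, 1 / 12), (0.45, 1 / 12)),
    "C.2": ((0.6, 2 / 12), (0.4, 2 / 12), (0.6, 1 / 12), (0.4, 1 / 12)),
    "C.3": ((0.7, 2 / 12), (0.3, 2 / 12), (0.7, 1 / 12), (0.3, 1 / 12)),
    "C.4": ((0.8, 2 / 12), (0.2, 2 / 12), (0.8, 1 / 12), (0.2, 1 / 12)),
}


def sample_covariates(model: str, n_f: int, n_g: int, rng: random.Random):
    """Draw bidimensional covariates (X1, X2) for the F and G samples."""
    (m1f, v1f), (m1g, v1g), (m2f, v2f), (m2g, v2g) = MODELS[model]
    x_f = [(rng.gauss(m1f, math.sqrt(v1f)), rng.gauss(m2f, math.sqrt(v2f)))
           for _ in range(n_f)]
    x_g = [(rng.gauss(m1g, math.sqrt(v1g)), rng.gauss(m2g, math.sqrt(v2g)))
           for _ in range(n_g)]
    return x_f, x_g


def count_near(points, x, h: float) -> int:
    """Number of points inside the square of half-width h centred at x."""
    return sum(abs(p[0] - x[0]) < h and abs(p[1] - x[1]) < h for p in points)


rng = random.Random(1)
x = (0.5, 0.6)
for model in MODELS:
    x_f, x_g = sample_covariates(model, 250, 150, rng)
    print(model, count_near(x_f, x, 0.1), count_near(x_g, x, 0.1))
```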

The results for the scenario under the null hypothesis are summarized in Figure 6 for the 16 combinations of covariate distributions and for both the $L_2$ and the KS statistics. As before, the estimated proportions of rejections and the corresponding limits are shown for the nominal level 0.05.

Figure 6.


Estimated proportion of rejections under the null hypothesis and the corresponding limits of the critical region (in gray) for the level 0.05 (dotted black line) for different distributions of the bidimensional covariate (models A.1–A.4, B.1–B.4, C.1–C.4 and D.1–D.4), for both the L2 and the KS statistics.

The simulations show that the test is well calibrated regardless of the distribution of the covariates. It shows some problems when the point x is too close to the boundary or lies in a region with limited data (the models ending in .4), but this is to be expected given the moderate sample size used, $(n_F,n_G)=(250,150)$. The models ending in .1 and .2 are comparable with those studied in previous sections, and among the models ending in .3 only A.3 overestimates the level of the test for the $L_2$ statistic. Also, the good behaviour of the test for models D.1–D.4 shows that the standardization proposed in this methodology does not adversely affect the test.

The results of the simulations for the scenario under the alternative hypothesis are collected in Table 5. Leaving aside the differences between the $L_2$ and the KS statistics, already observed in the previous sections, the power of the test does not change significantly with the distribution of the covariates, and it is comparable to the power obtained for the same pair of conditional ROC curves in the previous section. As could be expected, the power decreases when moving from the models with more data around the point x (those ending in .1) to the models with less data (those ending in .4). Moreover, for the $L_2$ statistic the power increases slightly from models C to models D, which shows that the standardization does not worsen the behaviour of the test. Note that in this case the models C do not need the standardization for the test to work, as the difference in magnitude between the covariates involved is minimal.

Table 5.

Estimated proportion of rejections under the alternative hypothesis for different distributions of the bidimensional covariate (models A.1–A.4, B.1–B.4, C.1–C.4 and D.1–D.4), for both the L2 and the KS statistics.

Model L2 (.1) L2 (.2) L2 (.3) L2 (.4) KS (.1) KS (.2) KS (.3) KS (.4)
A 0.670 0.636 0.590 0.530 0.520 0.454 0.426 0.386
B 0.650 0.614 0.580 0.538 0.470 0.472 0.420 0.386
C 0.642 0.626 0.602 0.588 0.464 0.458 0.440 0.424
D 0.652 0.638 0.614 0.600 0.452 0.458 0.438 0.424

4. Application

This section illustrates the proposed test through the analysis of the previously mentioned data set of 463 patients with pleural effusion. The data set was provided by Dr. F. Gude, from the Unidade de Epidemioloxía Clínica of the Hospital Clínico Universitario de Santiago (CHUS), and was previously analysed in [24].

From a medical perspective, the goal is to discriminate between patients whose pleural effusion (PE) has a malignant origin (MPE) and those whose PE is due to other, non-cancer-related causes. Of the individuals in the sample, 200 had MPE (the diseased population in this context) and 263 did not (the healthy population). To this end, two diagnostic markers were considered: carbohydrate antigen 125 (ca125) and cytokeratin fragment 21-1 (cyfra). In addition, information on two covariates is available: age and neuron-specific enolase (nse). Because of the characteristics of the data (positive values, most of them close to zero, with some extremely high values), the logarithms of these variables, excluding age, were used in the study. Since the logarithm is a monotone transformation, it has no effect on the estimation of the ROC curve without covariates. However, it does affect the estimation of the conditional ROC curves, as it reduces the influence of the most extreme values of the variables. The relationship of each biomarker with the two covariates is depicted in Figure 7, for both the MPE (green) and the non-MPE (blue) patients.

Figure 7.


Scatterplots of the two diagnostic biomarkers as a function of the two covariates considered: age and log(nse). Contour plots were added in the bidimensional marginal scatterplots. The healthy subjects are represented in blue and the diseased ones in green (in the printed version, the healthy subjects appear in a darker colour than the diseased subjects).

The shape of the point clouds of the two populations changes with the values of the covariates, especially in the diseased population.

To evaluate whether the discriminatory capability of these markers ($Y_1^F$ and $Y_1^G$ being the variables containing the information on log(ca125), and $Y_2^F$ and $Y_2^G$ those containing the information on log(cyfra)) is the same when the covariates age and log(nse) are taken into account, we apply the methodology explained in the previous sections, comparing their respective ROC curves conditioned on different values of the bidimensional covariate $X=(X_1,X_2)$, with $X_1=$ age and $X_2=\log(\text{nse})$. To explore the advantages of this method over those that do not consider multidimensional covariates, we also test the equality of the ROC curves of the two markers when no covariates are taken into account and when only one of the covariates is included in the analysis. Figure 8 shows how the two covariates are distributed in the diseased and healthy populations; its scatterplot also highlights the pairs of values of the bidimensional covariate at which the ROC curves are compared in this analysis. Note that the covariates have different magnitudes: the values of age are always larger than those of log(nse). Thus, if the procedure were applied directly to these variables, then, when projecting the multidimensional covariate X onto any direction, the effect of the second component would be overshadowed by that of the first. To prevent this, we use the standardized versions of $X_1$ and $X_2$ instead of the original variables. This also affects the value x at which the conditional ROC curves are compared. So, instead of the multidimensional covariate X and the conditional value x, we consider the standardized versions

$$X^s=\left(\frac{X_1-\bar{X}_1}{S_{X_1}},\frac{X_2-\bar{X}_2}{S_{X_2}}\right),\qquad x^s=\left(\frac{x_1-\bar{X}_1}{S_{X_1}},\frac{x_2-\bar{X}_2}{S_{X_2}}\right),$$

where $\bar{X}_1$ and $\bar{X}_2$ are the sample means and $S_{X_1}$ and $S_{X_2}$ the sample standard deviations of the marginal covariates.
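The standardization step can be sketched as follows. It assumes that the means and standard deviations are computed on the pooled sample (healthy and diseased subjects together) and that the same transformation is applied to the evaluation point x; the function name and the covariate values in the usage lines are made up for illustration and are not the pleural effusion data.

```python
from statistics import mean, stdev


def standardize(columns, x):
    """Centre and scale each covariate column by its sample mean and sample
    standard deviation, and apply the same transformation to the point x."""
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]  # (n - 1) denominator
    cols_s = [[(v - m) / s for v in c] for c, m, s in zip(columns, means, sds)]
    x_s = [(xi - m) / s for xi, m, s in zip(x, means, sds)]
    return cols_s, x_s


# Hypothetical toy values, not the pleural effusion data.
age = [45, 60, 75, 90]
log_nse = [0.0, 1.0, 2.0, 3.0]
cols_s, x_s = standardize([age, log_nse], (67, 1.14))
print(x_s)
```

In this toy example age is an exact linear function of log(nse), so after standardization both columns coincide: only the relative position of each value within its own covariate is retained, and neither component dominates a projection because of its units.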

Figure 8.


Histograms, boxplots and scatterplots (with the corresponding contour plots) of the two covariates considered (age and log(nse)). The healthy subjects are represented in blue and the diseased ones in green (in the printed version, the healthy subjects appear in a darker colour than the diseased subjects). The black histogram lines and the white boxplot correspond to the two populations of the healthy and the diseased patients combined. The red points in the scatterplot correspond to the values of x at which the ROC curves are compared.

We start the analysis of the performance of the two diagnostic markers by comparing their ROC curves without taking any covariate information into account. For this purpose we use the method proposed by DeLong et al. [4]. The estimated ROC curves of both markers are depicted in Figure 9. The p-value obtained for this comparison was 0.138, and similar results were obtained with other methods for comparing ROC curves without covariates (such as [15] or [25]). Thus, we do not find significant differences between the two diagnostic variables in terms of diagnostic accuracy.
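For readers who want to see what the covariate-free comparison involves, the following is a minimal sketch of the DeLong et al. [4] test for two correlated AUCs, written from the published description of that method (Mann-Whitney AUCs, structural components and their covariance matrices). It is not the software used in this analysis; the function names are hypothetical and the toy data are made up.

```python
import math
from statistics import NormalDist


def _psi(a: float, b: float) -> float:
    """Mann-Whitney kernel: 1 if a > b, 1/2 on ties, 0 otherwise."""
    return 1.0 if a > b else (0.5 if a == b else 0.0)


def _cov(u: list[float], v: list[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_paired(dis1, hea1, dis2, hea2):
    """Test H0: AUC_1 = AUC_2 for two markers measured on the same subjects.

    dis1[i], dis2[i]: markers 1 and 2 for diseased subject i (m subjects);
    hea1[j], hea2[j]: markers 1 and 2 for healthy subject j (n subjects).
    Returns (auc1, auc2, z, two-sided p-value).
    """
    m, n = len(dis1), len(hea1)
    v10, v01, auc = [], [], []
    for dis, hea in ((dis1, hea1), (dis2, hea2)):
        # structural components: one per diseased and one per healthy subject
        comp_d = [sum(_psi(x, y) for y in hea) / n for x in dis]
        comp_h = [sum(_psi(x, y) for x in dis) / m for y in hea]
        v10.append(comp_d)
        v01.append(comp_h)
        auc.append(sum(comp_d) / m)  # empirical AUC (Mann-Whitney statistic)

    # Var(AUC_1 - AUC_2) = c'S10c / m + c'S01c / n with contrast c = (1, -1)
    var_diff = ((_cov(v10[0], v10[0]) + _cov(v10[1], v10[1])
                 - 2 * _cov(v10[0], v10[1])) / m
                + (_cov(v01[0], v01[0]) + _cov(v01[1], v01[1])
                   - 2 * _cov(v01[0], v01[1])) / n)
    z = (auc[0] - auc[1]) / math.sqrt(var_diff)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc[0], auc[1], z, p_value


# Toy paired data (made up): three diseased and three healthy subjects.
auc1, auc2, z, p = delong_paired([3, 4, 5], [1, 2, 3], [2, 6, 4], [3, 1, 5])
print(f"AUC1={auc1:.3f}  AUC2={auc2:.3f}  z={z:.3f}  p={p:.3f}")
```

The sketch divides by the variance of the AUC difference without checking that it is positive; two identical markers would make it zero, so a practical implementation should guard that case.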

Figure 9.


ROC curve estimation for both diagnostic variables (log(ca125) and log(cyfra), represented by the solid and the dashed line, respectively) without covariates and conditioned to different values of the covariates age and log(nse).

Next, we compare the two diagnostic markers taking into account a unidimensional covariate, using the test proposed in [7] for dependent diagnostic markers. We consider the covariates age and log(nse) one at a time, and test the equality of the ROC curves conditioned on the values {51, 67, 83} for age and {−0.92, 1.14, 3.27} for log(nse). The corresponding estimated ROC curves are shown in Figure 9. For each covariate and each of its values we obtain a p-value, summarized in Table 6. The test is carried out with two types of statistics, one based on the $L_2$ measure and the other on the Kolmogorov-Smirnov criterion, and both yield similar results. When the ROC curves are conditioned on different values of age, the results are in line with those obtained when no covariates were taken into account: the equality of the two curves is not rejected. However, when the covariate log(nse) is considered, the null hypothesis is rejected at the 5% significance level for the value 1.14. This matches the conditional ROC curves depicted in Figure 9.

Table 6.

Results for the comparison of the ROC curves of the diagnostic markers log(ca125) and log(cyfra) when considering a unidimensional covariate, that covariate being the age or the log(nse).

Covariate Value p-value (L2) p-value (KS)
age 51 0.454 0.512
age 67 0.218 0.202
age 83 0.936 0.762
log(nse) -0.92 0.844 0.900
log(nse) 1.14 0.012 0.008
log(nse) 3.20 0.470 0.412

Finally, we compare the performance of the two diagnostic variables considering the effect of age and log(nse) simultaneously, which is where the methodology proposed in this paper is needed. We test the equality of their ROC curves conditioned on nine pairs of values of the two covariates, obtained from all possible combinations of {51, 67, 83} and {−0.92, 1.14, 3.27}. As before, two types of statistics were considered, $L_2$ and KS, and once again the results are similar in both cases. The results are summarized in Table 7.

Table 7.

Results for the comparison of the ROC curves of the diagnostic markers log(ca125) and log(cyfra) when considering the multidimensional covariate (age, log(nse)), for the L2 statistic (top) and the KS statistic (bottom). Rows correspond to values of log(nse) and columns to values of age.

L2 statistic
log(nse) \ age 51 67 83
3.20 0.026 0.056 0.010
1.14 0.152 0.070 0.004
-0.92 0.000 0.030 0.258

KS statistic
log(nse) \ age 51 67 83
3.20 0.066 0.196 0.032
1.14 0.212 0.050 0.016
-0.92 0.004 0.048 0.424

Note that in this case we do not show the estimated ROC curves conditioned on the bidimensional covariate (age, log(nse)). This stresses the fact that, with this methodology, $\widehat{ROC}^x$ (with x bidimensional) does not need to be computed at all.

The p-values show that, depending on the pair of covariate values considered, we can find significant differences between the ROC curves of the log(ca125) and the log(cyfra) markers, including pairs of values that, when considered separately in the previous test, did not lead to rejection of the null hypothesis. Likewise, finding differences between the ROC curves conditioned on marginal covariates at certain values does not mean that those differences remain significant when the multidimensional covariate is considered (for example, when the ROC curves are conditioned marginally on log(nse) = 1.14 we find differences, but when both covariates are considered the difference remains significant only for age 83).

This means that if two patients with pleural effusion came to the doctor's office, patient A aged 67 with a log(nse) level of 1.14, and patient B aged 83 with the same log(nse) level of 1.14, we could apply this methodology and take their covariate values into account to personalize their diagnosis. Without including the covariates in the analysis, both diagnostic methods (based on log(ca125) and on log(cyfra)) would seem equally effective at detecting MPE. However, when this multidimensional covariate information is taken into account, we find significant differences between the two diagnostic markers for patient B, but not for patient A.

5. Discussion

In this work, a new non-parametric methodology has been presented for comparing two or more dependent ROC curves conditioned on the value of a continuous multidimensional covariate. The method combines existing techniques for dimension reduction in goodness-of-fit tests with techniques for estimating and comparing ROC curves conditioned on a one-dimensional covariate. Although in this paper we have used induced regression models to include the covariate effect in the ROC curves, we believe that the test could be adapted and extended to other estimation techniques. This opens the door to future research including longitudinal or functional covariates, using, for example, the approach presented in [10] for extending the induced ROC methodology to functional covariates.

A simulation study was carried out to analyse the practical performance of the test. Two different functions were proposed for the construction of the statistic, L2 and KS, the second being slightly more conservative. Different correlations between the diagnostic variables and different sample sizes were considered, including unbalanced ones, which had no appreciable effect on the performance of the test.

The behaviour of the test was also studied for different distributions of the covariates. Although these distributions do not seem to affect the test, a lack of data around the point x at which the test is performed does reduce its power, so points too close to the boundary of the sample range should be analysed with caution.

Finally, the methodology was illustrated with an application to a real data set: with the new test it was possible to detect differences in the discriminatory ability of two diagnostic variables conditioned on two covariates, without the need for an estimator of the ROC curve conditioned on a multidimensional covariate. This application shows the importance of being able to include the effect of multidimensional covariates in ROC curve analysis, as different conclusions can be drawn from the comparison of the curves when considering a multidimensional covariate, when considering unidimensional covariates, or when excluding the covariates from the study. The test can also be an important tool for personalized medicine, as it allows different diagnostic methods to be compared using the personal information of each patient.

Acknowledgements

The authors would like to thank the Associate Editor and the anonymous reviewers for their constructive comments and suggestions on an earlier version of this manuscript. The Supercomputing Center of Galicia (CESGA) is acknowledged for providing the computational resources used to run most of the simulations. Dr. F. Gude (Unidade de Epidemioloxía Clínica, Hospital Clínico Universitario de Santiago) is thanked for providing the data set analysed in this article.

Appendix A. Proofs

The proofs needed for Lemma 2.1 are presented below.

Lemma A.1

[5] or [3]: Given a random variable $Y$ such that $E|Y|<\infty$,

$$E[Y\mid X]=0\ \text{a.s.} \iff E[Y\mid \beta^\top X]=0\ \text{a.s. for any vector } \beta\in\mathbb{S}^{d-1}.$$

From now on it will be assumed that all directions β considered satisfy $\beta\in\mathbb{S}^{d-1}$, the unit sphere of $\mathbb{R}^d$.

Lemma A.2

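Let $Y_1,\dots,Y_K$ be $K$ dependent random variables with cumulative distribution functions $F_1,\dots,F_K$, respectively, such that $E|Y_k|<\infty$ for every $k\in\{1,\dots,K\}$. Let $X$ be a multidimensional covariate. Then, given $c_1,\dots,c_K$,

$$F_1(c_1\mid X)=\cdots=F_K(c_K\mid X)\ \text{a.s.} \iff F_1^{\beta}(c_1\mid \beta^\top X)=\cdots=F_K^{\beta}(c_K\mid \beta^\top X)\ \text{a.s.}\ \forall\beta, \tag{A1}$$

with $\beta\in\mathbb{S}^{d-1}$ and where $F_i^{\beta}(c_i\mid \beta^\top X)=P(Y_i\le c_i\mid \beta^\top X)$ for $i\in\{1,\dots,K\}$.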

Proof.

It is proven for K = 2.

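Proving that $F_1(c_1\mid X)=F_2(c_2\mid X)$ a.s. is the same as proving that $E[I(Y_1\le c_1)\mid X]=E[I(Y_2\le c_2)\mid X]$ a.s., which, given that the random variables are dependent in the sense that they are conditioned on the same covariate $X$, is equivalent to $E[I(Y_1\le c_1)-I(Y_2\le c_2)\mid X]=0$ a.s. Now, by Lemma A.1 (which applies because $I(Y_1\le c_1)-I(Y_2\le c_2)$ is bounded and hence integrable), this is the same as saying that

$$E[I(Y_1\le c_1)-I(Y_2\le c_2)\mid \beta^\top X]=0\ \text{a.s.}\ \forall\beta.$$

Using again the dependence between the random variables, this is equivalent to $E[I(Y_1\le c_1)\mid \beta^\top X]=E[I(Y_2\le c_2)\mid \beta^\top X]$ a.s. $\forall\beta$, which in turn is equivalent to $F_1^{\beta}(c_1\mid \beta^\top X)=F_2^{\beta}(c_2\mid \beta^\top X)$ a.s. $\forall\beta$.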

Definition A.3

The inverted ROC curve (IROC) is defined as:

$$IROC(q)=1-G\big(F^{-1}(1-q)\big),\qquad q\in(0,1).$$

Related to the previous definition, the inverted conditional ROC curve ($IROC^x$), given the pair $(x^F,x^G)\in R_{X^F}\times R_{X^G}$, is defined as:

$$IROC^{x^G,x^F}(q)=1-G\big(F^{-1}(1-q\mid x^F)\mid x^G\big),\qquad q\in(0,1).$$

Lemma A.4

The equality of ROC curves is equivalent to the equality of the inverted ROC curves, i.e.

$$ROC_1(p)=\cdots=ROC_K(p)\ \forall p\in(0,1) \iff IROC_1(q)=\cdots=IROC_K(q)\ \forall q\in(0,1).$$

Moreover, the same property holds for conditional ROC curves. Given the pair $(x^F,x^G)\in R_{X^F}\times R_{X^G}$, $ROC_1^{x^F,x^G}(p)=\cdots=ROC_K^{x^F,x^G}(p)$ $\forall p\in(0,1)$ holds if and only if $IROC_1^{x^G,x^F}(q)=\cdots=IROC_K^{x^G,x^F}(q)$ $\forall q\in(0,1)$.

Proof.

It is proven for the unconditional case, and for K = 2. The conditional case is similar.

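$$ROC_1(p)=ROC_2(p)\ \forall p\in(0,1) \iff 1-F_1\big(G_1^{-1}(1-p)\big)=1-F_2\big(G_2^{-1}(1-p)\big)\ \forall p\in(0,1).$$

Take $q=1-F_2\big(G_2^{-1}(1-p)\big)$ (and hence $q=ROC_2(p)$). As $p$ ranges over $(0,1)$, $q$ takes all the values in $(0,1)$, and thus $p=1-G_2\big(F_2^{-1}(1-q)\big)=IROC_2(q)$. Then,

$$\begin{aligned}
ROC_1(p)=ROC_2(p)\ \forall p\in(0,1)
&\iff 1-F_1\Big(G_1^{-1}\big(1-\big(1-G_2(F_2^{-1}(1-q))\big)\big)\Big)=q\ \forall q\in(0,1)\\
&\iff 1-G_2\big(F_2^{-1}(1-q)\big)=1-G_1\big(F_1^{-1}(1-q)\big)\ \forall q\in(0,1)\\
&\iff IROC_2(q)=IROC_1(q)\ \forall q\in(0,1).
\end{aligned}$$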
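Lemma A.4 rests on the fact that, when F and G are continuous and strictly increasing, IROC is the inverse function of ROC, since $IROC(ROC(p))=1-G\big(F^{-1}(1-ROC(p))\big)=1-G\big(G^{-1}(1-p)\big)=p$, and two functions are equal exactly when their inverses are. The sketch below checks this numerically for an illustrative binormal model; the parameters are arbitrary choices, not taken from the paper, and the convention $ROC(p)=1-F(G^{-1}(1-p))$ is the one used in the proof.

```python
from statistics import NormalDist

# Illustrative binormal model (arbitrary parameters, not from the paper),
# with ROC(p) = 1 - F(G^{-1}(1 - p)) and IROC(q) = 1 - G(F^{-1}(1 - q)).
F = NormalDist(mu=1.5, sigma=1.0)
G = NormalDist(mu=0.0, sigma=1.0)


def roc(p: float) -> float:
    return 1 - F.cdf(G.inv_cdf(1 - p))


def iroc(q: float) -> float:
    return 1 - G.cdf(F.inv_cdf(1 - q))


for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    q = roc(p)
    # IROC undoes ROC: the last column reproduces p up to rounding error
    print(f"p={p:.2f}  ROC(p)={q:.4f}  IROC(ROC(p))={iroc(q):.4f}")
```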

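Proof of Lemma 2.1.

It is proven for K = 2. For $p\in(0,1)$,

$$ROC_1^x(p)=ROC_2^x(p)\ \text{a.s.} \iff F_1\big(G_1^{-1}(1-p\mid x)\mid x\big)=F_2\big(G_2^{-1}(1-p\mid x)\mid x\big)\ \text{a.s.}$$

Using Lemma A.2, this is equivalent to

$$F_1^{\beta^F}\big(G_1^{-1}(1-p\mid x)\mid (\beta^F)^\top x\big)=F_2^{\beta^F}\big(G_2^{-1}(1-p\mid x)\mid (\beta^F)^\top x\big)\ \text{a.s.}\ \forall\beta^F,$$

which in turn is the same as saying that $ROC_1^{(\beta^F)^\top x,\,x}(p)=ROC_2^{(\beta^F)^\top x,\,x}(p)$ a.s. $\forall\beta^F$. Now, by Lemma A.4, this is equivalent to

$$IROC_1^{x,\,(\beta^F)^\top x}(q)=IROC_2^{x,\,(\beta^F)^\top x}(q)\ \text{a.s.}\ \forall\beta^F,\ \text{for } q\in(0,1).$$

By the definition of the inverted conditional ROC curve, this is the same as saying that

$$G_1\Big(\big(F_1^{\beta^F}\big)^{-1}\big(1-q\mid (\beta^F)^\top x\big)\,\Big|\, x\Big)=G_2\Big(\big(F_2^{\beta^F}\big)^{-1}\big(1-q\mid (\beta^F)^\top x\big)\,\Big|\, x\Big)\ \text{a.s.}\ \forall\beta^F.$$

Using Lemma A.2 again, the previous statement is equivalent to

$$G_1^{\beta^G}\Big(\big(F_1^{\beta^F}\big)^{-1}\big(1-q\mid (\beta^F)^\top x\big)\,\Big|\, (\beta^G)^\top x\Big)=G_2^{\beta^G}\Big(\big(F_2^{\beta^F}\big)^{-1}\big(1-q\mid (\beta^F)^\top x\big)\,\Big|\, (\beta^G)^\top x\Big)\ \text{a.s.}\ \forall\beta^F,\beta^G.$$

By definition, this is the same as saying that $IROC_1^{(\beta^G)^\top x,\,(\beta^F)^\top x}(q)=IROC_2^{(\beta^G)^\top x,\,(\beta^F)^\top x}(q)$ a.s. $\forall\beta^F,\beta^G$ and, using Lemma A.4 again, this is equivalent to

$$ROC_1^{(\beta^F)^\top x,\,(\beta^G)^\top x}(\tilde p)=ROC_2^{(\beta^F)^\top x,\,(\beta^G)^\top x}(\tilde p)\ \text{a.s.}\ \forall\beta^F,\beta^G,\ \text{for } \tilde p\in(0,1),$$

where $F_i^{\beta^F}\big(c\mid (\beta^F)^\top x\big)=P\big(Y_i^F\le c\mid (\beta^F)^\top X^F=(\beta^F)^\top x\big)$ and $G_i^{\beta^G}\big(c\mid (\beta^G)^\top x\big)=P\big(Y_i^G\le c\mid (\beta^G)^\top X^G=(\beta^G)^\top x\big)$ for i = 1, 2.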

Funding Statement

The research of A. Fanjul-Hevia is supported by the Ministerio de Educación, Cultura y Deporte (fellowship FPU14/05316), as well as by the Spanish Ministerio de Educación y Formación Profesional (Mobility Grant EST18/00673). A. Fanjul-Hevia, W. González-Manteiga and I. Van Keilegom acknowledge the support by the Grant PID2020-116587GB-I00 from Spanish Ministerio de Ciencia e Innovación (MCIN/AEI/ 10.13039/501100011033). J.C. Pardo-Fernández acknowledges financial support by the Grant PID2020-118101GB-I00 from Spanish Ministerio de Ciencia e Innovación (MCIN/AEI/10.13039/501100011033). I. Van Keilegom is financially supported by the European Research Council (2016-2021, Horizon 2020 / ERC grant agreement No. 694409).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Colling B. and Van Keilegom I., Goodness-of-fit tests in semiparametric transformation models using the integrated regression function, J. Multivar. Anal. 160 (2017), pp. 10–30.
2. Cuesta-Albertos J.A., Fraiman R., and Matrán C., The random projection method in goodness of fit for functional data, Comput. Stat. Data. Anal. 51 (2007), pp. 4814–4831.
3. Cuesta-Albertos J.A., García-Portugués E., Febrero-Bande M., and González-Manteiga W., Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes, Ann. Stat. 47 (2019), pp. 439–467.
4. DeLong E.R., DeLong D.M., and Clarke-Pearson D.L., Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach, Biometrics 44 (1988), pp. 837–845.
5. Escanciano J.C., A consistent diagnostic test for regression models using projections, Econ. Theory. 22 (2006), pp. 1030–1051.
6. Fanjul-Hevia A. and González-Manteiga W., A comparative study of methods for testing the equality of two or more ROC curves, Comput. Stat. 33 (2018), pp. 357–377.
7. Fanjul-Hevia A., González-Manteiga W., and Pardo-Fernández J.C., A non-parametric test for comparing conditional ROC curves, Comput. Stat. Data. Anal. 157 (2021), pp. 107146.
8. García-Portugués E., González-Manteiga W., and Febrero-Bande M., A goodness-of-fit test for the functional linear model with scalar response, J. Comput. Graph. Stat. 23 (2014), pp. 761–778.
9. González-Manteiga W., Pardo-Fernández J.C., and Van Keilegom I., ROC curves in non-parametric location-scale regression models, Scand. J. Stat. 38 (2011), pp. 169–184.
10. Inácio V., González-Manteiga W., Febrero-Bande M., Gude F., Alonzo T.A., and Cadarso-Suárez C., Extending induced ROC methodology to the functional context, Biostatistics 13 (2012), pp. 594–608.
11. Inácio de Carvalho V., Jara A., Hanson T.E., and de Carvalho M., Bayesian nonparametric ROC regression modeling, Bayesian Anal. 8 (2013), pp. 623–646.
12. Kim E., Zeng D., and Zhou X.-H., Semiparametric transformation models for multiple continuous biomarkers in ROC analysis, Biom. J. 57 (2015), pp. 808–833.
13. Li J., Applications of the bootstrap in ROC analysis, Commun. Stat. Simul. Comput. 41 (2012), pp. 865–877.
14. Martínez-Camblor P. and Corral N., A general bootstrap algorithm for hypothesis testing, J. Stat. Plan. Inference. 142 (2012), pp. 589–600.
15. Martínez-Camblor P., Carleos C., and Corral N., General nonparametric ROC curve comparison, J. Korean. Stat. Soc. 42 (2013), pp. 71–81.
16. Muller M.E., A note on a method for generating points uniformly on n-dimensional spheres, Commun. ACM. 2 (1959), pp. 19–20.
17. Pardo-Fernández J.C., Rodríguez-Álvarez M.X., and Van Keilegom I., A review on ROC curves in the presence of covariates, Revstat Stat. J. 12 (2014), pp. 21–41.
18. Patilea V., Sánchez-Sellero C., and Saumard M., Testing the predictor effect on a functional response, J. Amer. Statist. Assoc. 111 (2016), pp. 1684–1695.
19. Pepe M.S., The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, Oxford, 2003.
20. Rodríguez A. and Martínez J.C., Bayesian semiparametric estimation of covariate-dependent ROC curves, Biostatistics 15 (2014), pp. 353–369.
21. Rodríguez-Álvarez M.X., Roca-Pardiñas J., and Cadarso-Suárez C., A new flexible direct ROC regression model: Application to the detection of cardiovascular risk factors by anthropometric measures, Comput. Stat. Data. Anal. 55 (2011), pp. 3257–3270.
22. Rodríguez-Álvarez M.X., Roca-Pardiñas J., Cadarso-Suárez C., and Tahoces P.G., Bootstrap-based procedures for inference in nonparametric receiver-operating characteristic curve regression analysis, Stat. Methods. Med. Res. 27 (2018), pp. 740–764.
23. Su J.Q. and Liu J.S., Linear combinations of multiple diagnostic markers, J. Amer. Statist. Assoc. 88 (1993), pp. 1350–1355.
24. Valdés L., San-José E., Ferreiro L., González-Barcala F.-J., Golpe A., Álvarez-Dobaño J.M., Toubes M.E., Rodríguez-Núñez N., Rábade C., Lama A., and Gude F., Combining clinical and analytical parameters improves prediction of malignant pleural effusion, Lung 191 (2013), pp. 633–643.
25. Venkatraman E.S. and Begg C.B., A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment, Biometrika 83 (1996), pp. 835–848.

