Abstract
Many researchers have addressed the problem of finding the optimal linear combination of biomarkers to maximize the area under the receiver operating characteristic (ROC) curve for scenarios with binary disease status. In practice, many disease processes such as Alzheimer’s disease (AD) can be naturally classified into three diagnostic categories such as normal, mild cognitive impairment, and AD, and for such diseases the volume under the ROC surface (VUS) is the most commonly used index of diagnostic accuracy. In this article, we propose parametric and nonparametric approaches to address the problem of finding the optimal linear combination to maximize the VUS. We carry out simulation studies to investigate the performance of the proposed methods and apply all of the investigated approaches to a real data set from a cohort study in early stage AD.
Keywords: diagnostic accuracy, linear combinations, ordinal categories, volume under the ROC surface
1. Introduction
Multiple diagnostic tests are often performed on the same individual to give clinicians as much information as possible for an accurate diagnosis, because it is becoming increasingly clear that a single diagnostic test or biomarker is not sufficient to serve as an optimal screening device for early detection or prognosis [1]. It is therefore of critical importance to combine the available information in an optimal way to improve overall diagnostic accuracy [2].
When the diagnostic outcome is binary, that is, non-diseased and diseased, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are commonly used diagnostic accuracy measures. Many conditions, however, are conceptualized as having a normal stage, an early/mild/prodromal stage, and a late/diagnosable/fully symptomatic stage. For example, mild cognitive impairment and/or early stage Alzheimer’s disease (AD) is a transitional stage between the cognitive changes of normal aging and the more serious AD. See Xiong et al. [3] for more details.
With three ordinal diagnostic categories, the ROC surface, analogous to the ROC curve, and the volume under the ROC surface (VUS), analogous to the AUC, have been proposed to assess diagnostic accuracy [3,4]. Let S1, S2, and S3 denote the scores resulting from a diagnostic test or biomarker, and let F1, F2, and F3 be the corresponding cumulative distribution functions for non-diseased, intermediate, and diseased subjects, respectively. Assume that the results of a diagnostic test are measured on a continuous scale and that higher values indicate greater severity of the disease. Let p1 = F1(c1) and p3 = 1 − F3(c3), where c1 and c3 are threshold values (c1 < c3), be the true classification rates for the non-diseased and diseased categories, respectively. Then the probability that a randomly selected subject from the intermediate group has a score between c1 and c3 is
$$p_2 = F_2(c_3) - F_2(c_1) = F_2\left(F_3^{-1}(1 - p_3)\right) - F_2\left(F_1^{-1}(p_1)\right). \tag{1}$$
The probability p2 is guaranteed to be non-negative because of the imposed order restriction c1 < c3, that is, $F_1^{-1}(p_1) < F_3^{-1}(1 - p_3)$.
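For a pair of thresholds (c1, c3), we can compute the true classification rate p2 for the intermediate category. The triplet (p1, p2, p3), where p2 = p2(p1, p3) is a function of (p1, p3), traces out an ROC surface in three-dimensional space as (c1, c3) ranges over all pairs in ℝ² with c1 < c3. The VUS is then defined as
$$\mathrm{VUS} = \iint_{D} p_2(p_1, p_3)\, dp_1\, dp_3, \qquad D = \left\{(p_1, p_3) \in [0, 1]^2 : F_1^{-1}(p_1) \le F_3^{-1}(1 - p_3)\right\}. \tag{2}$$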
This is a generalization of the AUC for binary classification. As in Xiong et al. [3], under the normality assumption $S_d \sim N(\mu_d, \sigma_d^2)$, d = 1, 2, 3, the VUS can be further expressed as
$$\mathrm{VUS} = \int_{-\infty}^{\infty} \Phi(as - b)\, \Phi(-cs + d)\, \phi(s)\, ds, \tag{3}$$
where a = σ2/σ1, b = (μ1 − μ2)/σ1, c = σ2/σ3, d = (μ3 − μ2)/σ3, Φ(·) is the standard normal distribution function, and ϕ(·) is the standard normal density function. One can show that the VUS is mathematically equivalent to the probability P(S1 < S2 < S3), where S1, S2, and S3 are scores for randomly selected individuals from the corresponding diagnostic categories. For a useless test (when S1, S2, and S3 have identical distributions), the VUS is 1/6. Notice that the unbiased nonparametric Mann–Whitney U statistic of the VUS is given by
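$$\widehat{\mathrm{VUS}} = \frac{1}{n_1 n_2 n_3} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} I\left(S_{1i} < S_{2j} < S_{3k}\right), \tag{4}$$
where S1i, S2j, and S3k are the observed scores, n1, n2, and n3 are the sample sizes for non-diseased, intermediate, and diseased subjects, respectively, and I(·) stands for the indicator function.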
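As an illustration of Equations (3) and (4), the following minimal Python sketch evaluates the normal-theory VUS by numerical integration and the Mann–Whitney estimate by direct enumeration of triples. The function names, the integration limits, and the number of trapezoid steps are assumptions made for this sketch only; they are not part of the original analysis.

```python
import math
from itertools import product


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_vus(mu, sigma, lower=-10.0, upper=10.0, steps=4000):
    """Equation (3) by the composite trapezoidal rule.

    mu and sigma are the three group means and standard deviations.
    The limits and step count are illustrative choices, not tuned values.
    """
    (mu1, mu2, mu3), (s1, s2, s3) = mu, sigma
    a, b = s2 / s1, (mu1 - mu2) / s1
    c, d = s2 / s3, (mu3 - mu2) / s3
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        s = lower + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += (weight * std_normal_cdf(a * s - b)
                  * std_normal_cdf(-c * s + d) * std_normal_pdf(s))
    return total * h


def empirical_vus(s1, s2, s3):
    """Equation (4): proportion of triples with s1 < s2 < s3."""
    hits = sum(1 for x, y, z in product(s1, s2, s3) if x < y < z)
    return hits / (len(s1) * len(s2) * len(s3))
```

For identical distributions, `normal_vus((0, 0, 0), (1, 1, 1))` should agree with the useless-test value 1/6 up to integration error. The enumeration in `empirical_vus` costs n1 × n2 × n3 comparisons, which is adequate for the sample sizes considered here.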
The problem of finding optimal combinations of diagnostic tests and biomarkers with binary diagnostic categories has been well addressed in the literature. Su and Liu [5] derived an optimal linear combination that maximizes the AUC when the biomarkers in the non-diseased and diseased categories follow normal distributions. Without assumptions on the distributions of the biomarkers, Pepe and Thompson [6] considered an empirical solution of the optimal linear combination that maximizes the Mann–Whitney statistic. However, when the number of biomarkers is large, this approach is computationally formidable. Recently, Liu et al. [7] developed a min–max combination approach that only involves searching for a single coefficient that maximizes the Mann–Whitney U statistic of the AUC.
Whereas several studies address the optimal selection of weights for binary outcomes, the problem of finding the optimal linear combination has rarely been addressed for outcomes with three ordinal diagnostic categories. Nevertheless, it is of paramount importance to develop such combinations for biomarkers with three disease categories for the purpose of maximizing diagnostic accuracy. The importance can be seen through the data example on AD. Because AD is irreversible and no pharmaceutical treatments are effective for late stages, it is critical to accurately diagnose AD at its early stage. However, as presented in Xiong et al. [3], none of the current psychometric tests can be considered excellent, with the estimated VUS ranging from 0.522 to 0.752. Therefore, it is important to develop a composite score derived from a linear combination of biomarkers for better diagnostic accuracy.
The goal of this manuscript is twofold: (1) to present parametric and nonparametric combination approaches for the purpose of maximizing the most important diagnostic accuracy index for three-category outcomes, namely, the VUS; (2) to empirically compare the performance of the proposed methods. We organize the rest of our article as follows. In Section 2, we extend two existing combination methods for binary outcomes (i.e., the logistic regression approach and the min–max approach) to maximize VUS for three-category outcomes. In Section 3, we propose a new parametric approach and a new nonparametric approach. We present simulation studies in Section 4 for investigating the performance of different combination methods in maximizing VUS. In Section 5, we apply the proposed approaches as well as the extensions to a real data set of 118 subjects from a cohort study in early stage AD from the Washington University Knight Alzheimer’s Disease Research Center to combine diagnostic tests to increase the accuracy of discriminating different stages of AD. We present a broader discussion on deriving linear combinations of diagnostic tests and biomarkers to improve the diagnostic accuracy in Section 6.
2. Extensions of existing methods
We can easily extend two existing methods for binary outcomes, namely, the logistic regression method and the min–max method, to outcomes with three ordinal disease categories. In the following, Section 2.1 presents the notation, and Sections 2.2 and 2.3 will discuss these two extensions.
2.1. Notation
Suppose we have p diagnostic tests or biomarkers available on each individual. We denote the diagnostic category as D = d, where d = 1, 2, 3 stands for non-diseased, intermediate, and diseased subjects, respectively. Let
$$X_i = (X_{i1}, X_{i2}, \ldots, X_{ip}), \quad i = 1, \ldots, n_1,$$
be the p-dimensional observed scores from a random sample of size n1 in the non-diseased category,
$$Y_j = (Y_{j1}, Y_{j2}, \ldots, Y_{jp}), \quad j = 1, \ldots, n_2,$$
be the p-dimensional observed scores from a random sample of size n2 in the intermediate category, and
$$Z_k = (Z_{k1}, Z_{k2}, \ldots, Z_{kp}), \quad k = 1, \ldots, n_3,$$
be the p-dimensional observed scores from a random sample of size n3 in the diseased category. We often stack the data together in a matrix form
$$\begin{pmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_{n_1} \\ 2 & Y_1 \\ \vdots & \vdots \\ 2 & Y_{n_2} \\ 3 & Z_1 \\ \vdots & \vdots \\ 3 & Z_{n_3} \end{pmatrix},$$
where the first column indicates the diagnostic category and the other p columns form the matrix of observed scores concatenated from Xi, Yj, and Zk by row. For simplicity, we use Mp to denote p-variate observed scores for an individual from any diagnostic category.
2.2. The cumulative logistic regression approach
When we use a logistic regression model for a binary outcome, we obtain linear coefficients for multiple predictors. With three ordinal diagnostic categories, the cumulative logistic model has the form
$$\operatorname{logit} P(D \le 1 \mid M_p) = \alpha_0 + M_p c, \qquad \operatorname{logit} P(D \le 2 \mid M_p) = \beta_0 + M_p c,$$
where c is a coefficient vector of length p, and α0 and β0 are two intercepts. For modeling an outcome with three or more categories, multinomial logistic regression is also frequently used, although it is known that if the outcome variable is truly ordered, which is the case in this article, cumulative logistic regression makes the model more parsimonious. Also, multinomial logistic regression would produce more than one coefficient vector for the predictor variables, which is meaningless for the purpose of forming a single combination. Therefore, we investigate the performance of the combined marker using c obtained from cumulative logistic regression.
For modeling a binary outcome, we use logistic regression to maximize the logistic likelihood function. For such a model, Jin and Lu [8] proved that c from a fitted logistic regression is the optimal linear combination in the sense that it provides the highest sensitivity uniformly over the entire range of specificity and therefore yields the largest AUC among all possible linear combinations. This impressive result, however, depends on the strong assumption that the binary response variable (i.e., disease status) is generated through a link function of the predictors. In practice, disease status is not generated this way. Usually, a binary gold standard is used to determine disease status, and multiple biomarkers are measured without knowing any information on disease status. Furthermore, this result does not assume any joint distribution for the multiple predictors. Therefore, it cannot include Su and Liu’s [5] method as a special case, in which multivariate normality is a fundamental assumption.
The result from Jin and Lu [8] has not been extended to three-category outcomes. Despite the lack of analytical results, cumulative logistic regression still offers a possible combination method for scenarios with three-category outcomes. Therefore, it is of interest to investigate the performance of the combination of biomarkers using c from a fitted cumulative logistic regression for the purpose of maximizing the VUS.
2.3. The min–max combination approach
With binary diagnostic categories, Pepe and Thompson [6] proposed to estimate the optimal linear combination coefficient c by maximizing the Mann–Whitney U statistic (i.e., the empirical estimate of AUC) as follows,
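$$\hat{c} = \arg\max_{c} \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_j c > X_i c\right), \tag{5}$$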
where I(·) stands for the indicator function. Pepe and Thompson [6] also pointed out that because the Mann–Whitney estimate of the AUC is not a continuous function of c, a search rather than a derivative-based method is required for this maximization. General-purpose optimization algorithms such as conjugate-gradient or Newton-type methods are therefore not appropriate. They illustrated the idea with an application involving only two markers, for which the computation is relatively easy. However, when the number of markers is three or more, this approach becomes computationally prohibitive.
To address such computational difficulty, Liu et al. [7] proposed a nonparametric min–max approach that linearly combines only the minimum and maximum values of the p markers to maximize the AUC, that is,
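$$\max_{\alpha} \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left(Y_{j,\max} + \alpha Y_{j,\min} > X_{i,\max} + \alpha X_{i,\min}\right), \tag{6}$$
where
$$X_{i,\max} = \max_{1 \le l \le p} X_{il}, \qquad X_{i,\min} = \min_{1 \le l \le p} X_{il},$$
and
$$Y_{j,\max} = \max_{1 \le l \le p} Y_{jl}, \qquad Y_{j,\min} = \min_{1 \le l \le p} Y_{jl}.$$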
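Such a combination only involves searching for a single combination coefficient and thus is computationally efficient. They showed that, under certain circumstances, the proposed min–max combination may yield a larger AUC than the empirical search for c by Pepe and Thompson [6]. This min–max combination approach can be easily extended to the cases with three ordinal diagnostic categories by maximizing
$$\frac{1}{n_1 n_2 n_3} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} I\left(X_{i,\max} + \alpha X_{i,\min} < Y_{j,\max} + \alpha Y_{j,\min} < Z_{k,\max} + \alpha Z_{k,\min}\right)$$
over the single coefficient α, where Xi,max, Xi,min, Yj,max, and Yj,min are defined as earlier and
$$Z_{k,\max} = \max_{1 \le l \le p} Z_{kl}, \qquad Z_{k,\min} = \min_{1 \le l \le p} Z_{kl}.$$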
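The three-category min–max search can be sketched as a one-dimensional grid search. The following Python sketch is only an illustration: the function names and the default grid from −3 to 3 in steps of 0.01 are assumptions of this example, and the original search range is not stated in the text.

```python
from itertools import product


def empirical_vus(s1, s2, s3):
    """Proportion of triples with s1 < s2 < s3 (Mann-Whitney estimate of VUS)."""
    hits = sum(1 for x, y, z in product(s1, s2, s3) if x < y < z)
    return hits / (len(s1) * len(s2) * len(s3))


def min_max_scores(rows, coef):
    """Score each subject as (max over markers) + coef * (min over markers)."""
    return [max(r) + coef * min(r) for r in rows]


def min_max_search(x, y, z, grid=None):
    """Grid search for the single min-max coefficient.

    x, y, z are lists of per-subject marker lists for the three categories.
    The default grid is a hypothetical choice for illustration.
    """
    if grid is None:
        grid = [k / 100.0 for k in range(-300, 301)]
    best_coef, best_vus = None, -1.0
    for coef in grid:
        vus = empirical_vus(min_max_scores(x, coef),
                            min_max_scores(y, coef),
                            min_max_scores(z, coef))
        if vus > best_vus:  # keeps the first coefficient that attains the maximum
            best_coef, best_vus = coef, vus
    return best_coef, best_vus
```

Because the empirical VUS is a step function of the coefficient, many grid values can tie; this sketch simply returns the first one encountered.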
3. The proposed methods
In this section, we will propose two new approaches for linearly combining markers to improve the VUS. The first approach requires the assumption of multivariate normality and is designed to maximize the penalized/scaled stochastic distance between three ordinal diagnostic categories. The second distribution-free stepwise approach aims to find the optimal combination empirically by maximizing the Mann–Whitney statistic of the VUS at each step.
3.1. The penalized/scaled stochastic distance method based on normality
Assume that Xi, Yj, and Zk follow multivariate normal distributions Np(μ1, Σ1), Np(μ2, Σ2), and Np(μ3, Σ3), respectively. The problem of interest is to obtain a coefficient vector c such that the univariate scores S1 = Xic, S2 = Yjc, and S3 = Zkc have the largest overall discriminating ability to classify subjects into their corresponding disease categories, in this case, yielding the largest VUS. Notice that under the normality assumption, Sd (d = 1, 2, 3) follows a univariate normal distribution N(c′μd, c′Σdc).
Because the VUS is equal to P(S1 < S2 < S3), where S1, S2, and S3 are univariate scores after combination for a randomly selected individual from each diagnostic category, it is reasonable to expect that the larger the stochastic distance between the Sd (d = 1, 2, 3), the larger the VUS will be. Because the mean and variance completely characterize the normal distribution, we define the stochastic distance between normally distributed random variables on the basis of functions of the mean and variance.
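For Sd ∼ N(c′μd, c′Σdc) (d = 1, 2, 3), $\sum_{d=1}^{3} (c'\mu_d - c'\bar{\mu})^2$ measures the between-group variation, where $\bar{\mu}$ is the mean of the μd’s, and $\sum_{d=1}^{3} c'\Sigma_d c$ measures the total within-group variation. In an ideal situation, we want the between-group variation to be as large as possible while keeping the within-group variation minimal, because these are two necessary conditions for large separation of the distributions underlying Sd, d = 1, 2, 3. An intuitive penalized stochastic distance (PSD) could be defined as
$$\mathrm{PSD} = \sum_{d=1}^{3} (c'\mu_d - c'\bar{\mu})^2 - \sum_{d=1}^{3} c'\Sigma_d c, \tag{7}$$
such that a c that makes the between-group variation large and the within-group variation small may be obtained at once by maximizing the PSD. With some rearrangement,
$$\mathrm{PSD} = c'\left[\sum_{d=1}^{3} (\mu_d - \bar{\mu})(\mu_d - \bar{\mu})' - \sum_{d=1}^{3} \Sigma_d\right] c.$$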
Lemma 3.1
Let A be a p × p real symmetric matrix with (real) eigenvalues λ1 ≥ λ2 ≥ … ≥ λp and a corresponding set of orthonormal eigenvectors u1, u2, …, up, that is, $u_i' u_j = I(i = j)$, where I(·) stands for the indicator function, such that Aui = λiui. Then, over x ∈ ℝp, $\max_{\|x\| = 1} x'Ax = \lambda_1$, and the maximum occurs when x = u1.
Lemma 3.1 directly follows from the Rayleigh–Ritz theorem [9]. Therefore, among vectors of unit length, the c that maximizes the PSD is the eigenvector corresponding to the largest eigenvalue of $\sum_{d=1}^{3} (\mu_d - \bar{\mu})(\mu_d - \bar{\mu})' - \sum_{d=1}^{3} \Sigma_d$. Notice that it is not necessary to normalize the eigenvector to obtain c as indicated in Lemma 3.1, because the eigenvectors are unique apart from a scalar and the VUS associated with the linear combination c is invariant to a positive scaling constant.
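However, this newly defined PSD might have some potential problems. For example, in an extreme case, the matrix whose leading eigenvector defines c could be singular. For this reason, we also consider a scaled stochastic distance (SSD) defined as follows,
$$\mathrm{SSD} = \frac{\sum_{d=1}^{3} (c'\mu_d - c'\bar{\mu})^2}{\sum_{d=1}^{3} c'\Sigma_d c} = \frac{c'\left[\sum_{d=1}^{3} (\mu_d - \bar{\mu})(\mu_d - \bar{\mu})'\right] c}{c'\left[\sum_{d=1}^{3} \Sigma_d\right] c}, \tag{8}$$
such that, again, a c that makes the between-group variation large and the within-group variation small may be obtained by maximizing the SSD. This definition of SSD is similar to a natural extension of the Fisher discriminant for multi-category linear discriminant analysis [10], except that here we do not assume a common variance matrix Σd = Σ, d = 1, 2, 3, as this is too strong an assumption across three ordinal diagnostic categories.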
Lemma 3.2
Let A be a real p × p symmetric matrix, and let B be any p × p positive definite matrix. Let λ1 ≥ λ2 ≥ … ≥ λp be the eigenvalues of B−1A with a corresponding set of right eigenvectors u1, u2, …, up (all of which are real), that is, B−1Aui = λiui. Then for any x ∈ ℝp with x ≠ 0, $\lambda_p \le \dfrac{x'Ax}{x'Bx} \le \lambda_1$, with the upper bound being attained when x = u1. In particular, for any a we have $\max_{x \ne 0} \dfrac{(a'x)^2}{x'Bx} = a'B^{-1}a$, and the maximum occurs when x = B−1a, apart from some scaling constant.
Lemma 3.2 follows from Theorem 6.59 (Seber [11], pp. 109–110). To obtain the maximum of the SSD in Equation (8), we can obtain c as the eigenvector corresponding to the largest eigenvalue of $\left(\sum_{d=1}^{3} \Sigma_d\right)^{-1} \sum_{d=1}^{3} (\mu_d - \bar{\mu})(\mu_d - \bar{\mu})'$ based on Lemma 3.2. In practice, the mean and variance for each disease category can be estimated from the data, and the estimates can then be substituted into the preceding formulas to calculate the combination coefficient c.
Remark
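For the scenarios with binary disease status and $\bar{\mu} = (\mu_1 + \mu_2)/2$,
$$\mathrm{SSD} = \frac{\sum_{d=1}^{2} (c'\mu_d - c'\bar{\mu})^2}{\sum_{d=1}^{2} c'\Sigma_d c} = \frac{1}{2} \cdot \frac{\left(c'(\mu_2 - \mu_1)\right)^2}{c'(\Sigma_1 + \Sigma_2)\, c},$$
and the maximum occurs when $c = (\Sigma_1 + \Sigma_2)^{-1}(\mu_2 - \mu_1)$, apart from a scaling constant, from Lemma 3.2. Apart from the constant 1/2, this result is exactly the same as that in Su and Liu [5]. In this sense, our proposed SSD method coincides with Su and Liu’s method for binary disease outcomes.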
Generally speaking, with three categories the term $\sum_{d=1}^{3} (\mu_d - \bar{\mu})(\mu_d - \bar{\mu})'$ cannot be written as aa′ for some a; thus, a closed-form solution does not exist. However, eigenvalues and eigenvectors of a square matrix can be easily computed using statistical packages, such as eigen() in R and call eigen() in SAS/IML, and therefore obtaining the combination coefficient vector c using the proposed PSD or SSD methods is numerically straightforward.
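To make the SSD computation concrete, here is a minimal Python sketch that finds the leading eigenvector of B−1A by power iteration, where A is the between-group matrix and B is the sum of the covariance matrices. Power iteration is swapped in here only to keep the example free of external libraries; it is not the eigen() routine mentioned in the text. The function names, the starting vector, the iteration count, and the sign convention (orienting c so that the diseased mean score is not below the non-diseased one) are assumptions of this sketch, and it presumes that the group means are not all equal.

```python
import math


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def solve(b, rhs):
    """Solve b y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(b, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def scatter_matrices(means, covs, weights=None):
    """Between-group matrix A, within-group matrix B, and mean deviations.

    weights=None gives the unweighted grand mean; passing the sample sizes
    gives the weighted grand mean.
    """
    p = len(means[0])
    weights = weights or [1.0] * len(means)
    total = sum(weights)
    mu_bar = [sum(w * m[l] for w, m in zip(weights, means)) / total for l in range(p)]
    devs = [[m[l] - mu_bar[l] for l in range(p)] for m in means]
    between = [[sum(d[r] * d[s] for d in devs) for s in range(p)] for r in range(p)]
    within = [[sum(cov[r][s] for cov in covs) for s in range(p)] for r in range(p)]
    return between, within, devs


def ssd(c, between, within):
    """Scaled stochastic distance c'Ac / c'Bc of Equation (8)."""
    num = sum(ci * v for ci, v in zip(c, mat_vec(between, c)))
    den = sum(ci * v for ci, v in zip(c, mat_vec(within, c)))
    return num / den


def ssd_coefficients(means, covs, weights=None, iters=500):
    """Leading eigenvector of B^{-1} A by power iteration (illustrative only)."""
    between, within, devs = scatter_matrices(means, covs, weights)
    # Start from the largest mean deviation so that A c is nonzero.
    c = max(devs, key=lambda d: sum(v * v for v in d))
    for _ in range(iters):
        c = solve(within, mat_vec(between, c))
        norm = math.sqrt(sum(v * v for v in c))
        c = [v / norm for v in c]
    # Assumed sign convention: diseased mean score >= non-diseased mean score.
    direction = [m3 - m1 for m1, m3 in zip(means[0], means[-1])]
    if sum(ci * di for ci, di in zip(c, direction)) < 0:
        c = [-v for v in c]
    return c
```

In practice `means` and `covs` would be the sample mean vectors and sample covariance matrices of the three categories. When the three mean vectors are collinear and equally spaced, the between-group matrix has rank one and the result reduces to B−1a as in Lemma 3.2.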
3.2. The distribution-free stepwise approach
The preceding approach makes use of the assumption of multivariate normality. We now consider maximizing the VUS without the normality assumption. The empirical estimate of the VUS of the combination c is
$$\widehat{\mathrm{VUS}}(c) = \frac{1}{n_1 n_2 n_3} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \sum_{k=1}^{n_3} I\left(X_i c < Y_j c < Z_k c\right).$$
This is a three-category generalization of Pepe and Thompson [6]. When the number of markers p is three or more, the empirical search for c is computationally prohibitive. The nonparametric min–max procedure by Liu et al. [7] is easy to implement; however, it comes with a few drawbacks: (1) feasibility might be an issue when not all biomarkers are measured on the same scale; (2) the approach may be an inefficient use of the data as it only considers the minimum and maximum values; (3) interpretation of the estimated combination coefficient is difficult.
To overcome these shortcomings of the existing nonparametric combination methods, we develop a distribution-free approach that combines the diagnostic tests or the scores of all the biomarkers in a stepwise fashion. We consider two stepwise procedures, step-down and step-up, which we describe in detail in the following, using the step-down procedure as an example:
(1) Estimate the VUS for each of the p diagnostic tests or biomarkers on the basis of the Mann–Whitney statistic in Equation (4).
(2) Assign the order from 1 to p to each diagnostic test or biomarker on the basis of its estimated VUS, from the largest to the smallest.
(3) Combine the first two markers (i.e., the markers with the two largest VUS) using the empirical search for combination coefficients presented by Pepe and Thompson [6].
(4) Having derived the univariate composite score in step 3 by linearly combining the first two markers, combine it with the third marker (i.e., the marker with the third largest VUS) using the empirical search again.
(5) Proceed in this fashion until the ordered pth marker (i.e., the marker with the smallest VUS) is included in the linear combination.
The combination coefficient estimated by the search needs to be saved at each step, and in the end, the order of the p combination coefficients needs to be adjusted to match their corresponding markers. The step-up procedure is exactly the same as the step-down one except that in step 2, the order from 1 to p is assigned to each diagnostic test or biomarker on the basis of its estimated VUS from the smallest to the largest.
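The stepwise procedure described above can be sketched as follows. This Python sketch is a simplified illustration under stated assumptions: the two-marker search scans the combinations a + t·b and t·a + b for t on a grid over [−1, 1], in the spirit of the empirical search of Pepe and Thompson [6], and the grid resolution, the function names, and the data layout (lists of per-subject marker lists) are hypothetical choices made for this example.

```python
from itertools import product


def empirical_vus(s1, s2, s3):
    """Proportion of triples with s1 < s2 < s3 (Mann-Whitney estimate of VUS)."""
    hits = sum(1 for x, y, z in product(s1, s2, s3) if x < y < z)
    return hits / (len(s1) * len(s2) * len(s3))


def column(rows, l):
    """Scores of marker l for every subject in one category."""
    return [r[l] for r in rows]


def best_pair(a, b, grid):
    """Search wa * a + wb * b over (1, t) and (t, 1) for t in grid.

    a and b are triples of per-category score lists. Returns (vus, wa, wb).
    """
    best = (-1.0, 1.0, 0.0)
    for t in grid:
        for wa, wb in ((1.0, t), (t, 1.0)):
            groups = [[wa * u + wb * v for u, v in zip(ga, gb)]
                      for ga, gb in zip(a, b)]
            vus = empirical_vus(*groups)
            if vus > best[0]:
                best = (vus, wa, wb)
    return best


def stepwise(x, y, z, step_down=True, grid=None):
    """Step-down (or step-up) combination; returns (coefficients, VUS)."""
    if grid is None:
        grid = [k / 20.0 for k in range(-20, 21)]  # assumed grid on [-1, 1]
    p = len(x[0])
    markers = [(column(x, l), column(y, l), column(z, l)) for l in range(p)]
    single = [empirical_vus(*m) for m in markers]          # step (1)
    order = sorted(range(p), key=lambda l: single[l],
                   reverse=step_down)                      # step (2)
    coef = [0.0] * p
    coef[order[0]] = 1.0
    composite = markers[order[0]]
    vus = single[order[0]]
    for l in order[1:]:                                    # steps (3) to (5)
        vus, w_old, w_new = best_pair(composite, markers[l], grid)
        # Rescale earlier coefficients and store the new one at the marker's
        # original position, so no reordering is needed at the end.
        coef = [w_old * v for v in coef]
        coef[l] = w_new
        composite = tuple([w_old * u + w_new * v for u, v in zip(gc, gm)]
                          for gc, gm in zip(composite, markers[l]))
    return coef, vus
```

Because the grid contains t = 0, each step can retain the previous composite unchanged, so the in-sample VUS never decreases from one step to the next in this sketch. Calling `stepwise(x, y, z, step_down=False)` gives the step-up variant.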
Given p biomarkers, there exist p! ways of ordering them, and hence there exist p! stepwise procedures. The proposed step-down and step-up procedures are just two out of those p! ways. However, when p is relatively large, it is not feasible to carry out all the p! ways. For example, when p = 50, there exist 50! ≈ 3 × $10^{64}$ stepwise procedures. Another reason that we only consider step-down and step-up procedures is rooted in order restricted inference [12], where it is argued that any other stepwise method selecting a different proceeding order would have performance somewhere in between the step-down and step-up procedures.
The advantages of our proposed stepwise approach are as follows: (1) it is distribution-free, and therefore it is robust; (2) it is easy to implement with computer iterations, and therefore it offers relief from the computational burden of the empirical search for combination coefficients in p-dimensional space when p > 2, as encountered in Pepe and Thompson [6]; (3) simulation studies in Section 4 demonstrate that the stepwise approach (especially the step-down one) may outperform the other methods under some scenarios, and for other scenarios, its performance is comparable with that of other methods.
4. Simulation studies
We conduct simulations to investigate the performance of the different combination methods because it is difficult, if not impossible, to evaluate their performance analytically. For $\bar{\mu}$ in Equations (7) and (8), both weighted and unweighted versions are calculated as follows: $\bar{\mu}_w = \sum_{d=1}^{3} n_d \mu_d \big/ \sum_{d=1}^{3} n_d$ and $\bar{\mu}_u = \sum_{d=1}^{3} \mu_d / 3$. Overall, we empirically investigate the performance of eight approaches, namely, the SSD method with $\bar{\mu}_w$ (SSD1), the SSD method with $\bar{\mu}_u$ (SSD2), the PSD method with $\bar{\mu}_w$ (PSD1), the PSD method with $\bar{\mu}_u$ (PSD2), the step-down procedure that proceeds from the marker with the largest VUS to the one with the smallest VUS (SW1), the step-up procedure that proceeds from the marker with the smallest VUS to the one with the largest VUS (SW2), the min–max approach extended to three diagnostic categories (Min–max), and the linear combination coefficients from cumulative logistic regression (Cum-logistic).
To investigate the performance of all eight approaches empirically, we consider six different settings of the joint distributions of five diagnostic tests (p = 5). For each setting, we generate multivariate observations from the underlying distributions with different sample sizes. We calculate the univariate composite scores S1i, S2j, and S3k by combining the observed data using the estimated c from a specific combination method, and then we estimate the VUS of the combined marker using the unbiased Mann–Whitney statistic in Equation (4). For each setting, we conduct 10,000 Monte Carlo repetitions. For each method, we report in Tables I–VI the mean VUS and, in parentheses, the chance of obtaining the largest VUS across the 10,000 Monte Carlo repetitions.
Table I.
(n1, n2, n3) | SSD1 | SSD2 | PSD1 | PSD2 | Cum-logistic | SW1 | SW2 | Min–max |
---|---|---|---|---|---|---|---|---|
(20, 20, 20) | 0.9135 (0.194) | — — | 0.8916 (0.010) | — — | 0.9113 (0.183) | 0.9216 (0.511) | 0.9100 (0.085) | 0.8566 (0.015) |
(20, 30, 50) | 0.9095 (0.084) | 0.9095 (0.080) | 0.8883 (0.002) | 0.8894 (0.002) | 0.9079 (0.158) | 0.9147 (0.601) | 0.9051 (0.066) | 0.8519 (0.008) |
(30, 40, 50) | 0.9074 (0.084) | 0.9074 (0.084) | 0.8885 (0.001) | 0.8889 (0.001) | 0.9088 (0.201) | 0.9111 (0.576) | 0.9027 (0.050) | 0.8504 (0.004) |
(50, 50, 50) | 0.9057 (0.176) | — — | 0.8880 (0.002) | — — | 0.9072 (0.242) | 0.9082 (0.541) | 0.9006 (0.039) | 0.8483 (0.001) |
Simulation setting: normal data with equal variance Σ1 = Σ2 = Σ3 = 0.7 × I5×5 + 0.3 × J5×5.
SSD1, scaled stochastic distance method accounting for unbalanced sample sizes; SSD2, scaled stochastic distance method not accounting for unbalanced sample sizes; PSD1, penalized stochastic distance method accounting for unbalanced sample sizes; PSD2, penalized stochastic distance method not accounting for unbalanced sample sizes; SW1, step-down procedure (stepwise method proceeding from the marker with the largest VUS to the one with the smallest VUS); SW2, step-up procedure (stepwise method proceeding from the marker with the smallest VUS to the one with the largest VUS); Min–max, min–max approach implemented for three diagnostic categories; Cum-logistic, linear combination coefficients from cumulative logistic regression.
Table VI.
(n1, n2, n3) | SSD1 | SSD2 | PSD1 | PSD2 | Cum-logistic | SW1 | SW2 | Min–max |
---|---|---|---|---|---|---|---|---|
(20, 20, 20) | 0.7808 (0.080) | — — | 0.7740 (0.018) | — — | 0.7829 (0.015) | 0.8252 (0.798) | 0.8116 (0.088) | 0.6870 (0.003) |
(20, 30, 50) | 0.7782 (0.019) | 0.7780 (0.017) | 0.7702 (0.002) | 0.7733 (0.004) | 0.7815 (0.018) | 0.8115 (0.895) | 0.8020 (0.044) | 0.6775 (0.000) |
(30, 40, 50) | 0.7771 (0.017) | 0.7771 (0.018) | 0.7723 (0.002) | 0.7732 (0.003) | 0.7879 (0.030) | 0.8077 (0.892) | 0.8000 (0.039) | 0.6743 (0.000) |
(50, 50, 50) | 0.7767 (0.024) | — — | 0.7724 (0.006) | — — | 0.7912 (0.043) | 0.8050 (0.886) | 0.7986 (0.041) | 0.6709 (0.000) |
Simulation setting: normal-χ2-log-normal-exponential-gamma copula data.
4.1. Multivariate normal distributions with equal variance
We generate data from multivariate normal distributions with different mean vectors and equal variance matrices corresponding to three ordinal diagnostic categories with
and Σ1 = Σ2 = Σ3 = 0.7 × I5×5 + 0.3 × J5×5, 0.5 × I5×5 + 0.5 × J5×5, 0.3 × I5×5 + 0.7 × J5×5, where I and J stand for an identity matrix and a matrix with all elements equal to 1, respectively. These three different covariance matrices correspond to scenarios with low, medium, and high correlation, respectively, and we present the corresponding simulation results in Tables I–III.
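As a sketch of how data with the compound-symmetric covariance (1 − ρ) × I + ρ × J can be generated, the following Python example uses a shared-factor construction: each marker is the group mean plus √ρ times a subject-level factor plus √(1 − ρ) times independent noise. The mean vectors below are hypothetical placeholders chosen only for illustration, not the values used in the simulations, and the function names and seed are likewise assumptions of this example.

```python
import math
import random


def compound_symmetric_sample(mean, rho, n, rng):
    """Draw n rows from N(mean, (1 - rho) * I + rho * J) via a shared factor."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common to all markers of one subject
        rows.append([m + math.sqrt(rho) * shared
                     + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for m in mean])
    return rows


def sample_correlation(rows, a, b):
    """Pearson correlation between markers a and b."""
    n = len(rows)
    mean_a = sum(r[a] for r in rows) / n
    mean_b = sum(r[b] for r in rows) / n
    sab = sum((r[a] - mean_a) * (r[b] - mean_b) for r in rows)
    saa = sum((r[a] - mean_a) ** 2 for r in rows)
    sbb = sum((r[b] - mean_b) ** 2 for r in rows)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(2013)
# Hypothetical mean vectors for the three categories (p = 5 markers).
hypothetical_means = ([0.0] * 5, [0.5] * 5, [1.0] * 5)
sizes = (20, 30, 50)
x, y, z = [compound_symmetric_sample(m, 0.3, n, rng)
           for m, n in zip(hypothetical_means, sizes)]
```

Each marker then has unit variance and pairwise correlation ρ, which corresponds to the low, medium, and high correlation settings for ρ = 0.3, 0.5, and 0.7.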
Table III.
(n1, n2, n3) | SSD1 | SSD2 | PSD1 | PSD2 | Cum-logistic | SW1 | SW2 | Min–max |
---|---|---|---|---|---|---|---|---|
(20, 20, 20) | 0.9131 (0.530) | — — | 0.8104 (0.000) | — — | 0.9107 (0.369) | 0.8931 (0.091) | 0.8546 (0.002) | 0.8301 (0.007) |
(20, 30, 50) | 0.9090 (0.268) | 0.9090 (0.258) | 0.8011 (0.000) | 0.8074 (0.000) | 0.9070 (0.408) | 0.8894 (0.062) | 0.8485 (0.000) | 0.8250 (0.004) |
(30, 40, 50) | 0.9066 (0.235) | 0.9066 (0.235) | 0.8044 (0.000) | 0.8065 (0.000) | 0.9080 (0.502) | 0.8872 (0.027) | 0.8463 (0.000) | 0.8228 (0.001) |
(50, 50, 50) | 0.9049 (0.442) | — — | 0.8053 (0.000) | — — | 0.9064 (0.549) | 0.8845 (0.010) | 0.8440 (0.000) | 0.8207 (0.000) |
Simulation setting: normal data with equal variance Σ1 = Σ2 = Σ3 = 0.3 × I5×5 + 0.7 × J5×5.
Overall, the simulation results presented in Tables I–III show that SW1, SSD1, SSD2, and Cum-logistic perform better than the other approaches. The performance of each method depends somewhat on the correlation. When the correlation is relatively small, Table I (ρ = 0.3) shows that SW1 performs much better than SSD1, SSD2, or Cum-logistic. As the correlation increases from small to large, Table III (ρ = 0.7) shows that the performance of SW1 is slightly inferior to that of SSD1 or Cum-logistic in view of the mean VUS. Under the setting with correlation ρ = 0.5 (Table II), SW1, SSD1, SSD2, and Cum-logistic all have comparably good performance.
Table II.
(n1, n2, n3) | SSD1 | SSD2 | PSD1 | PSD2 | Cum-logistic | SW1 | SW2 | Min–max |
---|---|---|---|---|---|---|---|---|
(20, 20, 20) | 0.8951 (0.332) | — — | 0.8441 (0.000) | — — | 0.8944 (0.289) | 0.8977 (0.348) | 0.8759 (0.021) | 0.8239 (0.009) |
(20, 30, 50) | 0.8905 (0.156) | 0.8905 (0.154) | 0.8383 (0.000) | 0.8412 (0.000) | 0.8896 (0.290) | 0.8913 (0.386) | 0.8702 (0.008) | 0.8188 (0.007) |
(30, 40, 50) | 0.8879 (0.163) | 0.8879 (0.162) | 0.8396 (0.000) | 0.8406 (0.000) | 0.8896 (0.356) | 0.8879 (0.313) | 0.8677 (0.003) | 0.8168 (0.002) |
(50, 50, 50) | 0.8860 (0.342) | — — | 0.8395 (0.000) | — — | 0.8875 (0.411) | 0.8849 (0.245) | 0.8655 (0.001) | 0.8147 (0.000) |
Simulation setting: normal data with equal variance Σ1 = Σ2 = Σ3 = 0.5 × I5×5 + 0.5 × J5×5.
Although the method using cumulative logistic regression might work well for certain scenarios, there exist some numerical difficulties with fitting a cumulative logistic regression model. The iterative algorithms for maximum likelihood estimates of the model parameters can easily fail to converge, especially when the sample sizes are small. For fair comparisons, those ill-posed Monte Carlo samples are marked and excluded for calculating the mean VUS corresponding to Cum-logistic.
4.2. Multivariate normal distributions with unequal variance
Now we consider multivariate normal distributions with different mean vectors and unequal variance matrices corresponding to three ordinal diagnostic categories. The mean vectors are the same as in Section 4.1, with variance matrices set as follows,
As shown in Table IV, under this setting, SSD1, SSD2, SW1, and Cum-logistic have good and comparable performances.
Table IV.
(n1, n2, n3) | SSD1 | SSD2 | PSD1 | PSD2 | Cum-logistic | SW1 | SW2 | Min–max |
---|---|---|---|---|---|---|---|---|
(20, 20, 20) | 0.8954 (0.336) | — — | 0.8456 (0.000) | — — | 0.8943 (0.273) | 0.8982 (0.365) | 0.8766 (0.024) | 0.7994 (0.001) |
(20, 30, 50) | 0.8916 (0.167) | 0.8916 (0.159) | 0.8400 (0.000) | 0.8428 (0.000) | 0.8904 (0.278) | 0.8925 (0.388) | 0.8711 (0.008) | 0.7937 (0.000) |
(30, 40, 50) | 0.8884 (0.159) | 0.8884 (0.157) | 0.8412 (0.000) | 0.8421 (0.000) | 0.8900 (0.345) | 0.8887 (0.335) | 0.8684 (0.003) | 0.7916 (0.000) |
(50, 50, 50) | 0.8863 (0.336) | — — | 0.8411 (0.000) | — — | 0.8878 (0.398) | 0.8855 (0.265) | 0.8661 (0.001) | 0.7892 (0.000) |
Simulation setting: normal data with unequal variance.
4.3. Multivariate log-normal distributions with unequal variance–covariance
In this section, we would like to investigate the diagnostic accuracy of the combined marker from different methods, assuming that multiple biomarkers follow multivariate log-normal distributions; that is, the log-transformed scores are multivariate normally distributed. We first generate data from the multivariate normal setting in Section 4.2 and then exponentiate the data to get the multivariate log-normal observations.
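The data-generating step described above (draw multivariate normal scores, then exponentiate) can be sketched as follows. The mean vector and covariance matrix here are hypothetical placeholders, not the values of Section 4.2, and the helper names are introduced only for this illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor L of a symmetric positive-definite matrix a (a = L L^T)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def rmvnorm(n, mean, cov, rng):
    """Draw n rows from a multivariate normal with the given mean and covariance."""
    low = cholesky(cov)
    p = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rows.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                     for i in range(p)])
    return rows


def rmvlognorm(n, mean, cov, rng):
    """Multivariate log-normal draws: exponentiate multivariate normal draws."""
    return [[math.exp(v) for v in row] for row in rmvnorm(n, mean, cov, rng)]


# Hypothetical parameters for one diagnostic category (assumed for illustration).
rng = random.Random(42)
sample = rmvlognorm(20, [0.0, 0.5], [[1.0, 0.3], [0.3, 1.0]], rng)
```

Taking logarithms of `sample` recovers multivariate normal scores, which is the sense in which the log-transformed markers are normally distributed.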
In this case, the normality assumption does not hold, and the normal-based approaches such as SSD1 perform poorly. This is expected, because sample means and variance matrices cannot correctly measure location and variation for non-normal data. Table V suggests that SW1, which proceeds from the marker with the largest VUS to the marker with the smallest VUS, dominates the other methods.
Table V.
(n1, n2, n3) | SSD1 | SSD2 | PSD1 | PSD2 | Cum-logistic | SW1 | SW2 | Min–max |
---|---|---|---|---|---|---|---|---|
(20, 20, 20) | 0.6727 (0.002) | — — | 0.5784 (0.004) | — — | 0.8378 (0.066) | 0.8835 (0.849) | 0.8678 (0.075) | 0.7981 (0.003) |
(20, 30, 50) | 0.7078 (0.000) | 0.7066 (0.000) | 0.4845 (0.001) | 0.4208 (0.001) | 0.8257 (0.033) | 0.8769 (0.932) | 0.8625 (0.031) | 0.7931 (0.001) |
(30, 40, 50) | 0.7082 (0.000) | 0.7077 (0.000) | 0.4303 (0.000) | 0.4141 (0.000) | 0.8323 (0.028) | 0.8717 (0.949) | 0.8590 (0.022) | 0.7910 (0.000) |
(50, 50, 50) | 0.7095 (0.000) | — — | 0.4099 (0.000) | — — | 0.8372 (0.027) | 0.8674 (0.957) | 0.8561 (0.014) | 0.7887 (0.000) |
Simulation setting: multivariate log-normal data.
SSD1, scaled stochastic distance method accounting for unbalanced sample sizes; SSD2, scaled stochastic distance method not accounting for unbalanced sample sizes; PSD1, penalized stochastic distance method accounting for unbalanced sample sizes; PSD2, penalized stochastic distance method not accounting for unbalanced sample sizes; SW1, step-down procedure (stepwise method proceeding from the marker with the largest VUS to the one with the smallest VUS); SW2, step-up procedure (stepwise method proceeding from the marker with the smallest VUS to the one with the largest VUS); Min–max, min–max approach implemented for three diagnostic categories; Cum-logistic, linear combination coefficients from cumulative logistic regression.
4.4. Multivariate normal-χ2-log-normal-exponential-gamma distributions via normal copula
We further investigate the performance of the different methods assuming that the p-variate scores follow multivariate normal-χ2-log-normal-exponential-gamma distributions coupled together via a normal copula [13], with exchangeable correlation ρ equal to 0.3, 0.5, and 0.7 for the non-diseased, intermediate, and diseased categories, respectively. We choose the marginal distributions of the p biomarkers for non-diseased, intermediate, and diseased subjects as follows, respectively,
Under this setting, the mean structures are exactly the same as in Section 4.1. From Table VI, we can see that the step-down procedure (SW1), proceeding from the marker with the largest VUS to the one with the smallest VUS, is far superior to all the other methods.
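For readers unfamiliar with the construction, sampling from a normal copula with exchangeable correlation can be sketched as below. This is a minimal illustration under stated assumptions: the three marginals (normal, log-normal, exponential) and their parameters are hypothetical, the χ2 and gamma marginals are omitted because their quantile functions are not in the Python standard library, and the shared-factor trick assumes a correlation between 0 and 1. It is not the COPULA R package code of [13].

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def exchangeable_normals(p, rho, rng):
    """Draw p standard normals with common pairwise correlation rho (0 <= rho < 1)."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)]


def copula_sample(n, rho, quantile_functions, rng):
    """Draw n rows; column j has the marginal given by quantile_functions[j]."""
    rows = []
    for _ in range(n):
        z = exchangeable_normals(len(quantile_functions), rho, rng)
        # Map to uniforms; clamp away from 0 and 1 so the quantiles stay finite.
        u = [min(max(STD_NORMAL.cdf(v), 1e-12), 1.0 - 1e-12) for v in z]
        rows.append([q(ui) for q, ui in zip(quantile_functions, u)])
    return rows


# Hypothetical marginals (NOT the paper's choices): normal, log-normal, exponential.
marginals = [
    lambda u: NormalDist(0.0, 1.0).inv_cdf(u),
    lambda u: math.exp(NormalDist(0.0, 0.5).inv_cdf(u)),
    lambda u: -math.log(1.0 - u),  # exponential with rate 1
]

rng = random.Random(2013)
non_diseased = copula_sample(20, 0.3, marginals, rng)  # rho = 0.3 as for non-diseased
```

The correlation is imposed on the latent normal scores, so the dependence between the transformed markers is governed by the copula rather than by their Pearson correlation.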
In summary, of all the methods considered, the step-down procedure (SW1) is a good choice for combining multiple biomarkers, followed by the SSD method (SSD1 and SSD2), the cumulative logistic regression method, SW2, the PSD method (PSD1 and PSD2), and Min–max. Although SW1 does not rely on normality, it requires p – 1 nonparametric search steps. On the other hand, SSD1 (or SSD2) requires the normality assumption, but it is more efficient with large numbers of biomarkers.
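To make the p – 1 search steps concrete, a step-down procedure of this kind can be sketched as follows. The sketch rests on several assumptions that are not confirmed by the text above: the grid of weights on [-1, 1], the two weighting forms tried at each step, and all function names are illustrative choices, and the empirical VUS is taken as the proportion of correctly ordered triples. It should not be read as the authors' implementation.

```python
import random
from bisect import bisect_left, bisect_right


def empirical_vus(x1, x2, x3):
    """Proportion of triples (one value per category) with x1 < x2 < x3."""
    lo, hi = sorted(x1), sorted(x3)
    n3 = len(hi)
    # For each middle value: (# smaller in category 1) * (# larger in category 3).
    total = sum(bisect_left(lo, v) * (n3 - bisect_right(hi, v)) for v in x2)
    return total / (len(x1) * len(x2) * n3)


def combine(a, b, wa, wb):
    """Element-wise linear combination wa * a + wb * b of two score lists."""
    return [wa * u + wb * v for u, v in zip(a, b)]


def step_down(groups, grid_size=41):
    """groups: three lists (ordered categories) of per-subject marker rows.

    Returns (coefficients, vus) of the combined marker.
    """
    p = len(groups[0][0])
    # cols[j][g] holds marker j for the subjects in category g.
    cols = [[[row[j] for row in g] for g in groups] for j in range(p)]
    single = [empirical_vus(*cols[j]) for j in range(p)]
    order = sorted(range(p), key=lambda j: single[j], reverse=True)
    coefs = [0.0] * p
    coefs[order[0]] = 1.0
    current, best = cols[order[0]], single[order[0]]
    alphas = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    for j in order[1:]:  # p - 1 one-dimensional searches
        step = (best, 1.0, 0.0)  # default: leave the current combination unchanged
        for a in alphas:
            for w_cur, w_new in ((1.0, a), (a, 1.0)):
                scores = [combine(current[g], cols[j][g], w_cur, w_new)
                          for g in range(3)]
                v = empirical_vus(*scores)
                if v > step[0]:
                    step = (v, w_cur, w_new)
        best, w_cur, w_new = step
        coefs = [w_cur * c for c in coefs]
        coefs[j] += w_new
        current = [combine(current[g], cols[j][g], w_cur, w_new) for g in range(3)]
    return coefs, best


# Hypothetical data: three markers, the third one uninformative.
rng = random.Random(0)


def draw(n, shift):
    return [[rng.gauss(shift, 1.0), rng.gauss(0.5 * shift, 1.0), rng.gauss(0.0, 1.0)]
            for _ in range(n)]


groups = [draw(20, 0.0), draw(20, 1.0), draw(20, 2.0)]
coefs, vus = step_down(groups)
```

Because each step only accepts a weight that increases the empirical VUS, the combined marker never does worse on the training data than the best single marker it started from.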
5. Analysis of data example
In this section, we apply all eight approaches investigated in the simulation studies to a real data set of 118 subjects from a cohort study in early stage AD conducted at the Washington University Knight Alzheimer’s Disease Research Center. The goal is to combine several psychometric tests to obtain greater discriminating ability, that is, a larger VUS, than any individual psychometric test score.
Experienced clinicians assessed each individual. The diagnosis of AD was based on the Clinical Dementia Rating (CDR) according to published rules [14]. In this application, we are concentrating on three diagnostic categories, non-demented (CDR 0, 45 individuals), very mildly demented (CDR 0.5, 44 individuals), and mildly demented (CDR 1, 29 individuals). Approximately 2 weeks after the clinical evaluation, subjects also completed a battery of psychometric tests. Five of these psychometric tests, the Logical Memory (LM), Digit Span Forward (DSF), Digit Span Backward (DSB), Associate Learning subtests of the Wechsler Memory Scale (WMS) [15], and the Visual Retention Test (Form C, 10-s exposure) (VRT) [16] assessed episodic memory, which involves the recollection of specific events, situations, and experiences, for example, first day of school or graduation. Xiong et al. [3] reported the estimated VUS for these five psychometric tests: 0.724 (LM), 0.522 (DSF), 0.599 (DSB), 0.630 (WMS), and 0.587 (VRT).
We provide the linear combinations with associated VUS from the SSD1, SSD2, PSD1, PSD2, Cum-logistic, SW1, and SW2 methods in the following, where the combination coefficient corresponding to LM is set to 1 to guarantee a unique solution.
Method | LM | DSF | DSB | VRT | WMS | (VUS) |
---|---|---|---|---|---|---|
SSD1 | 1.0000 | 0.1533 | 0.2272 | 0.3915 | 0.0765 | (0.8077) |
SSD2 | 1.0000 | 0.1513 | 0.2219 | 0.3924 | 0.0747 | (0.8066) |
PSD1 | 1.0000 | 0.4863 | 0.6464 | 0.7902 | 0.7121 | (0.8106) |
PSD2 | 1.0000 | 0.4742 | 0.6233 | 0.7810 | 0.6957 | (0.8108) |
Cum-logistic | 1.0000 | 0.1610 | 0.4396 | 0.2934 | 0.1654 | (0.8138) |
SW1 | 1.0000 | 0.1162 | 0.4830 | 0.1290 | 0.3558 | (0.8296) |
SW2 | 1.0000 | 0.0729 | 0.1553 | 0.0924 | 0.3360 | (0.8235) |
The min–max approach provides the following combination
with an estimated VUS of 0.7724 for the combined marker. The Shapiro–Wilk test for multivariate normality [17] returns p-values of < 0.0001, < 0.0001, and 0.0184 for the non-diseased, intermediate, and diseased categories, respectively. Therefore, the results from the procedures based on normality (SSD1, SSD2, PSD1, PSD2) should not be interpreted. All eight methods provide a linearly combined marker that yields a larger VUS than any of the original tests, and the step-down method (SW1) provides the linear combination with the largest VUS.
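Fixing the LM coefficient at 1 loses nothing, because an empirical VUS depends only on the ordering of the combined scores and is therefore unchanged when all coefficients are multiplied by a positive constant. The sketch below illustrates this with the triple-counting definition of the empirical VUS; the data, coefficients, and function names are hypothetical, and the categories are assumed to be ordered so that scores tend to increase (markers that decrease with severity would first be negated).

```python
def empirical_vus(x1, x2, x3):
    """Fraction of triples, one score per category, that are correctly ordered."""
    hits = sum(1 for a in x1 for b in x2 for c in x3 if a < b < c)
    return hits / (len(x1) * len(x2) * len(x3))


def score(rows, coefs):
    """Linear combination of each subject's marker values."""
    return [sum(c * v for c, v in zip(coefs, row)) for row in rows]


def normalise(coefs, ref=0):
    """Rescale so that the reference coefficient equals 1 (must be positive)."""
    if coefs[ref] <= 0:
        raise ValueError("reference coefficient must be positive")
    return [c / coefs[ref] for c in coefs]


# Hypothetical two-marker data for three ordered categories.
g1 = [[1.0, 2.0], [2.0, 1.0], [0.5, 3.0]]
g2 = [[2.5, 2.0], [1.5, 4.0], [3.0, 1.0]]
g3 = [[4.0, 3.0], [3.5, 5.0], [2.0, 2.0]]

raw = [2.0, 0.5]
unit = normalise(raw)  # first coefficient fixed at 1

vus_raw = empirical_vus(score(g1, raw), score(g2, raw), score(g3, raw))
vus_unit = empirical_vus(score(g1, unit), score(g2, unit), score(g3, unit))
```

Here `vus_raw` and `vus_unit` coincide, which is why only the ratios of the coefficients in the table above are identifiable.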
6. Discussion
In this article, we extend two existing combination approaches to deal with three ordinal diagnostic categories. We also propose two new types of linear combination methods for combining diagnostic tests or biomarkers to improve the diagnostic accuracy measure, the VUS. The first proposed approach, which is normal-based, requires only the estimated means and variance–covariances of the multiple diagnostic tests in each diagnostic category to calculate the linear combination coefficients. Therefore, it is efficient with large numbers of biomarkers, which are quite common nowadays with high-throughput bioinformatics tools such as microarray technologies. Under the normality assumption with moderate to large correlations, our simulations show that the normal-based approach, especially SSD1, performs relatively well in terms of obtaining a combined marker with the largest VUS. Recently, Zhang [18] proposed to directly maximize the accuracy index VUS with three diagnostic categories under the normality assumption. Although the idea is appealing, the mathematical equations for finding the derivatives are formidable, and the author stated that an analytic solution to directly maximizing the VUS is not generally attainable. For this reason, the proposed normal-based approach may offer investigators an opportunity to combine diagnostic tests and biomarkers for disease processes with more than three ordinal categories. The second proposed approach is a stepwise approach that is distribution-free in nature and hence robust with non-normal data. The computing effort and cost of obtaining the combination coefficients are significantly less than those of the empirical search in p-dimensional space [6]. Our simulations show that, for either non-normal data or normal data with small correlations, the step-down procedure (SW1), proceeding from the marker with the largest VUS to the marker with the smallest VUS, is a reasonable choice for biomarker combination.
It is worthwhile to point out that both the stepwise approach and the normal-based approach can easily be generalized to diseases with more than three diagnostic categories. The cumulative logistic regression approach (Cum-logistic) has a good chance of producing a combined marker with the largest VUS under the normality assumption with large sample sizes. Note that one of the assumptions underlying the cumulative logistic regression model is proportional odds. That is, the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category versus all higher categories, and so forth. We recommend testing this assumption before applying this approach to the combination of markers. The min–max combination method is fast, although, as the simulations indicated, its performance is not as good as that of the other methods. It would be interesting to explore in future research whether adding some other order statistics could improve the combination while maintaining its computational efficiency.
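The proportional odds assumption can be written compactly for three categories. The notation below is an assumption introduced for illustration and may differ from that of Section 2.2: D ∈ {1, 2, 3} denotes the ordinal diagnostic category, Y the p-vector of biomarkers, θ_1 < θ_2 the cut-point intercepts, and β the common slope vector.

```latex
\operatorname{logit} P(D \le k \mid \mathbf{Y})
  = \theta_k - \boldsymbol{\beta}^{\top}\mathbf{Y},
  \qquad k = 1, 2 .
```

The same β appears for both values of k, which is exactly the proportional odds restriction; the linear combination of markers is then β⊤Y. Relaxing the assumption would amount to allowing a separate slope vector β_k for each k.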
Some related research topics are currently under investigation. The methods explored here implicitly assume that the scaling metric of each of the biomarkers is linear. However, for some cognitive tests, this may not be the case; see Crane et al. [19]. It will be of great interest to determine whether some approaches that first produce a linear scaling metric for each biomarker and then apply the proposed methods may provide additional ability to distinguish among disease severity categories. Furthermore, it is also of interest to explore the performance of a generalized version of the cumulative logistic regression approach discussed in Section 2.2 without the proportional odds assumption.
Acknowledgments
Dr. Chengjie Xiong was partly supported by the National Institute on Aging grants P01 AG03991, P01 AG50837, P50 AG05681, R01 AG029672, and R01 AG034119, and Dr. Paul Crane by R01 AG029672. The opinions expressed are those of the authors and not necessarily those of the Editors. The authors thank the referees for helpful discussions and comments.
Footnotes
A supplementary R code to implement the methods described in this article is available for download at http://www.buffalo.edu/~lekang/VUScombine.
References
- 1. Sidransky D. Emerging molecular markers of cancer. Nature Reviews Cancer. 2002;2:210–219. doi: 10.1038/nrc755.
- 2. Etzioni R, Kooperberg C, Pepe M, Smith R, Gann PH. Combining biomarkers to detect disease with application to prostate cancer. Biostatistics. 2003;4:523–538. doi: 10.1093/biostatistics/4.4.523.
- 3. Xiong CJ, van Belle G, Miller JP, Morris JC. Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Statistics in Medicine. 2006;25:1251–1273. doi: 10.1002/sim.2433.
- 4. Xiong CJ, van Belle G, Miller JP, Yan Y, Gao F, Yu K, Morris JC. A parametric comparison of diagnostic accuracy with three ordinal diagnostic groups. Biometrical Journal. 2007;49:682–693. doi: 10.1002/bimj.200610359.
- 5. Su JQ, Liu JS. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association. 1993;88:1350–1355.
- 6. Pepe MS, Thompson ML. Combining diagnostic test results to increase accuracy. Biostatistics. 2000;1:123–140. doi: 10.1093/biostatistics/1.2.123.
- 7. Liu C, Liu A, Halabi S. A min–max combination of biomarkers to improve diagnostic accuracy. Statistics in Medicine. 2011;30(16):2005–2014. doi: 10.1002/sim.4238.
- 8. Jin H, Lu Y. The optimal linear combination of multiple predictors under the generalized linear models. Statistics & Probability Letters. 2009;79:2321–2327. doi: 10.1016/j.spl.2009.08.002.
- 9. Golub GH, van der Vorst HA. Eigenvalue computation in the 20th century. Journal of Computational and Applied Mathematics. 2000;123:35–65.
- 10. Johnson R, Wichern D. Applied Multivariate Statistical Analysis. Prentice Hall; Upper Saddle River, NJ: 2002.
- 11. Seber G. A Matrix Handbook for Statisticians. John Wiley & Sons; Hoboken, New Jersey: 2008.
- 12. Robertson T, Wright F, Dykstra R. Order Restricted Statistical Inference. John Wiley & Sons; New York: 1988.
- 13. Kojadinovic I, Yan J. Modeling multivariate distributions with continuous margins using the COPULA R package. Journal of Statistical Software. 2010;34:1–20.
- 14. Morris JC. The clinical dementia rating (CDR): current version and scoring rules. Neurology. 1993;43:2412–2414. doi: 10.1212/wnl.43.11.2412-a.
- 15. Wechsler D, Stone CP. Wechsler Memory Scale Manual. Psychological Corporation; New York: 1973.
- 16. Benton AL. The Revised Visual Retention Test: Clinical and Experimental Applications. Psychological Corporation; New York: 1963.
- 17. Royston J. An extension of Shapiro and Wilk’s W test for normality to large samples. Applied Statistics. 1982;31:115–124.
- 18. Zhang YY. ROC analysis in diagnostic medicine. Ph.D. Thesis. National University of Singapore; Singapore: 2010.
- 19. Crane PK, Narasimhalu K, Gibbons LE, Mungas DM, Haneuse S, Larson EB, Kuller L, Hall K, van Belle G. Item response theory facilitated cocalibrating cognitive tests and reduced bias in estimated rates of decline. Journal of Clinical Epidemiology. 2008;61(10):1018–1027. doi: 10.1016/j.jclinepi.2007.11.011.
- 20. Thompson ML, Zucchini W. On the statistical analysis of ROC curves. Statistics in Medicine. 1989;8(10):1277–1290. doi: 10.1002/sim.4780081011.
- 21. Pepe MS. A regression modelling framework for receiver operating characteristic curves in medical diagnostic testing. Biometrika. 1997;84:595–608.
- 22. He X, Frey EC. The meaning and use of the Volume Under a Three-Class ROC Surface (VUS). IEEE Transactions on Medical Imaging. 2008;27:577–588. doi: 10.1109/TMI.2007.908687.