Journal of Computational Biology. 2014 Sep 1;21(9):709–721. doi: 10.1089/cmb.2014.0097

Evaluations and Comparisons of Treatment Effects Based on Best Combinations of Biomarkers with Applications to Biomedical Studies

Albert Vexler 1, Xiwei Chen 1, Jihnhee Yu 1
PMCID: PMC4148056  PMID: 25019920

Abstract

Many clinical and biomedical studies evaluate treatment effects based on multiple biomarkers that commonly consist of pre- and post-treatment measurements. Some biomarkers can show significant positive treatment effects, while other biomarkers can reflect no effects or even negative effects of the treatments, which creates a need for methodologies that correctly and efficiently evaluate the treatment effects based on multiple biomarkers as a whole. In the setting of pre- and post-treatment measurements of multiple biomarkers, we propose to apply a receiver operating characteristic (ROC) curve methodology based on the best combination of biomarkers maximizing the area under the receiver operating characteristic curve (AUC)-type criterion among all possible linear combinations. In the particular case with independent pre- and post-treatment measurements, we show that the proposed method reproduces the well-known result of Su and Liu (1993). Further, proceeding from derived best combinations of biomarkers' measurements, we propose an efficient technique via likelihood ratio tests to compare treatment effects. We present an extensive Monte Carlo study that confirms the superiority of the proposed test for comparing treatment effects based on multiple biomarkers in a paired data setting. For practical applications, the proposed method is illustrated with a randomized trial of chlorhexidine gluconate on oral bacterial pathogens in mechanically ventilated patients as well as a treatment study for children with attention deficit-hyperactivity disorder and severe mood dysregulation.

Key words: area under the curve, best linear combination, likelihood ratio test, paired data, receiver operating characteristic, treatment effect

1. Introduction

Biomarkers have been important tools in disease diagnosis, drug development, and research. In the area of drug development, biomarkers' measurements can be applied to reflect drug effects, and thus are often used to compare different treatment groups. Biomarkers can show treatment effects in different magnitudes or even different directions, necessitating methodologies to examine the treatment effects based on multiple biomarkers jointly. Many studies compare treatment effects based on multiple biomarkers' measurements between the independent case group and the control group. This article aims to propose methodologies that can correctly and efficiently evaluate the treatment effects based on pre- and post-treatment measurements of multiple biomarkers as a whole, and to further develop an efficient statistical testing methodology to compare independent treatment groups with paired data. One of the motivating examples in this article is as follows. The study of chlorhexidine gluconate on oral bacterial pathogens was conducted on patients admitted to the 18-bed trauma intensive care unit (TICU) of the Erie County Medical Center, where patients were mechanically ventilated. These patients were of particular interest since they have a high risk of ventilator-associated pneumonia. While it is true that these patients are ill and thus may be more susceptible to infection, they also have the greatest need for prevention of infection. A randomized, double-blind, placebo-controlled clinical trial tested oral topical 0.12% chlorhexidine gluconate (treatment group) and placebo with vehicle alone (control group), applied twice a day by staff nurses. Colonization of the oral cavity by respiratory pathogens on left teeth and right teeth was quantified. The aim of the study was to determine the best regimen of oral hygiene in the TICU to reduce oral colonization by potential respiratory bacterial pathogens. In this article, we propose to combine the oral plaque quantification on left teeth and right teeth to maximize an area under the receiver operating characteristic (ROC) curve (AUC)-type quantity based on pre- and post-treatment observations in the evaluation of the treatment effect on oral bacterial pathogens in mechanically ventilated patients.

For biomarkers whose values are measured on a continuous scale, diagnostic performance in identifying diseased subjects is commonly assessed via ROC curves (e.g., Pepe et al., 2006; Vexler et al., 2008a, b). Suppose that values of a biomarker from the diseased population (X) and the healthy population (Y) are independent and identically distributed samples from two different distributions with cumulative distribution functions F(·) and G(·), respectively. An ROC curve plots sensitivity (true positive rate, 1 – F(t)) versus one minus specificity (false positive rate, 1 – G(t)) for various values of the threshold t. The mathematical formula is ROC(t) = 1 – F(G⁻¹(1 – t)), where t ∈ [0, 1]. The AUC is a common index of the diagnostic performance of a biomarker. Bamber (1975) noted that the area under this curve is equal to Pr (X>Y).
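As a small illustration of Bamber's identity, the AUC can be estimated nonparametrically as the proportion of (diseased, healthy) pairs in which the diseased value is larger. The sketch below is not taken from the article; the function name `empirical_auc` and the data values are hypothetical and serve only to show the calculation.

```python
def empirical_auc(cases, controls):
    """Mann-Whitney type estimate of Pr(X > Y); a tie counts as one half."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical biomarker values, for illustration only.
diseased = [2.9, 3.4, 4.1, 5.0]
healthy = [1.8, 2.5, 3.4, 3.0]

auc = empirical_auc(diseased, healthy)  # 13.5 favorable pairs out of 16
print(auc)
```

A value near 0.5 would indicate no discrimination between the two populations, and a value near 1 would indicate almost perfect separation.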

Some recent biostatistical literature (e.g., Hauck et al., 2000; Tian, 2008; Tian et al., 2012) proposes to consider the quantity Pr (X>Y) in the context of a generalized treatment effect, when X and Y denote continuous outcome variables for the treatment arm and the control arm, respectively. Hauck et al. (2000) introduced the use of Pr (X>Y) in clinical trials as a statistical measurement of describing treatment effects, namely, the generalized treatment effect, and derived a method for confidence interval estimation of Pr (X>Y) with normally distributed outcomes. Tian (2008) compared a large sample approach, a generalized variable approach, and a bootstrap approach for confidence interval estimation of generalized treatment effects in linear models. Tian et al. (2012) proposed to utilize the generalized variable method for testing the equality of generalized treatment effects.

The standard ROC methodology as well as generalized treatment effects mentioned above are commonly considered with respect to case–control studies. In the case of independent populations, for example, cases and controls, various approaches have been proposed to evaluate and compare the performance of bivariate and/or multivariate biomarkers. McClish (1987) and DeLong et al. (1988) proposed comparisons of diagnostic biomarkers based on the difference of areas under ROC curves. Wieand et al. (1989) proposed statistics for comparisons of ROC curves based on a weighted average of sensitivities. Considering the combination of multiple biomarkers as a single composite score, Pepe and Thompson (2000) as well as Vexler et al. (2006) have considered empirical solutions to the optimal linear combinations of biomarkers in the context of nonparametric maximizations of corresponding AUCs.

Su and Liu (1993) derived the optimal linear combinations yielding the largest AUCs if the values of the biomarkers in the diseased (case) and the nondiseased (control) population both follow multivariate normal distributions. We will extend the generalized treatment effect of optimally combined biomarkers to a more general situation with paired data (X and Y are correlated). In this article, we consider the best linear combination of pre- and post-treatment measurements of biomarkers in the sense that the AUC-type measures of treatment effects of this combination are maximized among all possible linear combinations. In a particular case, when pre- and post-treatment biomarkers' measurements are independent, the proposed method corresponds to the well-known result of Su and Liu (1993).

Additionally in this article, to compare effects of treatments between two independent groups based on pre- and post-treatment measurements of groups of biomarkers, we propose a test statistic using the concept of the efficient maximum likelihood ratio methodology, which carries out group comparisons of AUC-type measures of the optimal linear combination of biomarkers.

Primarily, the proposed approach is applied to a randomized trial of chlorhexidine gluconate on oral bacterial pathogens in mechanically ventilated patients. Also, we demonstrate the applicability of the proposed method to relevant multiple outcomes beyond biomarker studies via a treatment study for children with attention deficit-hyperactivity disorder (ADHD) and severe mood dysregulation (SMD). ADHD is the most commonly diagnosed behavioral disorder of childhood. Most children with ADHD also have at least one other developmental or behavioral problem. They may also have a psychiatric problem, such as depression or bipolar disorder. SMD is a syndrome defined to capture the symptomatology of children whose diagnostic status with respect to bipolar disorder is uncertain, that is, those who have severe, nonepisodic irritability and the hyperarousal symptoms characteristic of mania but who lack the well-demarcated periods of elevated or irritable mood characteristic of bipolar disorder. For each child enrolled in the study, Children's Depression Rating Scale, Revised version (CDRS-R) scores and Young Mania Rating Scale (YMRS) scores were taken at the baseline and the endpoint. The objective of the study was to compare total treatment effects based on pre- and post-treatment measurements of CDRS-R and YMRS between the experimental group-based therapy program and the community psychosocial treatment (i.e., control). For more related research in this context, see Vexler et al. (2012). In this article, we propose to combine the measured values maximizing an AUC-type quantity based on pre- and post-treatment observations in evaluation of treatment effects in the study for children with ADHD and SMD.

This article is organized as follows. In Section 2, we define the AUC-type measure and the estimation of the best linear combination of biomarkers. The maximum likelihood ratio test is proposed in Section 2 as well. Section 3 shows an extensive Monte Carlo study for the proposed methods. Section 4 illustrates applications to a randomized trial of chlorhexidine gluconate on oral bacterial pathogens in mechanically ventilated patients as well as a treatment study for children with ADHD and SMD. In Section 5, we conclude the article with remarks.

2. Methods

When distributions of two independent populations, say, case and control, are compared based on measurements of multiple biomarkers, it is desirable to combine the measurements of different biomarkers (e.g., Su and Liu, 1993), since markers usually represent different aspects of diseases. Using combined scores of biomarkers can increase the diagnostic accuracy of the set of medical tests. Commonly, biomarkers' values are proposed to be combined with respect to the maximization of AUCs (e.g., Vexler et al., 2006; Liu et al., 2011). In this article, we derive best linear combinations of pre- and post-treatment measurements of biomarkers. The likelihood ratio test is used to compare two treatment groups (e.g., case and control) based on the AUC-type criterion computed with respect to the best linear combinations of biomarkers' values.

2.1. Best linear combination

Without loss of generality and with respect to our practical examples, suppose that two biomarkers are involved in a study to analyze treatment effects. Let X1i and X2i be the pre- and post-treatment measurements of one biomarker, respectively, with respect to the i-th Inline graphic patient for a certain treatment. Let Y1i and Y2i be the pre- and post-treatment measurements of another biomarker, respectively, with respect to the i-th Inline graphic patient. Assume that (X1, X2, Y1, Y2)T (here T stands for the transpose operation) follows a multivariate normal distribution with the mean vector Inline graphic and the covariance matrix Σ=(σhl), 1≤h≤4, 1≤l≤4. To represent a simple measure of treatment effects, we are interested in reducing dimensionality by constructing an effective linear combination of biomarkers with values Xs and Ys. This implies that we derive certain optimal linear coefficients (λ1, λ2) so that for groups of markers' values (X1, Y1) and (X2, Y2), the one-dimensional random variables U1=λ1X1+λ2Y1 and U2=λ1X2+λ2Y2 can be presented. This linear combination of measurements of biomarkers dominates all the other possible linear combinations in the sense that it provides a maximum of the AUC-type measure Pr (U1 < U2) over all λ1 and λ2. Thus, the optimal Inline graphic maximizes the AUC-type measure, denoted by A, where

graphic file with name eq6.gif

over all possible values of λ1 and λ2, that is, Inline graphic.

Under the assumption of multivariate normality of the biomarkers' measurements distribution, (λ1, −λ1, λ2,−λ2) (X1, X2, Y1, Y2)T follows the normal distribution with mean λ1Δ μX+λ2 Δ μY and variance Inline graphic, where

graphic file with name eq9.gif

and

graphic file with name eq10.gif

Then, the corresponding AUC-type measure has the form of

graphic file with name eq11.gif

where Φ is a standard normal cumulative distribution function. The best linear combination can be defined by maximizing the AUC-type measure and obtaining values of Inline graphic shown in the following proposition.

Proposition 2.1.1.

The best linear combination coefficients Inline graphic are proportional to

graphic file with name eq14.gif

The proof is shown in the Appendix.
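Because the displayed equations are not reproduced in this text, the following is only a sketch of the argument and not the authors' appendix proof. It assumes that Δμ_X and Δμ_Y denote the pre-minus-post mean differences and that δ1, δ2, δ3 denote the variance of X1 − X2, the covariance of X1 − X2 with Y1 − Y2, and the variance of Y1 − Y2, with Σ indexed in the order (X1, X2, Y1, Y2). These readings agree with the independent-biomarker case stated in the text, but the exact notation is an assumption.

```latex
\begin{aligned}
U_1 - U_2 &\sim N\!\left(\lambda_1\Delta\mu_X + \lambda_2\Delta\mu_Y,\;
  \lambda_1^2\delta_1 + 2\lambda_1\lambda_2\delta_2 + \lambda_2^2\delta_3\right),\\
\delta_1 &= \sigma_{11} + \sigma_{22} - 2\sigma_{12},\qquad
\delta_2 = \sigma_{13} - \sigma_{14} - \sigma_{23} + \sigma_{24},\qquad
\delta_3 = \sigma_{33} + \sigma_{44} - 2\sigma_{34},\\
A(\lambda_1,\lambda_2) &= \Pr(U_1 < U_2)
  = \Phi\!\left(\frac{-(\lambda_1\Delta\mu_X + \lambda_2\Delta\mu_Y)}
  {\sqrt{\lambda_1^2\delta_1 + 2\lambda_1\lambda_2\delta_2 + \lambda_2^2\delta_3}}\right).
\end{aligned}
```

Writing λ = (λ1, λ2)ᵀ, Δμ = (Δμ_X, Δμ_Y)ᵀ, and D for the 2×2 matrix with rows (δ1, δ2) and (δ2, δ3), the argument of Φ is −λᵀΔμ / (λᵀDλ)^{1/2}. By the Cauchy–Schwarz inequality in the inner product induced by D, this ratio is largest when λ is proportional to −D⁻¹Δμ:

```latex
(\lambda_1,\lambda_2) \;\propto\;
\left(\delta_2\Delta\mu_Y - \delta_3\Delta\mu_X,\;\;
      \delta_2\Delta\mu_X - \delta_1\Delta\mu_Y\right),
\qquad
\max_{\lambda} A = \Phi\!\left(\sqrt{\Delta\mu^{T} D^{-1}\Delta\mu}\right).
```

Since Φ is increasing and the ratio is invariant to positive rescaling of λ, only the direction of the coefficient vector matters, which is why the proposition states the coefficients up to proportionality.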

Given the best linear combination derived in Proposition 2.1.1., the maximized AUC-type measure has the form of

graphic file with name eq15.gif

If biomarkers are mutually independent, that is, X=(X1, X2) and Y=(Y1, Y2) are independent, the best linear combination coefficients are

graphic file with name eq16.gif

that is, proportional to the weighted change in the mean vector (–δ3 Δ μX, –δ1 Δ μY).
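The calculation can be sketched numerically. The code below is an illustration under stated assumptions rather than the authors' implementation: it assumes Δμ_X = E(X1) − E(X2), Δμ_Y = E(Y1) − E(Y2), and that δ1, δ2, δ3 are the variance of X1 − X2, the covariance of X1 − X2 with Y1 − Y2, and the variance of Y1 − Y2. All function names and the numerical values of `mu` and `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def deltas(sigma):
    """(Co)variances of the differences X1-X2 and Y1-Y2.

    sigma is the 4x4 covariance matrix of (X1, X2, Y1, Y2).
    """
    d1 = sigma[0][0] + sigma[1][1] - 2 * sigma[0][1]
    d2 = sigma[0][2] - sigma[0][3] - sigma[1][2] + sigma[1][3]
    d3 = sigma[2][2] + sigma[3][3] - 2 * sigma[2][3]
    return d1, d2, d3


def auc_type(lam1, lam2, mu, sigma):
    """Pr(lam1*X1 + lam2*Y1 < lam1*X2 + lam2*Y2) under multivariate normality."""
    d1, d2, d3 = deltas(sigma)
    dmx, dmy = mu[0] - mu[1], mu[2] - mu[3]
    mean = lam1 * dmx + lam2 * dmy
    var = lam1 ** 2 * d1 + 2 * lam1 * lam2 * d2 + lam2 ** 2 * d3
    return NormalDist().cdf(-mean / sqrt(var))


def best_combination(mu, sigma):
    """Coefficients (up to a positive scale factor) and the maximized measure."""
    d1, d2, d3 = deltas(sigma)
    dmx, dmy = mu[0] - mu[1], mu[2] - mu[3]
    lam = (d2 * dmy - d3 * dmx, d2 * dmx - d1 * dmy)
    return lam, auc_type(lam[0], lam[1], mu, sigma)


# Hypothetical parameters in the order (X1, X2, Y1, Y2); not from the article.
mu = [8.0, 10.0, 26.0, 27.0]
sigma = [
    [4.0, 2.0, 1.0, 0.5],
    [2.0, 4.0, 0.5, 1.0],
    [1.0, 0.5, 9.0, 6.0],
    [0.5, 1.0, 6.0, 9.0],
]

lam, a_max = best_combination(mu, sigma)
a_x_only = auc_type(1.0, 0.0, mu, sigma)  # biomarker X alone
a_y_only = auc_type(0.0, 1.0, mu, sigma)  # biomarker Y alone
print(lam, a_max, a_x_only, a_y_only)
```

With these hypothetical inputs the combined measure is at least as large as the measure for either biomarker alone, as the maximization requires.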

In a special case of independent pre- and post-treatment measurements of biomarkers, which is an analogy to the statement of a case–control study, we have the same result as that proposed by Su and Liu (1993). The result is formalized in the following proposition.

Proposition 2.1.2.

If pre- and post-treatment measurements are independent for both markers; that is, X1 is independent of X2, and Y1 is independent of Y2, the best linear combination coefficients are Inline graphic. Thus, Inline graphic, where

graphic file with name eq19.gif

The corresponding proof is outlined in the Appendix.

Thus, we propose to use the maximized AUC-type measure in the context of the best linear combinations to depict the total treatment effects based on pre- and post-treatment measurements of biomarkers. The total treatment effect Pr (λ1X1+λ2Y1<λ1X2+λ2Y2) has the value of Equation (2).

2.2. Maximum likelihood ratio tests

In this section, we propose the maximum likelihood ratio test for comparing treatments' effects based on best linear combinations of pre- and post-treatment measurements of biomarkers. To this end, we modify the technique proposed in Vexler et al. (2008a).

Let Xrki represent the pre- (r=1) and post-treatment (r=2) measurements of a biomarker (X) for the i-th Inline graphic patient in the k-th group, k=1 for the new therapy group, and k=2 for the control group, respectively. Likewise, let Yrki represent the pre- (r=1) and post-treatment (r=2) measurements of another biomarker (Y) for the i-th Inline graphic patient in the k-th group, k=1 for the new therapy group, and k=2 for the control group, respectively. Assume biomarkers' measurements for the new therapy group v1i=(X11i, X21i, Y11i, Y21i)T Inline graphic and biomarkers' measurements for the control group v2j=(X12j, X22j, Y12j, Y22j)T Inline graphic follow multivariate normal distributions with the mean vector μk=(μ1k, μ2k, μ3k, μ4k)T=E(vk1)=(E(X1k1), E(X2k1), E(Y1k1), E(Y2k1))T and with the covariance matrix Σk=E((vk1 – E(vk1)) (vk1 – E(vk1))T)=(σhlk), 1≤h≤4, 1≤l≤4, k=1, 2. Let A1 and A2 denote the maximized AUC-type measures for the new therapy group and the control group, respectively. In this section, for the comparison of the treatment effects for the new therapy group and the control group based on paired observations, we formally consider testing the hypothesis

graphic file with name eq24.gif

In Section 2.1., we showed that the maximized AUC-type measures in both groups have the form of Equation (2). Thus, Inline graphic can be expressed as a function of μk and Σk, where

graphic file with name eq26.gif

Therefore, the hypothesis setting (Eq. 3) is equivalent to

graphic file with name eq27.gif

Under the null hypothesis, μ11 can be represented as a function of the remaining set of parameters, say, μ11=h ( · )=h (μ21, μ31, μ41, μ2, Σ1, Σ2), for a certain function h. We show the exact form of the function h in the Appendix. Thus, in a simple case, when all the parameters are known, we can utilize the classical most powerful likelihood ratio method for testing H0. To this end, the likelihood functions under H1 and H0 can be presented correspondingly as

graphic file with name eq28.gif

where φ(·) denotes the multivariate normal density function known as

graphic file with name eq29.gif

Therefore, the classical likelihood ratio test-statistic is

graphic file with name eq30.gif

When the parameters are unknown, we can apply the maximum likelihood ratio (MLR) to be the test statistic

graphic file with name eq31.gif

The maximum likelihood estimators under H1 have closed-form solutions. The maximum log-likelihood under H1 is

graphic file with name eq32.gif

where

graphic file with name eq33.gif

Under H0, in order to calculate the maximum likelihood, we carried out the numerical approach without specifying the closed forms of the estimators of the unknown parameters.

Thus, we reject the null hypothesis if Λ>Λα, where the threshold Λα corresponds to type I error α. Following Wilks' theorem (e.g., Lehmann and Romano, 1997), under H0, the statistic 2lnΛ asymptotically has a chi-square distribution with one degree of freedom. Thus, the threshold Λα can be easily obtained from the requirement Pr (Λ>Λα) → α under H0, as n1, n2 → ∞. Moreover, the proposed test is asymptotically locally most powerful (e.g., Choi et al., 1996).
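The asymptotic threshold can be computed directly. The sketch below assumes that the limiting distribution is chi-square with one degree of freedom, which matches the value 3.84 used at α=0.05 in the simulation study. It relies on the fact that the square of a standard normal variable has this distribution; the function names are hypothetical.

```python
from statistics import NormalDist


def chi2_1df_quantile(p):
    """Quantile of the chi-square distribution with one degree of freedom.

    If Z ~ N(0, 1), then Pr(Z**2 <= q) = 2 * Phi(sqrt(q)) - 1.
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def chi2_1df_pvalue(stat):
    """Upper-tail probability Pr(chi2_1 > stat) for stat >= 0."""
    return 2.0 * (1.0 - NormalDist().cdf(stat ** 0.5))


threshold = chi2_1df_quantile(0.95)  # approximately 3.84
print(threshold, chi2_1df_pvalue(threshold))
```

An observed value of 2lnΛ can be passed to `chi2_1df_pvalue` to obtain an asymptotic p-value under the same assumption.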

Remark 1. Numerical calculations.

Note that statistical software such as R and S-Plus allows us to minimize –log (L0(μ21, μ31, μ41, μ2, Σ1, Σ2)) numerically, without using closed forms of the estimators of the unknown parameters. The basic procedure “optim” in R (2012) can be used to minimize the negative log-likelihood under H0, and the procedure “multiroot” assists in this minimization. The related R codes are available from the authors upon request.

Remark 2. Transformed normal approach.

In the case that the normal assumptions are not satisfied, for example, when biological mechanisms induce log-normal distributions of biomarkers (Eckhard et al., 2001), we can fit the data to a Box–Cox power transformation model (1964) to better achieve normality of biomarkers. To be more specific, for the i-th Inline graphic measurement of k-th (k=1, 2) group vki=(X1ki, X2ki, Y1ki, Y2ki), the Box–Cox power transformed values are defined as Inline graphic, where

graphic file with name eq37.gif

The power coefficients λ1k, λ2k, ϑ1k, and ϑ2k can be estimated by maximizing the likelihood

graphic file with name eq38.gif

Then, the normality-based best linear combinations of biomarkers and the maximum likelihood ratio test can be used on the transformed data.
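For reference, the transformation step can be sketched as follows. The article's displayed definition is not reproduced in this text, so the code assumes the standard Box–Cox (1964) form, (x^λ − 1)/λ for λ ≠ 0 and log x for λ = 0, applied to positive measurements. The function name `box_cox` is hypothetical, and estimating the power coefficients by maximum likelihood is not shown.

```python
from math import log


def box_cox(x, power):
    """Standard Box-Cox power transform of a positive value x (assumed form)."""
    if x <= 0:
        raise ValueError("the Box-Cox transform requires positive values")
    if power == 0:
        return log(x)
    return (x ** power - 1.0) / power


# Hypothetical measurements; power 0 corresponds to the log-normal case.
raw = [1.0, 2.5, 4.0, 9.0]
log_scale = [box_cox(v, 0) for v in raw]
sqrt_scale = [box_cox(v, 0.5) for v in raw]
print(log_scale, sqrt_scale)
```

Each of the four measurement types may receive its own power coefficient, in line with the separate coefficients λ1k, λ2k, ϑ1k, and ϑ2k mentioned in the text.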

3. Simulation Study

In this section, Monte Carlo simulations are conducted to examine the power properties of the proposed tests under different scenarios. We also compare AUC-type measures between the proposed optimal combination case and only one biomarker case.

3.1. Power and type I error

To study the power and the type I error of the proposed test, 2,000 samples of biomarkers' measurements for a new therapy group (X11i, X21i, Y11i, Y21i)T Inline graphic of a sample size of n1 were generated from multivariate normal distribution with the mean vector μ1 and the covariance matrix Σ1, and biomarkers' measurements for a control group (X12j, X22j, Y12j, Y22j)T Inline graphic of a sample size of n2 were generated from multivariate normal distribution with the mean vector μ2 and the covariance matrix Σ2, where

graphic file with name eq41.gif

We consider the unequal covariance case, where

graphic file with name eq42.gif

and the equal covariance case with the common covariance matrix as shown in Equation (8). These parameters were chosen to reflect a real data example with values close to those in the treatment study for children with ADHD and SMD introduced in Section 1.

The values of μ21 and μ22 are shown in Tables 1 to 4; they are chosen such that differences in the maximized AUC-type measures between two groups are set to be 0, 0.1, or 0.2; that is, A1 – A2=0, 0.1, or 0.2, where A1 and A2 denote the maximized AUC-type measures in the context of best linear combinations of biomarkers' values for the new therapy group and the control group, respectively. We consider the following scenarios: A1=0.70, A2=0.60; A1=0.85, A2=0.75; A1=0.80, A2=0.60, as well as A1=0.95, A2=0.75, with different sample sizes and with Λα=3.84, α=0.05.

Table 1.

The Monte Carlo Powers of the Proposed Test with Different Sample Sizes (n1, n2) at the Expected 0.05 Significance Level in the Equal Covariance Case

Difference  AUCmax             μ21, μ22                   (30,30)  (100,100)  (300,300)
0.1         A1=0.70, A2=0.60   μ21=9.9784, μ22=9.0071     0.1131a  0.4299a    0.8867a
                                                          0.1223b  0.2450b    0.5790b
                                                          0.1342c  0.3320c    0.6420c
0.1         A1=0.85, A2=0.75   μ21=11.7739, μ22=10.5066   0.2300a  0.5884a    0.9611a
                                                          0.0988b  0.1968b    0.3551b
                                                          0.1714c  0.3026c    0.6019c
0.2         A1=0.80, A2=0.60   μ21=11.0925, μ22=9.0071    0.4649a  0.9600a    1.0000a
                                                          0.1945b  0.3140b    0.5899b
                                                          0.1762c  0.3460c    0.6501c
0.2         A1=0.95, A2=0.75   μ21=13.8977, μ22=10.5066   0.8111a  0.9994a    1.0000a
                                                          0.1932b  0.3591b    0.4980b
                                                          0.1591c  0.3082c    0.6419c

a The power of the proposed test with respect to the best linear combination of two biomarkers.
b The power for one biomarker (X) alone based on values of Xrki.
c The power for the other biomarker (Y) alone based on values of Yrki.

AUC, area under the receiver operating characteristic curve.

Table 2.

The Monte Carlo Type I Errors of the Proposed Test with the Best Linear Combination of Two Biomarkers with Different Sample Sizes (n1, n2) in the Equal Covariance Case

AUCmax       μ21=μ22   (30,30)  (100,100)  (300,300)
A1=A2=0.60   9.0071    0.0294   0.0431     0.0493
A1=A2=0.75   10.5066   0.0325   0.0524     0.0498

Table 3.

The Monte Carlo Powers of the Proposed Test with Different Sample Sizes (n1, n2) at the Expected 0.05 Significance Level in the Unequal Covariance Case

Difference  AUCmax             μ21, μ22                   (30,30)  (100,100)  (300,300)
0.1         A1=0.70, A2=0.60   μ21=9.2989, μ22=9.0071     0.0800a  0.2900a    0.8056a
                                                          0.1518b  0.3620b    0.6898b
                                                          0.1496c  0.3199c    0.6479c
0.1         A1=0.85, A2=0.75   μ21=10.5920, μ22=10.5066   0.1096a  0.3425a    0.9394a
                                                          0.1507b  0.3594b    0.6723b
                                                          0.1794c  0.3394c    0.6667c
0.2         A1=0.80, A2=0.60   μ21=10.1010, μ22=9.0071    0.2727a  0.8964a    1.0000a
                                                          0.2479b  0.4540b    0.7819b
                                                          0.1770c  0.3193c    0.6321c
0.2         A1=0.95, A2=0.75   μ21=12.1232, μ22=10.5066   0.7436a  0.9999a    1.0000a
                                                          0.3037b  0.5778b    0.8480b
                                                          0.1346c  0.2906c    0.6600c

a The power of the proposed test with respect to the best linear combination of two biomarkers.
b The power for one biomarker (X) alone based on values of Xrki.
c The power for the other biomarker (Y) alone based on values of Yrki.

Table 4.

The Monte Carlo Type I Errors of the Proposed Test with the Best Linear Combination of Two Biomarkers with Different Sample Sizes (n1, n2) in the Unequal Covariance Case

AUCmax       μ21, μ22                  (30,30)  (100,100)  (300,300)
A1=A2=0.60   μ21=8.6033, μ22=9.0071    0.0126   0.0485     0.0261
A1=A2=0.75   μ21=9.6790, μ22=10.5066   0.0513   0.0521     0.0466

In the same setting of parameters, Table 1 compares the Monte Carlo powers of the proposed MLR test in the context of the optimally combined two biomarkers to the powers using one biomarker alone in the equal covariance case. Table 2 depicts type I errors of the proposed MLR test with the best linear combination of two biomarkers in the equal covariance matrix case. With the same setting of parameters, Table 3 compares the Monte Carlo powers of the proposed MLR test in the context of the optimally combined two biomarkers to the powers using one biomarker alone in the unequal covariance case. Table 4 depicts type I errors of the proposed MLR test with the best linear combination of two biomarkers in the unequal covariance case.

When the difference in AUC-type measures between two groups and the sample size increase, the MLR tests provide increased powers as anticipated in both equal and unequal covariance matrix cases. Tables 1 and 3 show that the powers of the proposed test with the best linear combinations of two biomarkers are very high when the sample size is large enough in both equal and unequal covariance cases. The power is close to 1 when the difference in the AUC-type measures between the two groups is 0.2 and the sample size in each group is 300. Compared with the power of the proposed test with optimal combinations, powers with one biomarker alone are much smaller at the larger sample sizes. The type I errors of the MLR tests are well controlled even for relatively small sample sizes, say, 30 in each group.

3.2. AUC-type measures

To compare AUC-type measures between the proposed optimal combination case and only one biomarker case, 20,000 samples of biomarkers' measurements (X1, X2, Y1, Y2)T of various sample sizes of n (n=30, 50, 100, 300) were generated from multivariate normal distribution with the mean vector μ and the covariance matrix Σ.

We consider the following scenarios: (a) μ=(7.9333, 11.0925, 26.2000, 26.5333)T and the covariance matrix Σ as shown in Equation (8); (b) μ=(7.9333, 8.7008, 26.2000, 26.5333)T and the covariance matrix Σ as shown in Equation (8) with 16.6389 in the (1, 2) element and (2, 1) element instead. The best linear combination of measurements of biomarker X and biomarker Y is proportional to (1, −1.5686) in scenario a and (1, −1.4047) in scenario b. The AUC-type measure associated with the best linear combination has the form of Equation (2), which is 0.8. The AUC-type measure for X alone corresponds to Equation (1), where λ1=1 and λ2=0, while the AUC-type measure for Y alone corresponds to the case where λ1=0 and λ2=1. Table 5 shows the theoretical AUC-type measures and values based on 20,000 simulations as well as the Monte Carlo variance of the simulated AUC-type measures. In scenario a, the AUC-type measure for X alone appears to be similar to that of the best linear combinations, suggesting that Y, in fact, adds little to the discriminating capacity of X. In scenario b, it is observed that the optimal combination provides substantially better discrimination than does X alone or Y alone. When the sample size n is large, the simulated AUC-type measures with small variability are almost exactly the theoretical values as anticipated.
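A comparison of this kind can be sketched with a short simulation. The code below is an illustration only: the covariance matrix of Equation (8) is not reproduced in this text, so `mu`, `sigma`, and the coefficients are hypothetical, and the simple proportion of subjects with U1 < U2 is used as the estimate, which may differ from the estimator behind Table 5. Multivariate normal draws are generated with a hand-written Cholesky factorization, so only the Python standard library is needed.

```python
import random
from math import sqrt
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def theoretical_auc(mu, sigma, lam1, lam2):
    """Pr(U1 < U2) from the contrast (lam1, -lam1, lam2, -lam2)."""
    c = [lam1, -lam1, lam2, -lam2]
    mean = sum(ci * mi for ci, mi in zip(c, mu))
    var = sum(c[i] * sigma[i][j] * c[j] for i in range(4) for j in range(4))
    return NormalDist().cdf(-mean / sqrt(var))


def simulated_auc(mu, sigma, lam1, lam2, n, rng):
    """Proportion of simulated subjects with lam1*X1+lam2*Y1 < lam1*X2+lam2*Y2."""
    low = cholesky(sigma)
    hits = 0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(4)]
        x1, x2, y1, y2 = (
            mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(4)
        )
        if lam1 * x1 + lam2 * y1 < lam1 * x2 + lam2 * y2:
            hits += 1
    return hits / n


# Hypothetical parameters in the order (X1, X2, Y1, Y2); not from the article.
mu = [8.0, 10.0, 26.0, 27.0]
sigma = [
    [4.0, 2.0, 1.0, 0.5],
    [2.0, 4.0, 0.5, 1.0],
    [1.0, 0.5, 9.0, 6.0],
    [0.5, 1.0, 6.0, 9.0],
]

rng = random.Random(2014)  # fixed seed so the run is reproducible
theory = theoretical_auc(mu, sigma, 11.0, 2.0)
estimate = simulated_auc(mu, sigma, 11.0, 2.0, 20000, rng)
print(theory, estimate)
```

With a large number of simulated subjects the proportion should lie close to the theoretical value, mirroring the agreement reported for the larger sample sizes.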

Table 5.

Comparison of AUC-Type Measure Between the Proposed Optimal Combination Case and Only One Biomarker Case (X or Y Alone)

                                            AUCoptimal         AUCX               AUCY
Theoretical value                           0.8000a            0.7587a            0.5349a
                                            0.8000b            0.6038b            0.5340b
n=30
 Estimated AUC-type measure (MC variance)   0.8147 (0.0034)a   0.7639 (0.0040)a   0.5355 (0.0057)a
                                            0.8140 (0.0034)b   0.6069 (0.0054)b   0.5354 (0.0056)b
n=100
 Estimated AUC-type measure (MC variance)   0.8040 (0.0011)a   0.7598 (0.0012)a   0.5340 (0.0016)a
                                            0.8042 (0.0011)b   0.6042 (0.0016)b   0.5339 (0.0016)b
n=300
 Estimated AUC-type measure (MC variance)   0.8014 (0.0004)a   0.7592 (0.0004)a   0.5340 (0.0005)a
                                            0.8013 (0.0003)b   0.6042 (0.0005)b   0.5342 (0.0005)b

a The result for scenario a with the Monte Carlo (MC) variance shown in parentheses.
b The result for scenario b with the MC variance shown in parentheses.

4. Applications to Data

In this section, we exemplify the proposed method with data from two clinical studies briefly described in the introduction.

4.1. Oral colonization data

A randomized, double-blind, placebo-controlled clinical trial tested oral topical 0.12% chlorhexidine gluconate (n1=14) or placebo (n2=11), applied twice a day by staff nurses. The paired data were constituted by two measurements of plaque on the denture surface taken from the same subjects at the baseline (day 0) and the endpoint (day 4). The goal was to determine the best regimen of oral hygiene in the TICU based on the mean plaque quantification for the sets of left teeth (upper left first bicuspid, lower left first molar, and lower left central incisor) and right teeth (the upper right first molar, upper right central incisor, and lower right first bicuspid). For the treatment group, the best linear combination of right teeth scores and left teeth scores is proportional to (5.000, 1), leading to the maximized AUC-type measure of 0.7687. The optimized AUC-type measure is higher than the AUC-type measure of 0.7677 with the right teeth scores alone and the AUC-type measure of 0.7456 with the left teeth scores alone. For the control group, the best linear combination of right teeth scores and left teeth scores is proportional to (−2.6070, 1), leading to the maximized AUC-type measure of 0.5888. The corresponding p-value of the hypothesis test of Equation (3) is 0.042, indicating the rejection of the null hypothesis of “lack of treatment effect” at the 0.05 significance level. The decontamination of the oral cavity with chlorhexidine improved the oral hygiene among mechanically ventilated patients in TICU, potentially indicating a reduction of potential respiratory pathogens.

4.2. ADHD data

The ADHD and SMD data were produced at the Center for Children and Families at University at Buffalo to examine the feasibility and efficacy of a group-based therapy program for children with ADHD and SMD. A novel group-based therapy program was studied to treat ADHD and mood problems since most ADHD treatments have not been designed to help mood problems. Children aged 7–12 with ADHD and SMD were randomly assigned to receive either an 11-week experimental group-based therapy program for children and parents (the treatment group, n1=17), or community psychosocial treatment (the control group, n2=15).

Clinicians rated the Children's Depression Rating Scale, Revised (CDRS-R) and the YMRS. The CDRS-R consists of 17 clinician-rated items, with 14 items based on the child's self-report or reports from parents or teachers and 3 items based on the child's nonverbal behavior during the interviews. The YMRS is an 11-item, multiple-choice diagnostic questionnaire that psychiatrists use to measure the severity of manic episodes in patients. The paired data consist of two measurements taken from the same subjects at baseline (week 0) and at the endpoint (week 11). The objective is to compare treatment effects with respect to CDRS-R and YMRS between the treatment group and the control group.

Figure 1 displays AUC-type measures of the linear combinations λ1YMRS+λ2CDRS-R versus the ratio λ1/λ2 for –1 ≤ λ1/λ2 ≤ 1 for the treatment group. For ease of presentation, the plot displays AUC-type measures versus λ2/λ1 when λ1/λ2 > 1 or λ1/λ2 < –1. As can be observed in this plot, the best linear combination of YMRS scores and CDRS-R scores is proportional to (0.1076, 1), leading to the maximized AUC-type measure of 0.9350. The maximized AUC-type measure is higher than the AUC-type measure of 0.8449 with the YMRS scores alone (λ2/λ1=0) and the AUC-type measure of 0.9347 with the CDRS-R scores alone (λ1/λ2=0). Similarly, for the control group, the best linear combination of YMRS scores and CDRS-R scores is proportional to (0.5903, 1), leading to the maximized AUC-type measure of 0.8507, which is higher than the AUC-type measure of 0.7455 with the YMRS scores alone and the AUC-type measure of 0.8156 with the CDRS-R scores alone. The corresponding p-value of the hypothesis test of Equation (2) is 0.0085, indicating that the null hypothesis of “lack of treatment effect” is rejected at the 0.05 significance level. Since larger AUC values indicate better diagnostic quality, we conclude that the experimental group-based therapy program is better than the community psychosocial treatment.
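
The scan over the ratio λ1/λ2 described for Figure 1 can be mimicked numerically. The following Python sketch assumes that, under normality, the AUC-type measure of a combination equals Pr(λ1D1+λ2D2 > 0), where (D1, D2) are the paired baseline-minus-endpoint reductions of the two scores and are bivariate normal. The mean vector `delta` and the covariance matrix `sigma_d` are hypothetical values chosen for illustration, not estimates from the ADHD and SMD data, and the function name `auc_type` is ours.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def auc_type(lam, delta, sigma_d):
    """Pr(lam1*D1 + lam2*D2 > 0) for bivariate normal (D1, D2)
    with mean vector delta and 2x2 covariance matrix sigma_d."""
    l1, l2 = lam
    mean = l1 * delta[0] + l2 * delta[1]
    var = (l1 * l1 * sigma_d[0][0] + 2 * l1 * l2 * sigma_d[0][1]
           + l2 * l2 * sigma_d[1][1])
    return PHI(mean / sqrt(var))

# Hypothetical parameters (assumption, not the study data):
# mean reductions of the two scores and covariance of the paired reductions.
delta = (2.0, 6.0)
sigma_d = ((16.0, 6.0), (6.0, 25.0))

# Grid scan over lambda1/lambda2 in [-1, 1] with lambda2 fixed at 1.
ratios = [i / 1000 for i in range(-1000, 1001)]
values = [auc_type((r, 1.0), delta, sigma_d) for r in ratios]
best_value = max(values)
best_ratio = ratios[values.index(best_value)]

# Closed-form maximizer under the same assumption:
# lambda proportional to sigma_d^{-1} delta.
det = sigma_d[0][0] * sigma_d[1][1] - sigma_d[0][1] ** 2
lam_star = ((sigma_d[1][1] * delta[0] - sigma_d[0][1] * delta[1]) / det,
            (sigma_d[0][0] * delta[1] - sigma_d[0][1] * delta[0]) / det)
closed_ratio = lam_star[0] / lam_star[1]
max_auc = PHI(sqrt(lam_star[0] * delta[0] + lam_star[1] * delta[1]))

# AUC-type measures of each score used alone, for comparison.
auc_first_only = auc_type((1.0, 0.0), delta, sigma_d)
auc_second_only = auc_type((0.0, 1.0), delta, sigma_d)
```

With these illustrative inputs, the grid maximizer `best_ratio` agrees with `closed_ratio` up to the grid resolution, and the maximized value exceeds the measure of either score alone, which is the pattern reported for the treatment and control groups.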

FIG. 1.

Plot of AUC-type measures with linear combinations λ1YMRS+λ2CDRS-R for the treatment group under the normality assumption. The optimal coefficient and the corresponding maximized AUC-type measure are indicated with dashed lines. AUC, area under the receiver operating characteristic curve.

5. Conclusions

It is well known that the ROC curve is the most commonly used statistical tool to assess the quality of diagnostic biomarkers. In this article, we constructed best linear combinations of biomarkers' measurements based on correlated data, maximizing the AUC-type criterion among all possible linear combinations of the biomarker values. In the special case of independent pre- and post-treatment measurements of biomarkers, we obtained the same result as that proposed by Su and Liu (1993). Thus, the proposed method can be applied to both independent and paired data. In the context of the maximized AUC-type measure, we proposed to use maximum likelihood ratio tests to compare treatment effects based on pre- and post-treatment measurements of multiple biomarkers. The Monte Carlo study confirmed that the proposed methodology is very efficient: the proposed test demonstrated adequate power for the hypotheses and sample sizes considered while keeping the type I error under control, even with moderate sample sizes. The superiority of the best linear combination over one biomarker alone was also verified. The analyses of a randomized trial of chlorhexidine gluconate on oral bacterial pathogens in mechanically ventilated patients and of a treatment study for children with ADHD and SMD demonstrated that the proposed method is easy to apply and well suited to comparing treatment groups with correlated multiple outcomes.

6. Appendix

6.1. Proof of Proposition 2.1.1.

To maximize the AUC-type measures, we calculate first partial derivatives of the function of AUC-type measures with respect to λ1 and λ2. Set ∂A/∂λ1=0, and ∂A/∂λ2=0. The equations are equivalent to

graphic file with name eq44.gif

Thus,

graphic file with name eq45.gif

It can be confirmed that Inline graphic, and Inline graphic.   ■
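
The stationarity conditions can also be checked numerically. The sketch below is not the closed-form derivation of the proposition. It assumes that, under joint normality, the AUC-type measure of the paired data can be written as A(λ1, λ2)=Φ(λ'δ/(λ'Σ_D λ)^(1/2)), where δ and Σ_D denote the mean vector and covariance matrix of the post-minus-pre differences (our notation), and it evaluates central finite differences of A at the candidate maximizer λ ∝ Σ_D^(-1)δ. All numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def auc_type(l1, l2, delta, s):
    """A(l1, l2) = Phi(lam' delta / sqrt(lam' s lam)) for a 2x2 covariance s."""
    mean = l1 * delta[0] + l2 * delta[1]
    var = l1 * l1 * s[0][0] + 2 * l1 * l2 * s[0][1] + l2 * l2 * s[1][1]
    return PHI(mean / sqrt(var))

def gradient(l1, l2, delta, s, h=1e-6):
    """Central finite-difference approximation of (dA/dl1, dA/dl2)."""
    g1 = (auc_type(l1 + h, l2, delta, s) - auc_type(l1 - h, l2, delta, s)) / (2 * h)
    g2 = (auc_type(l1, l2 + h, delta, s) - auc_type(l1, l2 - h, delta, s)) / (2 * h)
    return g1, g2

# Hypothetical mean differences and covariance of the differences (assumption).
delta = (2.0, 6.0)
s = ((16.0, 6.0), (6.0, 25.0))

# Candidate maximizer s^{-1} delta, rescaled so that the second coefficient is 1
# (A is invariant to multiplying lambda by a positive constant).
det = s[0][0] * s[1][1] - s[0][1] ** 2
raw = ((s[1][1] * delta[0] - s[0][1] * delta[1]) / det,
       (s[0][0] * delta[1] - s[0][1] * delta[0]) / det)
lam_star = (raw[0] / raw[1], 1.0)

grad_at_star = gradient(lam_star[0], lam_star[1], delta, s)  # approximately (0, 0)
grad_elsewhere = gradient(1.0, 1.0, delta, s)                # clearly nonzero

# Moving the ratio away from the candidate lowers the AUC-type measure.
a_star = auc_type(lam_star[0], lam_star[1], delta, s)
a_left = auc_type(lam_star[0] - 0.05, 1.0, delta, s)
a_right = auc_type(lam_star[0] + 0.05, 1.0, delta, s)
```

Because A does not change when λ is multiplied by a positive constant, the maximizer is determined only up to scale, which is why the best combinations are reported as proportional to a vector.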

Proof of Proposition 2.1.2.

Consider the special case of independent pre- and post-treatment measurements, that is, σ12=σ34=σ14=σ23=0. Based on Proposition 2.1.1., we have

graphic file with name eq48.gif

By Su and Liu's result,

graphic file with name eq49.gif

that is,

graphic file with name eq50.gif

Thus, we have

graphic file with name eq51.gif

It is obvious that the proposed method corresponds to Su and Liu's result.   ■
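
The correspondence can be illustrated numerically. The sketch below assumes the variable ordering (biomarker 1 pre, biomarker 1 post, biomarker 2 pre, biomarker 2 post), so that σ12, σ34, σ14, and σ23 are the pre/post covariances, and it assumes that the paired best combination is proportional to Σ_D^(-1)δ, with Σ_D the covariance matrix of the post-minus-pre differences (our notation). Su and Liu's (1993) coefficients are proportional to (Σ_pre+Σ_post)^(-1)δ. The mean vector and both covariance matrices are hypothetical, and the helper names are ours.

```python
def solve2(m, b):
    """Solve the 2x2 linear system m x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] * b[0] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det)

def mean_diff(mu):
    """Post-minus-pre mean differences for the ordering (pre1, post1, pre2, post2)."""
    return (mu[1] - mu[0], mu[3] - mu[2])

def diff_cov(S):
    """Covariance matrix of (X2 - X1, X4 - X3) given the 4x4 covariance S."""
    d11 = S[0][0] + S[1][1] - 2 * S[0][1]
    d22 = S[2][2] + S[3][3] - 2 * S[2][3]
    d12 = S[1][3] - S[1][2] - S[0][3] + S[0][2]
    return ((d11, d12), (d12, d22))

def best_paired(mu, S):
    """Coefficients proportional to Sigma_D^{-1} delta (paired data, assumed form)."""
    return solve2(diff_cov(S), mean_diff(mu))

def su_liu(mu, S):
    """Su and Liu's coefficients (Sigma_pre + Sigma_post)^{-1} delta."""
    pooled = ((S[0][0] + S[1][1], S[0][2] + S[1][3]),
              (S[0][2] + S[1][3], S[2][2] + S[3][3]))
    return solve2(pooled, mean_diff(mu))

mu = (10.0, 12.0, 20.0, 26.0)  # hypothetical means

# Independent case: sigma12 = sigma34 = sigma14 = sigma23 = 0.
S_indep = ((9.0, 0.0, 3.0, 0.0),
           (0.0, 7.0, 0.0, 2.0),
           (3.0, 0.0, 16.0, 0.0),
           (0.0, 2.0, 0.0, 9.0))

# Paired case: nonzero pre/post covariances.
S_paired = ((9.0, 4.0, 3.0, 1.0),
            (4.0, 7.0, 1.0, 2.0),
            (3.0, 1.0, 16.0, 6.0),
            (1.0, 2.0, 6.0, 9.0))

indep_paired, indep_su_liu = best_paired(mu, S_indep), su_liu(mu, S_indep)
dep_paired, dep_su_liu = best_paired(mu, S_paired), su_liu(mu, S_paired)
```

In the independent case the two coefficient vectors coincide, whereas with nonzero pre/post covariances they generally differ, which is the situation the paired-data result is designed for.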

6.2. The form of the function h associating μ11 with μ21, μ31, μ41, μ2, Σ1, Σ2 under H0

Based on the equation under the null hypothesis Inline graphic and assuming that higher values indicate better performance, that is, A=Pr(combined pretreatment biomarkers' values < combined post-treatment biomarkers' values), the root of the equation under the null hypothesis for μ11 is

graphic file with name eq53.gif

where

graphic file with name eq54.gif
graphic file with name eq55.gif

Assuming that lower values indicate better performance, that is, A=Pr(combined pretreatment biomarkers' values > combined post-treatment biomarkers' values), the root of the equation under the null hypothesis for μ11 is

graphic file with name eq56.gif
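
To illustrate why the constraint yields explicit roots for μ11, the following Python sketch solves a simplified version of the problem. It assumes, which may differ in detail from the equations above, that the null hypothesis equates the maximized AUC-type measures of the two groups and that each maximized measure is a monotone function of the quadratic form δ'Σ_D^(-1)δ, with δ=(μ21−μ11, μ41−μ31) and Σ_D the covariance matrix of the post-minus-pre differences. Under that assumption the constraint is a quadratic equation in μ11. The function names and all numbers are hypothetical.

```python
from math import sqrt

def quad_form(delta, s):
    """delta' s^{-1} delta for a 2x2 covariance matrix s."""
    det = s[0][0] * s[1][1] - s[0][1] ** 2
    return (s[1][1] * delta[0] ** 2 - 2 * s[0][1] * delta[0] * delta[1]
            + s[0][0] * delta[1] ** 2) / det

def mu11_roots(mu21, mu31, mu41, s1, q2):
    """Both values of mu11 for which group 1's quadratic form equals q2.
    s1 is the covariance matrix of group 1's post-minus-pre differences."""
    det = s1[0][0] * s1[1][1] - s1[0][1] ** 2
    m11, m12, m22 = s1[1][1] / det, -s1[0][1] / det, s1[0][0] / det
    d2 = mu41 - mu31
    # Quadratic in d1 = mu21 - mu11: m11*d1^2 + 2*m12*d2*d1 + (m22*d2^2 - q2) = 0
    disc = (m12 * d2) ** 2 - m11 * (m22 * d2 ** 2 - q2)
    if disc < 0:
        raise ValueError("no real mu11 satisfies the constraint")
    d1_plus = (-m12 * d2 + sqrt(disc)) / m11
    d1_minus = (-m12 * d2 - sqrt(disc)) / m11
    return mu21 - d1_plus, mu21 - d1_minus

# Hypothetical group 2 parameters (assumption).
delta2 = (4.0, 8.0)
s2 = ((16.0, 5.0), (5.0, 25.0))
q2 = quad_form(delta2, s2)

# Hypothetical group 1 parameters other than mu11 (assumption).
mu21, mu31, mu41 = 12.0, 20.0, 26.0
s1 = ((8.0, 3.0), (3.0, 13.0))

root_a, root_b = mu11_roots(mu21, mu31, mu41, s1, q2)
check_a = quad_form((mu21 - root_a, mu41 - mu31), s1)
check_b = quad_form((mu21 - root_b, mu41 - mu31), s1)
```

The quadratic has two solutions, and the sketch returns both without deciding between them. Which root is appropriate depends on the direction in which the biomarkers indicate better performance, as distinguished in the two cases above.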

Acknowledgment

This research was supported by NIH Grant 1R03DE020851-01A1 (the National Institute of Dental and Craniofacial Research).

Author Disclosure Statement

No competing financial interests exist.

References

  1. Bamber D. 1975. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psychol. 12, 387–415
  2. Box G.E.P., and Cox D.R. 1964. An analysis of transformations. J. R. Stat. Soc. Ser. B 26, 211–243
  3. Choi S., Hall W.J., and Schick A. 1996. Asymptotically uniformly most powerful tests in parametric and semiparametric models. Ann. Stat. 24, 841–861
  4. DeLong E.R., DeLong D.M., and Clarke-Pearson D.L. 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845
  5. Eckhard L., Werner A.S., and Markus A. 2001. Log-normal distributions across the sciences: keys and clues. BioScience 51, 341–352
  6. Hauck W., Hyslop T., and Anderson S. 2000. Generalized treatment effects for clinical trials. Stat. Med. 19, 887–899
  7. Lehmann E.L., and Romano J.P. 1997. Testing Statistical Hypotheses, 2nd edition. John Wiley and Sons, New York
  8. Liu C., Liu A., and Halabi S. 2011. A min-max combination of biomarkers to improve diagnostic accuracy. Stat. Med. 30, 2005–2014
  9. McClish D.K. 1987. Comparing the areas under more than two independent ROC curves. Med. Decis. Making 7, 149–155
  10. Pepe M.S., and Thompson M.L. 2000. Combining diagnostic test results to increase accuracy. Biostatistics 1–2, 123–140
  11. Pepe M.S., Cai T., and Longton G. 2006. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics 62, 221–229
  12. R Development Core Team. 2012. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria: www.R-project.org
  13. Su J.Q., and Liu J.S. 1993. Linear combinations of multiple diagnostic markers. J. Am. Stat. Assoc. 88, 1350–1355
  14. Tian L. 2008. Confidence intervals for P(Y1>Y2) with normal outcomes in linear models. Stat. Med. 27, 4221–4237
  15. Tian L., Li X., and Yan L. 2012. Testing equality of generalized treatment effects. J. Biopharm. Stat. 22, 582–595
  16. Vexler A., Liu A., Eliseeva E., and Schisterman E.F. 2008a. Maximum likelihood ratio tests for comparing the discriminatory ability of biomarkers subject to limit of detection. Biometrics 64, 895–903
  17. Vexler A., Liu A., Schisterman E.F., and Wu C. 2006. Note on distribution-free estimation of maximum linear separation of two multivariate distributions. J. Nonparametric Stat. 18, 145–158
  18. Vexler A., Schisterman E.F., and Liu A. 2008b. Estimation of ROC curves based on stably distributed biomarkers subject to measurement error and pooling mixtures. Stat. Med. 27, 280–296
  19. Vexler A., Tsai W.M., Gurevich G., and Yu J. 2012. Two sample density-based empirical likelihood ratio tests based on paired data, with application to a treatment study of attention-deficit/hyperactivity disorder and severe mood dysregulation. Stat. Med. 31, 1821–1837
  20. Wieand H.S., Gail M.H., James B.R., and James K.L. 1989. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76, 585–592

Articles from Journal of Computational Biology are provided here courtesy of Mary Ann Liebert, Inc.
