Author manuscript; available in PMC: 2014 Dec 11.
Published in final edited form as: Stat Med. 2011 Dec 5;30(30):3532–3545. doi: 10.1002/sim.4401

Parametric and non-parametric confidence intervals of the probability of identifying early disease stage given sensitivity to full disease and specificity with three ordinal diagnostic groups

Tuochuan Dong 1, Lili Tian 1,*, Alan Hutson 1, Chengjie Xiong 2
PMCID: PMC4263350  NIHMSID: NIHMS398250  PMID: 22139763

SUMMARY

In practice, many disease processes have three ordinal disease classes: the non-diseased stage, the early disease stage and the fully diseased stage. Since the early disease stage is likely the best time window for treatment interventions, it is important to have diagnostic tests that discriminate well between the early disease stage and the other two stages. In this paper, we present both parametric and non-parametric approaches for confidence interval estimation of the probability of detecting the early disease stage given the true classification rates for the non-diseased group and the diseased group, namely, the specificity and the sensitivity to full disease. A data set on the clinical diagnosis of early stage Alzheimer’s disease (AD) from the neuropsychological database at the Washington University Alzheimer’s Disease Research Center (WU ADRC) is analyzed using the proposed approaches.

Keywords: Alzheimer’s disease (AD), generalized inference, Box-Cox transformation, bootstrap method

1. INTRODUCTION

The literature on statistical inference for diagnostic accuracy has largely focused on the case in which subjects are categorized in a binary fashion, i.e., as non-diseased or diseased. The primary quantities of interest are the probability of an incorrect decision in the healthy population (1 − specificity) and the probability of a correct decision in the diseased population (sensitivity). When a diagnostic test is based on an observed variable that lies on a continuous or graded scale, the test can be assessed through a Receiver Operating Characteristic (ROC) curve, which is a plot of sensitivity against 1 − specificity. For excellent reviews of statistical methods involving ROC curves, see Shapiro [1], Zhou et al. [2] and Pepe [3].

In reality, there exists a transitional stage (early disease stage) in many disease processes. In other words, a disease process might involve three ordinal diagnostic stages: the normal healthy stage without even the earliest subtle disease symptoms, the early stage of the disease, and the stage of full-blown development of the disease. For example, mild cognitive impairment (MCI) and/or early stage Alzheimer’s disease (AD) is a transitional stage between the cognitive changes of normal aging and the more serious problems. More details can be found in Xiong et al. [4]. To be specific, let Y1, Y2 and Y3 denote the results of a diagnostic test and let F1, F2 and F3 denote the corresponding cumulative distribution functions for healthy subjects, subjects with early stage disease, and fully diseased subjects, respectively. Assume the results are measured on a continuous scale and that higher values indicate greater severity of the disease. Let P1 = F1(c1) and P3 = 1 − F3(c3), where c1 and c3 are threshold values (c1 < c3) for classifying a subject into the non-diseased stage group and the fully diseased stage group, given that the subject is from these corresponding groups, respectively. Therefore, P1 is the specificity and P3 is the sensitivity to full disease. Then the probability that a randomly selected subject from the early disease stage group has a test result between c1 and c3, i.e. is correctly classified, is

$$P_2 = F_2(c_3) - F_2(c_1) = F_2\left[F_3^{-1}(1 - P_3)\right] - F_2\left[F_1^{-1}(P_1)\right]. \tag{1}$$

P2 can also be called the sensitivity to early disease. As a function of P1 and P3, P2 = P2(P1, P3) defines a surface in the three-dimensional space (P1, P3, P2), i.e., the ROC surface. The point (P1, P3, P2) = (1, 1, 1) indicates perfect discrimination by the marker between the three ordinal disease groups. The volume under the ROC surface (VUS) and the partial volume under the surface (PVUS) have been widely used as quantitative indexes of the discriminating ability of a biomarker measured on a continuous scale; e.g., see Mossman [5], Dreiseitl [6] and Heckerling [7]. Furthermore, Nakas and Yiannoutsos [8] proposed distribution-free approaches for hypothesis testing for a single VUS and paired VUSs; Xiong et al. [4] developed an asymptotic approach for confidence interval estimation of VUS and PVUS for normally distributed data; Nakas and Alonzo [9], and Alonzo and Nakas [10] proposed nonparametric inference procedures for diagnostic accuracy with three disease classes under umbrella ordering; and Xiong et al. [11] developed a large sample approach for comparing several VUSs for normally distributed data. Most recently, Tian et al. [12] proposed an approach based on generalized inference for confidence interval estimation of the difference between paired VUSs and PVUSs.

The probability associated with the detection of the early disease stage, P2, is especially critical in a medical sense. First, in many disease processes such as AD, early detection often provides the optimal time window for therapeutic treatment, because no pharmaceutical treatments to date are effective for late stage AD. Therefore, estimating the probability that a person is at the early disease stage has direct treatment implications. Second, there are well established and accepted criteria for differentiating normal aging (i.e., P1) and fully developed AD (i.e., P3). However, it is far more challenging for clinicians to diagnose subjects at the earliest disease stage because of the subtle clinical symptoms in the early stage of many complex disease processes. An accurate estimate of P2 therefore helps clinicians to identify the best disease markers for early diagnosis. Finally, it is already standard practice for clinicians to diagnose subjects into three groups: normal aging, early stage/very mild AD, and fully developed AD. For disease processes with three disease stages, the specificity (P1), the sensitivity to early disease (P2), and the sensitivity to full disease (P3) of a test or a biomarker depend on the cut-points c1 and c3, which can be chosen to be some quantile (typically the 80th, 90th, etc.) of the distribution of the test values of non-diseased subjects and fully diseased subjects, providing a fixed specificity (for example, the 80th percentile provides a specificity of 0.8) and a fixed sensitivity to full disease. In other words, the specification of P1 and P3 only serves to set up the cutoffs c1 and c3 for a disease marker that can be used to diagnose subjects into three groups. Therefore, the sensitivity to early disease (P2) at a given specificity (P1) and a given sensitivity to full disease (P3) provides a measure of the ability of a biomarker to detect early disease and can be used as another diagnostic measure in addition to VUS and PVUS. Hence it is of paramount theoretical and practical importance to develop inference procedures for P2. However, to the best of our knowledge, the problem of making inference about P2 given P1 and P3 has not been addressed in the literature.

When the disease status is binary, the diagnostic accuracy of a test is usually described by its sensitivity and specificity. For a continuous-scale test or biomarker, it is often of interest to construct a confidence interval for the sensitivity at the cut-off that yields a predetermined level of specificity. Towards this end, some work has been done on estimating sensitivity at a given specificity. Linnet [13] proposed both parametric and non-parametric methods for constructing confidence intervals for the sensitivity of a test at a fixed value of specificity, accounting for the random variation associated with the estimated cut-off point. Platt [14] pointed out several shortcomings in Linnet’s [13] methods and then proposed to use Efron’s bias-corrected acceleration (BCa) bootstrap interval. Zhou and Qin [15] proposed two new intervals for the sensitivity of a diagnostic test at a fixed level of specificity.

The purpose of this paper is two-fold: 1) for disease processes with three disease categories, we propose to use the sensitivity to early disease (P2) given the specificity (P1) and the sensitivity to full disease (P3) as a diagnostic measure that focuses on the ability to detect early disease; 2) we examine the performance of several parametric and nonparametric approaches for confidence interval estimation of P2 given P1 and P3, and then make recommendations about which procedures are most appropriate under different scenarios. This paper is organized as follows. In Section 2, the motivating example from the Washington University (WU) Alzheimer’s Disease Research Center (ADRC) is described. In Section 3, parametric confidence interval estimation for P2 under either normality or normality of transformed data is discussed. In Section 4, nonparametric confidence intervals for P2 are presented. In Section 5, we conduct simulation studies to assess the finite sample performance of the proposed confidence intervals. In Section 6, we analyze the data from the Alzheimer’s disease study. In Section 7, we give a summary and discussion.

2. THE DATA

Alzheimer’s disease (AD) is one of the most common degenerative dementias among the elderly. As “baby boomers” reach retirement, AD is becoming even more prevalent, resulting in a major health care crisis in the United States. Because AD is irreversible, a major challenge is to identify individuals in its early phase. The sample studied here is from the longitudinal cohort of the Washington University (WU) Alzheimer’s Disease Research Center (ADRC). Only individuals with dementia of Alzheimer type (DAT) were included in the demented sample. For each subject, the severity of dementia was staged by the Clinical Dementia Rating (CDR) according to published rules [16]. This data set includes three diagnostic groups: non-demented (CDR 0 = group D−), very mildly demented (CDR 0.5 = group D0), and mildly demented (CDR 1 = group D+). There are 45, 44, and 29 subjects in groups D−, D0, and D+, respectively. Besides clinical evaluation, participants also completed standard psychometric tests. Episodic memory was assessed by the Logical Memory, Digit Span (both forward and backward), and Associate Learning sub-tests of the Wechsler Memory Scale (WMS), and by the Visual Retention Test. Three measures of semantic memory included the Information subtest of the Wechsler Adult Intelligence Scale (WAIS), the Boston Naming Test, and word fluency for S and P. The other two tests in the psychometric battery were an attentional measure (WMS Mental Control) and an un-timed visuospatial measure (Visual Retention Test). The factor scores, including the primary factor (called the global factor), the mental control/frontal factor, the memory-verbal/temporal factor, and the visuospatial/parietal factor, were also computed from the database. These composite factor scores reflect the brain regions thought to contribute to performance on the measures that loaded highly on the factors. For more details about this data set and the description of the related psychometric tests, see Xiong et al. [4].

The data set has been analyzed by Xiong et al. [4] for confidence interval estimation of VUS and PVUS. The table with summary statistics of neuropsychometric tests from this sample is reproduced as Table 1. For Alzheimer’s disease, a major challenge lies in identifying affected, but not yet fully demented, individuals in the earliest phases of illness, when treatment can have a more profound impact on functional status and rate of cognitive decline. Therefore, the goal of this paper is to examine the accuracy of neuropsychological tests in the diagnosis of early stage AD given the sensitivity to full disease and the specificity.

Table 1.

Means (standard deviations) of neuropsychometric tests from the WU ADRC sample

| Variable | CDR 0 (n=45) | CDR 0.5 (n=44) | CDR 1 (n=29) |
|---|---|---|---|
| Global factor | 0.569 (0.888) | −1.622 (1.722) | −4.199 (1.699) |
| Frontal factor | 2.866 (1.777) | 0.373 (2.212) | −2.682 (2.067) |
| Parietal factor | 1.803 (1.295) | −0.241 (2.051) | −2.377 (2.549) |
| Temporal factor | 4.085 (2.249) | −0.986 (3.315) | −5.855 (3.223) |
| Associate Learning | 0.741 (0.890) | −0.579 (0.888) | −1.501 (0.871) |
| Logical Memory | 0.730 (0.848) | −0.858 (0.895) | −1.766 (0.402) |
| Digit Span Forward | 0.579 (0.806) | −0.212 (0.892) | −1.210 (1.127) |
| Digit Span Backward | 0.546 (0.923) | −0.400 (0.853) | −1.824 (1.410) |
| Visual Retention (10 s) | 0.636 (0.879) | −0.821 (1.099) | −1.658 (0.773) |
| Information | 0.631 (0.844) | −0.607 (1.080) | −2.302 (1.139) |
| Word Fluency | 0.729 (1.178) | −0.255 (0.981) | −1.438 (0.883) |
| Mental Control | 0.463 (0.612) | −0.374 (1.197) | −1.715 (1.130) |
| Boston Naming | 0.588 (0.531) | −0.497 (1.635) | −3.072 (2.148) |
| Visual Retention (copy) | 0.202 (0.667) | −0.551 (1.864) | −1.769 (2.398) |

3. PARAMETRIC CONFIDENCE INTERVAL ESTIMATION OF P2

In this section, we first examine a generalized inference approach for confidence interval estimation of P2 for normally distributed data. For non-normal data, we propose to apply a Box-Cox type power transformation to the data followed by a generalized inference approach. The generalized variables and generalized pivots were introduced by Tsui and Weerahandi [17] and Weerahandi [18]; see the book by Weerahandi [19] for a detailed discussion. A brief summary of the core concepts is included in the Appendix. The concepts of generalized confidence interval and generalized P-value have been successfully applied to a variety of practical settings where standard exact solutions do not exist for confidence intervals and hypothesis testing. It has been shown that generalized inference approaches typically have good performance, even at small sample sizes; e.g. Weerahandi [20], Weerahandi and Berger [21], Krishnamoorthy and Lu [22], Tian and Cappelleri [23], Tian [24], Li, Liao and Liu [25] and Li, Liao and Liu [26].

3.1 Under the Normal Assumptions

Let $Y_{1j}$ ($j = 1, 2, \ldots, n_1$), $Y_{2j}$ ($j = 1, 2, \ldots, n_2$), and $Y_{3j}$ ($j = 1, 2, \ldots, n_3$) denote the $n_1$, $n_2$, $n_3$ observations for the non-diseased, early stage, and diseased groups, respectively. Assume $Y_{ij}$ ($j = 1, 2, \ldots, n_i$) follows a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ for i = 1, 2, 3. Then P2 defined in (1) can be expressed as follows:

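$$P_2 = \Phi\left[\frac{\mu_3 - \mu_2 + \Phi^{-1}(1 - P_3)\,\sigma_3}{\sigma_2}\right] - \Phi\left[\frac{\mu_1 - \mu_2 + \Phi^{-1}(P_1)\,\sigma_1}{\sigma_2}\right] \tag{2}$$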

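where Φ denotes the cumulative distribution function of the standard normal variable. For the ith group, let $\bar{Y}_i$ and $S_i^2$ be the sample mean and the sample variance, and let $\bar{y}_i$ and $s_i^2$ denote the corresponding observed values. P2 can be estimated as follows:

$$\hat{P}_2 = \Phi\left[\frac{\bar{Y}_3 - \bar{Y}_2 + \Phi^{-1}(1 - P_3)\,s_3}{s_2}\right] - \Phi\left[\frac{\bar{Y}_1 - \bar{Y}_2 + \Phi^{-1}(P_1)\,s_1}{s_2}\right]. \tag{3}$$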
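As a quick numerical check of equation (2), the following Python sketch evaluates P2 for given normal parameters using only the standard library. The function name `p2_normal` and its argument layout are illustrative choices, not part of the paper; the values in the usage line are those of the first normal setting in Table 2, with the specificity and the sensitivity to full disease both fixed at 0.8.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def p2_normal(mu, sigma, p1, p3):
    """Evaluate equation (2); mu and sigma are 3-tuples for groups 1, 2, 3."""
    c1 = mu[0] + sigma[0] * _STD.inv_cdf(p1)        # cut-off giving specificity p1
    c3 = mu[2] + sigma[2] * _STD.inv_cdf(1.0 - p3)  # cut-off giving sensitivity p3
    upper = _STD.cdf((c3 - mu[1]) / sigma[1])       # F2(c3)
    lower = _STD.cdf((c1 - mu[1]) / sigma[1])       # F2(c1)
    return upper - lower


p2 = p2_normal(mu=(0.0, 2.5, 3.69), sigma=(1.0, 1.1, 1.2), p1=0.8, p3=0.8)
print(round(p2, 2))  # prints 0.5
```

Replacing the population parameters by sample means and standard deviations in the same function gives the plug-in estimate of equation (3).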

It is well-known that

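$$V_i = \frac{(n_i - 1)\,S_i^2}{\sigma_i^2} \sim \chi^2_{n_i - 1}.$$

Therefore, the generalized pivotal quantity for $\sigma_i^2$ is

$$R_{\sigma_i^2} = \frac{(n_i - 1)\,s_i^2}{V_i} \sim \frac{(n_i - 1)\,s_i^2}{\chi^2_{n_i - 1}}, \quad i = 1, 2, 3. \tag{4}$$

Furthermore,

$$Z_i = \frac{\bar{Y}_i - \mu_i}{\sqrt{\sigma_i^2 / n_i}} \sim N(0, 1), \quad i = 1, 2, 3.$$

The generalized pivotal quantity of $\mu_i$ is

$$R_{\mu_i} = \bar{y}_i - Z_i \sqrt{R_{\sigma_i^2} / n_i}, \quad i = 1, 2, 3. \tag{5}$$

The generalized pivotal quantities for normal mean and variance were first proposed in Krishnamoorthy and Mathew [27]. Finally, the generalized pivotal quantity for P2 is

$$R_{P_2} = \Phi\left[\frac{R_{\mu_3} - R_{\mu_2} + \Phi^{-1}(1 - P_3)\,R_{\sigma_3}}{R_{\sigma_2}}\right] - \Phi\left[\frac{R_{\mu_1} - R_{\mu_2} + \Phi^{-1}(P_1)\,R_{\sigma_1}}{R_{\sigma_2}}\right] \tag{6}$$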

where $R_{\sigma_i} = \sqrt{R_{\sigma_i^2}}$ for i = 1, 2, 3. One can easily check that $R_{P_2}$ is a bona fide generalized pivot because the following hold: 1) the distribution of $R_{P_2}$ does not depend on any unknown parameters; and 2) the observed value of $R_{P_2}$ equals P2 as defined in equation (2) for given $\bar{y}_i$ and $s_i^2$ (i = 1, 2, 3).

Computing Algorithm

Given a normally distributed data set with n1 non-diseased subjects, n2 subjects at the early stage of the disease, and n3 diseased subjects, the confidence interval for P2 using the generalized inference approach can be obtained via the following steps:

  1. For i = 1, 2, 3, calculate $\bar{y}_i$ and $s_i^2$.

  2. For i = 1, 2, 3, generate independent random numbers $V_i$ from $\chi^2_{n_i - 1}$, then calculate $R_{\sigma_i^2}$.

  3. For i = 1, 2, 3, generate independent random numbers $Z_i$ from standard normal distributions N(0, 1), and $V_i$ from $\chi^2_{n_i - 1}$, then calculate $R_{\mu_i}$.

  4. Calculate RP2 as in equation (6).

  5. Repeat Steps 2–4 for a total B = 2500 times to obtain a set of values of RP2.

Denote by $R_{P_2}(\alpha)$ the 100αth percentile of the $R_{P_2}$ values. Then $(R_{P_2}(\alpha/2),\ R_{P_2}(1 - \alpha/2))$ is a two-sided 100(1 − α)% generalized confidence interval for P2.
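The steps above can be sketched in Python with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `generalized_ci_p2` is hypothetical, the chi-square draw is generated as a Gamma variate with shape (n_i − 1)/2 and scale 2, the value of R for the variance from Step 2 is reused when computing R for the mean as written in equation (5), and the percentiles are taken as simple order statistics of the sorted draws.

```python
import random
from statistics import NormalDist, mean, variance

_STD = NormalDist()


def generalized_ci_p2(y1, y2, y3, p1, p3, alpha=0.05, B=2500, seed=0):
    """Monte Carlo generalized confidence interval for P2 (Steps 1-5)."""
    rng = random.Random(seed)
    groups = (y1, y2, y3)
    n = [len(g) for g in groups]
    ybar = [mean(g) for g in groups]        # Step 1: observed means
    s2 = [variance(g) for g in groups]      # Step 1: observed variances (n - 1 divisor)
    zq1 = _STD.inv_cdf(p1)
    zq3 = _STD.inv_cdf(1.0 - p3)
    draws = []
    for _ in range(B):
        r_mu, r_sd = [], []
        for i in range(3):
            # Step 2: chi-square(n_i - 1) drawn as Gamma(shape=(n_i - 1)/2, scale=2)
            v = rng.gammavariate((n[i] - 1) / 2.0, 2.0)
            r_var = (n[i] - 1) * s2[i] / v                    # equation (4)
            z = rng.gauss(0.0, 1.0)                           # Step 3
            r_mu.append(ybar[i] - z * (r_var / n[i]) ** 0.5)  # equation (5)
            r_sd.append(r_var ** 0.5)
        # Step 4: equation (6)
        upper = _STD.cdf((r_mu[2] - r_mu[1] + zq3 * r_sd[2]) / r_sd[1])
        lower = _STD.cdf((r_mu[0] - r_mu[1] + zq1 * r_sd[0]) / r_sd[1])
        draws.append(upper - lower)
    draws.sort()
    k_lo = int(alpha / 2.0 * (B - 1))
    k_hi = int(round((1.0 - alpha / 2.0) * (B - 1)))
    return draws[k_lo], draws[k_hi]


# Usage on simulated data (parameters of the first normal setting in Table 2)
_rng = random.Random(1)
y1 = [_rng.gauss(0.0, 1.0) for _ in range(50)]
y2 = [_rng.gauss(2.5, 1.1) for _ in range(50)]
y3 = [_rng.gauss(3.69, 1.2) for _ in range(50)]
lo, hi = generalized_ci_p2(y1, y2, y3, p1=0.8, p3=0.8)
print(lo, hi)
```

Because an individual draw of equation (6) can place the upper cut-off below the lower one, a draw may be negative; the sketch keeps such draws rather than truncating them, which is a design choice made here and not something the text specifies.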

3.2 Without the Normal Assumptions

Most of the time the normal assumptions as given in Section 3.1 are not satisfied. For such data, we will examine the use of the generalized inference approach to non-normal data by first applying a Box-Cox transformation to the data and then applying the generalized inference procedure proposed in Section 3.1. Due to the fact that ROC is invariant under monotonic transformation, this type of approach has been found useful in the context of ROC analysis for a wide variety of situations (e.g. Zou et al. [28]; Zou and Hall [30]; Faraggi and Reiser [29]; Fluss et al. [31]; Schisterman et al. [32], [33]; Molodianovitch et al. [34]). By employing a similar technique, we can also show that the P2 is invariant under monotonic transformations. Let

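$$Y_i^{(\lambda)} = \begin{cases} \dfrac{Y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log(Y_i), & \lambda = 0 \end{cases} \tag{7}$$

for i = 1, 2, 3, where it is assumed that $Y_i^{(\lambda)} \sim N(\mu_i, \sigma_i^2)$. Based on the observations from the three groups, the log-likelihood function can be readily obtained as follows: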

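$$\sum_{i=1}^{3} \sum_{j=1}^{n_i} \left[ -\frac{1}{2}\log(2\pi\sigma_i^2) - \frac{\left(Y_{ij}^{(\lambda)} - \mu_i\right)^2}{2\sigma_i^2} + (\lambda - 1)\log Y_{ij} \right]. \tag{8}$$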

The maximum likelihood estimate (MLE) of λ can be obtained by maximizing the above function. After applying the Box-Cox transformation, the generalized inference approach proposed in Section 3.1 is applied directly to the transformed data.
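One simple way to carry out this maximization is a grid search over λ on the profile log-likelihood, in which each group's mean and variance are replaced by their maximum likelihood estimates on the transformed scale. The Python sketch below illustrates that idea under assumptions of its own: the function names, the grid from −2 to 2 in steps of 0.01, and the simulated log-normal data are all illustrative and are not taken from the paper, which does not state how the maximization was performed.

```python
import math
import random


def boxcox(y, lam):
    """Box-Cox transform of a positive value y, as in equation (7)."""
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam


def profile_loglik(groups, lam):
    """Log-likelihood at lam with each group's mean and variance profiled out
    (additive constants dropped)."""
    ll = 0.0
    for g in groups:
        t = [boxcox(y, lam) for y in g]
        m = sum(t) / len(t)
        var = sum((x - m) ** 2 for x in t) / len(t)   # MLE uses divisor n_i
        ll += -0.5 * len(t) * math.log(var)
        ll += (lam - 1.0) * sum(math.log(y) for y in g)  # Jacobian term
    return ll


def boxcox_mle(groups, grid=None):
    """Grid-search estimate of a common lambda for all groups."""
    if grid is None:
        grid = [k / 100.0 for k in range(-200, 201)]
    return max(grid, key=lambda lam: profile_loglik(groups, lam))


# Usage: log-normal data, for which the log transform (lambda = 0) is correct
_rng = random.Random(2)
groups = [[math.exp(_rng.gauss(m, 0.5)) for _ in range(200)] for m in (1.0, 1.5, 2.0)]
lam_hat = boxcox_mle(groups)
print(lam_hat)
```

A numerical optimizer could replace the grid; the grid is used here only to keep the sketch dependency-free.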

4. NON-PARAMETRIC CONFIDENCE INTERVAL ESTIMATION OF P2

The parametric approaches either rely on the normality assumption or require solving an equation for the Box-Cox transformation. Therefore, it is also of interest to examine the performance of non-parametric approaches for confidence interval estimation of P2. Three nonparametric methods using bootstrap samples will be considered. The first is the bootstrap percentile confidence interval, and the other two are based on the intervals proposed by Agresti and Coull [36] with the variance estimated from bootstrap samples.

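Assume the distributions for the non-diseased group (i.e. F1) and the fully diseased group (i.e. F3) are known. Define $A_j = I\left[F_1^{-1}(P_1) \le Y_{2j} \le F_3^{-1}(1 - P_3)\right]$ for j = 1, 2, …, n2. The $A_j$'s are then Bernoulli random variables with success probability $P_2 = P\left[F_1^{-1}(P_1) \le Y_{2j} \le F_3^{-1}(1 - P_3)\right]$. Let $\bar{P}_2 = \sum_{j=1}^{n_2} A_j / n_2$; the standard (1 − α)100% Wald interval for P2 is

$$\left(\bar{P}_2 - z_{1-\alpha/2}\sqrt{\bar{P}_2(1 - \bar{P}_2)/n_2},\ \ \bar{P}_2 + z_{1-\alpha/2}\sqrt{\bar{P}_2(1 - \bar{P}_2)/n_2}\right)$$

where $z_{1-\alpha/2}$ stands for the 100(1 − α/2)th percentile of the standard normal distribution.

In reality, the true distributions F1 and F3 are unknown, and therefore $F_1^{-1}(P_1)$ and $F_3^{-1}(1 - P_3)$ need to be replaced by their sample estimates $\hat{F}_1^{-1}(P_1)$ and $\hat{F}_3^{-1}(1 - P_3)$. The estimated $\bar{P}_2$ is given by

$$\hat{\bar{P}}_2 = \frac{\sum_{i=1}^{n_2} I\left[\hat{F}_1^{-1}(P_1) \le Y_{2i} \le \hat{F}_3^{-1}(1 - P_3)\right]}{n_2}. \tag{9}$$

The estimated 100(1 − α)% Wald interval for P2 is

$$\left(\hat{\bar{P}}_2 - z_{1-\alpha/2}\sqrt{\hat{\bar{P}}_2(1 - \hat{\bar{P}}_2)/n_2},\ \ \hat{\bar{P}}_2 + z_{1-\alpha/2}\sqrt{\hat{\bar{P}}_2(1 - \hat{\bar{P}}_2)/n_2}\right).$$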

The Wald interval is known to have poor performance, especially for small sample sizes [35].
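A small Python sketch of the Wald interval makes one source of this poor behaviour concrete: when the estimated proportion is 0 or 1, which is not unusual for small n2, the interval collapses to a single point. The function name `wald_interval` and the example inputs are illustrative assumptions.

```python
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% Wald interval for a proportion."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * (p_hat * (1.0 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


lo, hi = wald_interval(0.5, 44)        # n2 = 44 as in the CDR 0.5 group
print(lo, hi)
print(wald_interval(1.0, 10))          # degenerate: zero-width interval at 1.0
```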

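The bootstrap percentile confidence interval (BTP) uses bootstrap samples to compute $\hat{\bar{P}}_2^{\,b}$ for b = 1 to 500 bootstrap iterations and is given by

$$\left(\hat{\bar{P}}_2^{\,b}(\alpha),\ \hat{\bar{P}}_2^{\,b}(1 - \alpha)\right) \tag{10}$$

where $\hat{\bar{P}}_2^{\,b}(\alpha)$ is the 100αth percentile of the bootstrap distribution of $\hat{\bar{P}}_2$.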

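The AC interval, proposed by Agresti and Coull [36], is known to have good performance for binomial proportions. Applying it to our setting, the 100(1 − α)% AC interval for P2 is:

$$\left(\tilde{P}_2 - z_{1-\alpha/2}\sqrt{\mathrm{Var}_{AC}(\tilde{P}_2)},\ \ \tilde{P}_2 + z_{1-\alpha/2}\sqrt{\mathrm{Var}_{AC}(\tilde{P}_2)}\right), \tag{11}$$

where

$$\tilde{P}_2 = \frac{\sum_{i=1}^{n_2} I\left[F_1^{-1}(P_1) \le Y_{2i} \le F_3^{-1}(1 - P_3)\right] + z_{1-\alpha/2}^2/2}{n_2 + z_{1-\alpha/2}^2/2} \tag{12}$$

and

$$\mathrm{Var}_{AC}(\tilde{P}_2) = \frac{\tilde{P}_2(1 - \tilde{P}_2)}{n_2 + z_{1-\alpha/2}^2/2}. \tag{13}$$

The estimated $\tilde{P}_2$ is given by

$$\hat{\tilde{P}}_2 = \frac{\sum_{i=1}^{n_2} I\left[\hat{F}_1^{-1}(P_1) \le Y_{2i} \le \hat{F}_3^{-1}(1 - P_3)\right] + z_{1-\alpha/2}^2/2}{n_2 + z_{1-\alpha/2}^2/2}. \tag{14}$$

The estimated variance $\mathrm{Var}_{AC}(\tilde{P}_2)$ can be obtained directly by substituting $\hat{\tilde{P}}_2$ for $\tilde{P}_2$ in equation (13).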

Zhou and Qin [15] considered a non-parametric solution for estimating confidence intervals for the sensitivity at a fixed level of specificity of a diagnostic test with binary disease status, and proposed to use bootstrap methods to estimate the variance. This idea can be extended to estimate a non-parametric confidence interval for P2 given P3 and P1. In the same vein, we will use bootstrap methods to estimate the variance of $\hat{\tilde{P}}_2$. With the estimated variance, we then apply Agresti and Coull’s idea to derive confidence intervals for P2.

Computing Algorithms

Given a data set with n1 non-diseased subjects, n2 subjects at early stage of the disease, and n3 diseased subjects, three nonparametric confidence intervals for P2 discussed in this section can be obtained by the following algorithm.

For b = 1 to B (it is recommended that B ≥ 200, e.g. [15]. In this paper we use 500) bootstrap iterations,

  • Draw resamples of sizes n1, n2, and n3 with replacement from the non-diseased sample $Y_{1j}$'s, the early stage sample $Y_{2j}$'s, and the diseased sample $Y_{3j}$'s, respectively. Denote the bootstrap samples by $\{Y_{ij}^{b}\}$, i = 1, 2, 3, j = 1, 2, …, $n_i$.

  • Calculate the bootstrap versions $\hat{\bar{P}}_2^{\,b}$ and $\hat{\tilde{P}}_2^{\,b}$ according to (9) and (14), respectively.

The bootstrap percentile confidence interval (BTP) in (10) can be obtained from the array $\hat{\bar{P}}_2^{\,b}$ (b = 1, …, 500).
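The resampling loop for the percentile interval can be sketched as follows in Python. Several details are assumptions made for illustration and are not specified in the text: the empirical quantile is taken as the ⌈np⌉-th order statistic, the two-sided interval uses the α/2 and 1 − α/2 order statistics of the sorted bootstrap estimates, and the names `emp_quantile`, `p2_hat` and `btp_interval` are hypothetical.

```python
import math
import random


def emp_quantile(sample, p):
    """Inverse empirical CDF: the ceil(n*p)-th order statistic (an assumed convention)."""
    s = sorted(sample)
    k = min(len(s), max(1, math.ceil(len(s) * p - 1e-9)))
    return s[k - 1]


def p2_hat(y1, y2, y3, p1, p3):
    """Non-parametric estimate of P2 as in equation (9)."""
    c1 = emp_quantile(y1, p1)          # estimated cut-off for specificity p1
    c3 = emp_quantile(y3, 1.0 - p3)    # estimated cut-off for sensitivity p3
    return sum(1 for y in y2 if c1 <= y <= c3) / len(y2)


def btp_interval(y1, y2, y3, p1, p3, alpha=0.05, B=500, seed=0):
    """Bootstrap percentile interval from B resamples of each group."""
    rng = random.Random(seed)
    est = []
    for _ in range(B):
        b1 = rng.choices(y1, k=len(y1))   # resample with replacement
        b2 = rng.choices(y2, k=len(y2))
        b3 = rng.choices(y3, k=len(y3))
        est.append(p2_hat(b1, b2, b3, p1, p3))
    est.sort()
    k_lo = int(math.floor(alpha / 2.0 * (B - 1)))
    k_hi = int(math.ceil((1.0 - alpha / 2.0) * (B - 1)))
    return est[k_lo], est[k_hi]


# Usage on simulated normal data
_rng = random.Random(3)
y1 = [_rng.gauss(0.0, 1.0) for _ in range(30)]
y2 = [_rng.gauss(2.5, 1.1) for _ in range(30)]
y3 = [_rng.gauss(3.69, 1.2) for _ in range(30)]
lo, hi = btp_interval(y1, y2, y3, p1=0.8, p3=0.8)
print(p2_hat(y1, y2, y3, 0.8, 0.8), lo, hi)
```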

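The proposed bootstrap variance estimator of $\hat{\tilde{P}}_2$ is defined as:

$$\widehat{\mathrm{Var}}_{boot}(\hat{\tilde{P}}_2) = \frac{1}{B - 1}\sum_{b=1}^{B}\left(\hat{\tilde{P}}_2^{\,b} - \overline{\hat{\tilde{P}}}_2^{\,b}\right)^2, \tag{15}$$

where $\overline{\hat{\tilde{P}}}_2^{\,b} = (1/B)\sum_{b=1}^{B}\hat{\tilde{P}}_2^{\,b}$. Similar to Zhou and Qin [15], we propose two Agresti and Coull type confidence intervals for P2 given P1 and P3. The first (1 − α)100% level interval, called the BTI interval, is defined as

$$\left(\hat{\tilde{P}}_2 - z_{1-\alpha}\sqrt{\widehat{\mathrm{Var}}_{boot}(\hat{\tilde{P}}_2)},\ \ \hat{\tilde{P}}_2 + z_{1-\alpha}\sqrt{\widehat{\mathrm{Var}}_{boot}(\hat{\tilde{P}}_2)}\right) \tag{16}$$

where $\hat{\tilde{P}}_2$ is defined in (14). The second 100(1 − α)% level interval, called the BTII interval, for P2 is defined by

$$\left(\overline{\hat{\tilde{P}}}_2^{\,b} - z_{1-\alpha}\sqrt{\widehat{\mathrm{Var}}_{boot}(\hat{\tilde{P}}_2)},\ \ \overline{\hat{\tilde{P}}}_2^{\,b} + z_{1-\alpha}\sqrt{\widehat{\mathrm{Var}}_{boot}(\hat{\tilde{P}}_2)}\right). \tag{17}$$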
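The difference between the BTI and BTII intervals is only where the interval is centred: BTI is centred at the point estimate from (14), BTII at the mean of the bootstrap replicates, and both use the same bootstrap standard deviation from (15). The Python sketch below shows this given an array of bootstrap replicates. The function name `bt_intervals` is hypothetical, and the normal critical value `z` is passed in by the caller rather than fixed, so the sketch makes no assumption about which quantile is used.

```python
from statistics import mean, variance


def bt_intervals(p_tilde_hat, boot_reps, z):
    """Return (BTI, BTII) intervals from a point estimate, bootstrap replicates
    of that estimate, and a normal critical value z."""
    sd = variance(boot_reps) ** 0.5   # equation (15): divisor B - 1
    centre2 = mean(boot_reps)         # bootstrap mean used by BTII
    bti = (p_tilde_hat - z * sd, p_tilde_hat + z * sd)
    btii = (centre2 - z * sd, centre2 + z * sd)
    return bti, btii


# Toy usage with three replicates, only to make the arithmetic easy to follow
bti, btii = bt_intervals(0.55, [0.4, 0.5, 0.6], z=2.0)
print(bti, btii)
```

Neither interval is truncated to [0, 1] in this sketch; whether and how to truncate is left open here.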

5. SIMULATION STUDIES

Simulation studies are carried out to assess the coverage probabilities of the proposed confidence intervals (the generalized inference approach, the generalized inference approach with Box-Cox transformation, the percentile bootstrap interval (BTP), and two intervals based on Agresti and Coull’s paper, namely BTI and BTII) for P2 under different distributional assumptions: normal, beta and gamma. The AC interval proposed in equation (11) using the estimated $\hat{\tilde{P}}_2$ and variance $\mathrm{Var}_{AC}(\tilde{P}_2)$ has poor coverage accuracy and therefore is not considered here. Beta and gamma distributions are used as representatives of non-normal distributions because they are widely used in practical applications and because they come in a variety of shapes.

To represent a wide range of sample size settings, (n1, n2, n3) is set as (10, 10, 10), (30, 30, 30), (20, 10, 10), (30, 20, 10), (50, 30, 30) and (50, 50, 50). With a fixed 80% specificity and a fixed 80% sensitivity to full disease, the parameters are chosen so that P2 equals 50%, 70%, 80% and 90%, respectively. For each parameter setting, 5,000 random samples are generated and the parametric and non-parametric confidence intervals proposed in Sections 3 and 4 are obtained. The simulation results are presented in Tables 2–4 for the bootstrap percentile approach (BTP), the BTII approach and the generalized inference approach (without or with Box-Cox transformation). The BTI approach is not presented because it is consistently inferior to the BTII approach. The tables report the coverage probabilities, the coverage errors for the lower and upper tails (i.e. the proportion of runs in which the lower or upper limit of the confidence interval excludes the true P2, to be compared with the nominal level), and the average lengths of the proposed confidence intervals.
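The four summaries reported for each method can be computed from the simulated intervals as in the following Python sketch. The function name `summarize_intervals` and the toy intervals are illustrative assumptions; a lower-tail error is counted when the lower limit lies above the true P2, and an upper-tail error when the upper limit lies below it.

```python
def summarize_intervals(intervals, truth):
    """Coverage, lower/upper tail error rates and mean length of a list of
    (lower, upper) confidence intervals for a known true value."""
    m = len(intervals)
    lower_err = sum(1 for lo, hi in intervals if lo > truth) / m
    upper_err = sum(1 for lo, hi in intervals if hi < truth) / m
    coverage = sum(1 for lo, hi in intervals if lo <= truth <= hi) / m
    mean_len = sum(hi - lo for lo, hi in intervals) / m
    return coverage, lower_err, upper_err, mean_len


# Toy usage: four intervals, true P2 = 0.5
toy = [(0.4, 0.6), (0.55, 0.7), (0.1, 0.45), (0.3, 0.9)]
cov, low, up, length = summarize_intervals(toy, 0.5)
print(cov, low, up, length)
```

For a two-sided 95% interval, each tail error would ideally be close to 0.025, which is the reference value given in the table footnotes.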

Table 2.

Summary of approximate 95% two-sided confidence bounds of BTP, BTII and GI for P2 under normal distributions (based on 5,000 simulations)

Three Independent Normal Distributions
(μ1, σ1) = (0, 1)′, (μ2, σ2) = (2.5, 1.1)′, (μ3, σ3) = (3.69, 1.2)′, P2 = 0.5
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9766 0.9360 0.9606 0.0198 0.0502 0.0038 0.0036 0.0138 0.0356 0.8086 0.6364 0.6943
(30, 30, 30) 0.9792 0.9602 0.9572 0.0146 0.0276 0.0110 0.0062 0.0122 0.0318 0.5570 0.5117 0.4330
(20, 10, 10) 0.9774 0.9398 0.9632 0.0190 0.0494 0.0074 0.0036 0.0108 0.0294 0.8013 0.6280 0.6775
(50, 30, 30) 0.9740 0.9464 0.9490 0.0194 0.0382 0.0182 0.0066 0.0154 0.0328 0.5498 0.5032 0.4239
(50, 50, 50) 0.9752 0.9446 0.9564 0.0174 0.0338 0.0130 0.0074 0.0216 0.0306 0.4423 0.3121 0.3356
(μ1, σ1) = (0, 1)′, (μ2, σ2) = (2.5, 1.1)′, (μ3, σ3) = (4.31, 1.2)′, P2 = 0.7
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9834 0.9578 0.9476 0.0146 0.0242 0.0042 0.0020 0.0180 0.0482 0.7278 0.5694 0.6888
(30, 30, 30) 0.9772 0.9622 0.9536 0.0160 0.0236 0.0130 0.0068 0.0142 0.0334 0.4850 0.4445 0.3911
(20, 10, 10) 0.9826 0.9578 0.9506 0.0158 0.0238 0.0042 0.0016 0.0184 0.0452 0.7120 0.5531 0.6578
(50, 30, 30) 0.9786 0.9618 0.9546 0.0164 0.0256 0.0122 0.0050 0.0126 0.0332 0.4764 0.4352 0.3810
(50, 50, 50) 0.9752 0.9532 0.9498 0.0182 0.0254 0.0118 0.0066 0.0214 0.0384 0.3823 0.2681 0.3003
(μ1, σ1) = (0, 1)′, (μ2, σ2) = (2.5, 1.1)′, (μ3, σ3) = (4.73, 1.2)′, P2 = 0.8
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9720 0.9568 0.9358 0.0266 0.0206 0.0018 0.0014 0.0226 0.0624 0.6372 0.4999 0.6431
(30, 30, 30) 0.9848 0.9720 0.9464 0.0116 0.0142 0.0070 0.0036 0.0138 0.0466 0.4091 0.3704 0.3336
(20, 10, 10) 0.9684 0.9566 0.9360 0.0298 0.0212 0.0048 0.0018 0.0222 0.0592 0.6030 0.4771 0.6024
(50, 30, 30) 0.9812 0.9678 0.9468 0.0156 0.0146 0.0120 0.0032 0.0176 0.0412 0.3951 0.3608 0.3217
(50, 50, 50) 0.9800 0.9620 0.9478 0.0126 0.0214 0.0142 0.0074 0.0166 0.0380 0.3179 0.2188 0.2524
(μ1, σ1) = (0, 1)′, (μ2, σ2) = (2.5, 1.1)′, (μ3, σ3) = (5.51, 1.2)′, P2 = 0.9
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9468 0.8928 0.9310 0.0530 0.0000 0.0038 0.0002 0.1072 0.0652 0.4583 0.3639 0.5250
(30, 30, 30) 0.9820 0.9704 0.9430 0.0150 0.0114 0.0106 0.0030 0.0182 0.0464 0.2772 0.2591 0.2399
(20, 10, 10) 0.9244 0.8740 0.9216 0.0752 0.0000 0.0056 0.0004 0.1260 0.0728 0.4277 0.3454 0.4711
(50, 30, 30) 0.9828 0.9704 0.9420 0.0144 0.0076 0.0110 0.0028 0.0220 0.0470 0.2626 0.2474 0.2262
(50, 50, 50) 0.9856 0.9558 0.9512 0.0118 0.0192 0.0098 0.0026 0.0250 0.0390 0.2160 0.1464 0.1755

BTP: The confidence interval based on bootstrap percentiles.

BTII: The confidence interval presented in equation (17).

GI: The generalized confidence interval for Box-Cox transformed data.

Lower tail (upper tail): One-sided coverage errors, i.e. the proportion of runs in which the lower (or upper) limit of the confidence interval excluded the true P2, at nominal level 0.025.

Length of CI: the average length of two-sided confidence intervals for P2.

Table 4.

Summary of approximate 95% two-sided confidence bounds of BTP, BTII and transformed GI for P2 under gamma distributions (based on 5,000 simulations).

Three Independent Gamma Distributions
(α1, β1) = (1, 6)′, (α2, β2) = (4, 6)′, (α3, β3) = (6.2, 6)′, P2 = 0.5
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9674 0.9498 0.9628 0.0306 0.0358 0.0038 0.0020 0.0144 0.0334 0.8131 0.6409 0.7133
(30, 30, 30) 0.9762 0.9632 0.9628 0.0176 0.0216 0.0130 0.0062 0.0152 0.0242 0.5550 0.5021 0.4271
(20, 10, 10) 0.9656 0.9518 0.9644 0.0318 0.0350 0.0074 0.0026 0.0132 0.0282 0.7986 0.6246 0.6627
(50, 30, 30) 0.9696 0.9620 0.9586 0.0234 0.0254 0.0172 0.0070 0.0126 0.0242 0.5347 0.4876 0.3988
(50, 50, 50) 0.9734 0.9654 0.9548 0.0176 0.0206 0.0174 0.0090 0.0140 0.0278 0.4370 0.4138 0.3231
(α1, β1) = (1, 6)′, (α2, β2) = (4, 6)′, (α3, β3) = (7.7, 6)′, P2 = 0.7
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9634 0.9600 0.9578 0.0342 0.0228 0.0026 0.0024 0.0172 0.0396 0.7463 0.5823 0.7318
(30, 30, 30) 0.9726 0.9696 0.9638 0.0204 0.0136 0.0088 0.0070 0.0168 0.0274 0.4905 0.4476 0.3874
(20, 10, 10) 0.9640 0.9652 0.9536 0.0340 0.0184 0.0048 0.0020 0.0164 0.0416 0.7179 0.5565 0.6386
(50, 30, 30) 0.9712 0.9660 0.9594 0.0224 0.0186 0.0108 0.0064 0.0154 0.0298 0.4667 0.4216 0.3571
(50, 50, 50) 0.9714 0.9736 0.9594 0.0208 0.0148 0.0148 0.0078 0.0116 0.0258 0.3819 0.3622 0.2905
(α1, β1) = (1, 6)′, (α2, β2) = (4, 6)′, (α3, β3) = (9.0, 6)′, P2 = 0.8
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9680 0.9572 0.9570 0.0228 0.0190 0.0022 0.0092 0.0238 0.0408 0.6473 0.5053 0.6967
(30, 30, 30) 0.9776 0.9760 0.9582 0.0136 0.0104 0.0084 0.0088 0.0136 0.0334 0.4166 0.3777 0.3378
(20, 10, 10) 0.9742 0.9626 0.9490 0.0162 0.0136 0.0036 0.0096 0.0238 0.0474 0.6096 0.4795 0.5771
(50, 30, 30) 0.9740 0.9738 0.9508 0.0136 0.0116 0.0102 0.0124 0.0146 0.0390 0.3864 0.3529 0.3053
(50, 50, 50) 0.9746 0.9744 0.9554 0.0104 0.0122 0.0118 0.0150 0.0134 0.0328 0.3213 0.3055 0.2518
(α1, β1) = (1, 6)′, (α2, β2) = (4, 6)′, (α3, β3) = (12.1, 6)′, P2 = 0.9
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9292 0.8888 0.9526 0.0626 0.0000 0.0056 0.0082 0.1112 0.0418 0.4958 0.3872 0.6012
(30, 30, 30) 0.9754 0.9704 0.9568 0.0146 0.0100 0.0130 0.0100 0.0196 0.0302 0.3072 0.2840 0.2696
(20, 10, 10) 0.9264 0.8924 0.9502 0.0670 0.0000 0.0072 0.0066 0.1076 0.0426 0.4332 0.3544 0.4592
(50, 30, 30) 0.9726 0.9678 0.9598 0.0138 0.0122 0.0142 0.0136 0.0200 0.0260 0.2740 0.2561 0.2305
(50, 50, 50) 0.9678 0.9706 0.9520 0.0206 0.0120 0.0184 0.0116 0.0174 0.0296 0.2401 0.2295 0.1973

BTP: The confidence interval based on bootstrap percentiles.

BTII: The confidence interval presented in equation (17).

GI: The generalized confidence interval for Box-Cox transformed data.

Lower tail (upper tail): One-sided coverage errors, i.e. the proportion of runs in which the lower (or upper) limit of the confidence interval excluded the true P2, at nominal level 0.025.

Length of CI: the average length of two-sided confidence intervals for P2.

Table 2 presents simulation results under the normal assumption. The bootstrap percentile approach tends to be slightly conservative, with coverage above the nominal level, except that its coverage falls below the nominal level in the small-sample unbalanced case when P2 = 0.9. The confidence intervals from the BTII method are reasonably close to the nominal level for most scenarios, although they are sometimes liberal. The generalized confidence intervals are the most accurate, despite a tendency towards coverage slightly below the nominal level when sample sizes are small and P2 is large. Regardless of sample sizes and true values of P2, the bootstrap percentile confidence interval has the longest length.

In Tables 3 and 4, we present simulation results at the nominal level of 95% for the beta and gamma distributions, respectively. For beta and gamma distributions, the simulation study shows that the Box-Cox transformed data generally satisfy normality. Generally speaking, the Box-Cox transformed generalized approach gives uniformly good coverage probabilities for all cases, except that it might be slightly conservative when the sample sizes are small. The bootstrap percentile confidence intervals are generally conservative except for small sample scenarios when P2 = 0.9. The BTII approach performs well except that it tends to be liberal for small sample scenarios when P2 = 0.9. As in the normal case, the bootstrap percentile confidence intervals have the longest length.

Table 3.

Summary of approximate 95% two-sided confidence bounds of BTP, BTII and transformed GI for P2 under beta distributions (based on 5,000 simulations).

Three Independent Beta Distributions
(α1, β1) = (1, 6)′, (α2, β2) = (6, 6)′, (α3, β3) = (9.6, 6)′, P2 = 0.5
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9722 0.9428 0.9664 0.0146 0.0392 0.0076 0.0132 0.0180 0.0260 0.7917 0.6168 0.6997
(30, 30, 30) 0.9744 0.9608 0.9484 0.0134 0.0260 0.0332 0.0122 0.0132 0.0184 0.5228 0.4725 0.3852
(20, 10, 10) 0.9728 0.9510 0.9680 0.0154 0.0350 0.0148 0.0118 0.0140 0.0172 0.7770 0.6055 0.6396
(50, 30, 30) 0.9728 0.9582 0.9436 0.0170 0.0264 0.0432 0.0102 0.0154 0.0132 0.5079 0.4611 0.3703
(50, 50, 50) 0.9672 0.9608 0.9324 0.0202 0.0240 0.0546 0.0126 0.0152 0.0130 0.4077 0.3851 0.2907
(α1, β1) = (1, 6)′, (α2, β2) = (6, 6)′, (α3, β3) = (12.6, 6)′, P2 = 0.7
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9706 0.9554 0.9740 0.0154 0.0212 0.0038 0.0140 0.0234 0.0222 0.7172 0.5571 0.7024
(30, 30, 30) 0.9710 0.9654 0.9692 0.0148 0.0164 0.0158 0.0142 0.0182 0.0150 0.4589 0.4178 0.3366
(20, 10, 10) 0.9810 0.9602 0.9744 0.0108 0.0214 0.0062 0.0082 0.0184 0.0194 0.6924 0.5361 0.5904
(50, 30, 30) 0.9720 0.9630 0.9622 0.0134 0.0188 0.0278 0.0146 0.0182 0.0100 0.4422 0.4035 0.3156
(50, 50, 50) 0.9696 0.9662 0.9580 0.0148 0.0188 0.0316 0.0156 0.0150 0.0104 0.3531 0.3360 0.2489
(α1, β1) = (1, 6)′, (α2, β2) = (6, 6)′, (α3, β3) = (15.3, 6)′, P2 = 0.8
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9672 0.9520 0.9708 0.0256 0.0170 0.0032 0.0072 0.0310 0.0260 0.6234 0.4873 0.6605
(30, 30, 30) 0.9788 0.9708 0.9732 0.0128 0.0112 0.0092 0.0084 0.0180 0.0176 0.3841 0.3520 0.2907
(20, 10, 10) 0.9648 0.9470 0.9714 0.0280 0.0202 0.0046 0.0072 0.0328 0.0240 0.5808 0.4568 0.5249
(50, 30, 30) 0.9726 0.9656 0.9756 0.0176 0.0152 0.0106 0.0098 0.0192 0.0138 0.3661 0.3370 0.2683
(50, 50, 50) 0.9736 0.9656 0.9760 0.0140 0.0132 0.0108 0.0124 0.0212 0.0132 0.2976 0.2812 0.2126
(α1, β1) = (1, 6)′, (α2, β2) = (6, 6)′, (α3, β3) = (20.4, 6)′, P2 = 0.9
Coverage Probability Lower Tail Upper Tail Length of CI
Sample Sizes BTP BTII GI BTP BTII GI BTP BTII GI BTP BTII GI
(10, 10, 10) 0.9334 0.8802 0.9482 0.0574 0.0000 0.0028 0.0092 0.1198 0.0490 0.4763 0.3805 0.6004
(30, 30, 30) 0.9710 0.9734 0.9554 0.0164 0.0092 0.0046 0.0126 0.0174 0.0400 0.2853 0.2622 0.2336
(20, 10, 10) 0.9268 0.8714 0.9528 0.0658 0.0000 0.0048 0.0074 0.1286 0.0424 0.4287 0.3458 0.4404
(50, 30, 30) 0.9774 0.9648 0.9590 0.0140 0.0100 0.0058 0.0086 0.0252 0.0352 0.2611 0.2435 0.2104
(50, 50, 50) 0.9716 0.9740 0.9556 0.0154 0.0098 0.0054 0.0130 0.0162 0.0390 0.2163 0.2071 0.1662

BTP: The confidence interval based on bootstrap percentiles.

BTII: The confidence interval presented in Eq. (15).

GI: The generalized confidence interval for Box-Cox transformed data.

Lower tail (upper tail): One-sided coverage errors, i.e. the proportion of runs in which the lower (or upper) limit of the confidence interval excluded the true P2 at nominal level 0.025.

Length of CI: the average length of two-sided confidence intervals for P2.
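The summary measures defined in the table footnotes can be computed from a collection of simulated intervals along the following lines. This is a minimal sketch, not the authors' R program; the function name `summarize_intervals` and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
def summarize_intervals(intervals, true_value):
    """Summarize simulated two-sided confidence intervals for one scenario.

    intervals  : list of (lower, upper) pairs, one per simulation run
    true_value : the true P2 used to generate the data
    """
    n_runs = len(intervals)
    # Lower-tail error: the lower limit lies above the true value.
    lower_miss = sum(1 for lo, hi in intervals if lo > true_value)
    # Upper-tail error: the upper limit lies below the true value.
    upper_miss = sum(1 for lo, hi in intervals if hi < true_value)
    mean_length = sum(hi - lo for lo, hi in intervals) / n_runs
    return {
        "coverage": 1.0 - (lower_miss + upper_miss) / n_runs,
        "lower_tail": lower_miss / n_runs,
        "upper_tail": upper_miss / n_runs,
        "length": mean_length,
    }
```

With this definition the coverage probability and the two tail errors sum to one, which is the pattern visible in the rows of Table 3.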

In summary, when normality is satisfied for either the original data or the transformed data, the parametric approaches, i.e. the generalized approach or the Box-Cox transformed generalized approach, generally provide confidence intervals with satisfactory coverage probabilities. The generalized approach is simple to use, whereas the Box-Cox transformation involves solving an equation. On the other hand, when the normality assumption cannot be met, the BTII approach is a good choice except in scenarios with large P2 and small sample sizes, for which the bootstrap percentile approach can provide reasonable confidence intervals.
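The equation involved in the Box-Cox step is the likelihood equation for the transformation parameter λ. The sketch below illustrates the idea under stated assumptions: a single λ shared by all groups, group-specific means and variances, strictly positive data, and a plain grid search standing in for an exact equation solver. The function names and the grid are hypothetical and are not taken from the authors' program.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def profile_loglik(groups, lam):
    """Profile log-likelihood of lam (additive constants dropped).

    Each group is assumed normal after transformation, with its own
    mean and variance; lam is common to all groups (an assumption).
    """
    total = 0.0
    for g in groups:
        z = [box_cox(x, lam) for x in g]
        mean = sum(z) / len(z)
        var = sum((v - mean) ** 2 for v in z) / len(z)  # ML variance
        jacobian = (lam - 1.0) * sum(math.log(x) for x in g)
        total += -0.5 * len(z) * math.log(var) + jacobian
    return total

def estimate_lambda(groups, grid=None):
    """Grid-search maximizer of the profile log-likelihood."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]  # -2.00 .. 2.00
    return max(grid, key=lambda lam: profile_loglik(groups, lam))
```

A finer grid or a one-dimensional root finder applied to the derivative of `profile_loglik` would give a more precise λ; the grid version is used here only because it is easy to verify by eye.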

6. EXAMPLE: REVISITED

In this section, the confidence intervals of the probabilities of detecting early AD (P2) for all neuropsychometric tests in the data set of Alzheimer’s disease from a study at the Washington University (WU) Alzheimer’s Disease Research Center (ADRC) are estimated by the proposed parametric and nonparametric approaches. The details of the data set are presented in Section 2 and the summary statistics of neuropsychometric tests by three diagnostic groups are presented in Table 1.

A close look at the data using the Shapiro-Wilk normality test shows that the frontal factor and temporal factor satisfy normality, while the parietal factor, associate learning and word fluency satisfy normality after a Box-Cox transformation. For these variables, the generalized inference approach or the Box-Cox transformed generalized inference approach is recommended. For the remaining variables, the confidence intervals of P2 can be obtained by both the BTP and BTII methods. The specificity P1 and sensitivity to full disease P3 are fixed at 0.8. For comparison, the confidence intervals by all the proposed approaches are presented in Table 5. Furthermore, for each variable, the nonparametric and parametric (with or without Box-Cox transformation) point estimates in (3) and (9) are calculated, and the corresponding most appropriate point estimates are highlighted in Table 5.

Table 5.

Estimated confidence intervals for the probability of detecting early stage Alzheimer's disease using psychometric tests from WU ADRC (sensitivity to full disease and specificity are both assumed to equal 0.8).

Confidence Intervals for the test covariates
Non-parametric: 2NP, BTP (lb, ub), BTII (lb, ub). Parametric: 2P, Normal GI (lb, ub), 2BoxP, Box-Cox GI (lb, ub)
Variables 2NP BTP-lb BTP-ub BTII-lb BTII-ub 2P GI-lb GI-ub 2BoxP BoxCoxGI-lb BoxCoxGI-ub
Global factor 0.7727 0.5718 0.8875 0.5788 0.9371 0.6107 0.3594 0.7591 0.6267 0.3533 0.7738
Frontal factor* 0.5718 0.1986 0.7440 0.1977 0.7842 0.4340 0.1997 0.6058 0.4375 0.2189 0.5987
Parietal factor** 0.3708 0.1412 0.6866 0.1132 0.6877 0.2290 0.0000 0.4834 0.2499 0.0000 0.4650
Temporal factor* 0.7440 0.5431 0.8875 0.5527 0.9343 0.6330 0.3926 0.7862 0.6337 0.4128 0.7952
Associate Learning** 0.6005 0.2123 0.7440 0.2401 0.7961 0.3383 0.0695 0.5338 0.3436 0.0854 0.5501
Logical Memory 0.7153 0.4856 0.8875 0.5109 0.9225 0.6749 0.4883 0.7985 0.6739 0.4984 0.8021
Digit Span Forward 0.3708 0.0551 0.7153 0.1206 0.7159 0.1939 0.0000 0.4161 0.2001 0.0000 0.4057
Digit Span Backward 0.3421 0.0838 0.8588 0.1139 0.8959 0.2050 0.0000 0.4697 0.2145 0.0000 0.4502
Visual Retention (10 s) 0.7153 0.2273 0.8301 0.3120 0.9332 0.3336 0.1198 0.4961 0.3466 0.1231 0.5112
Information 0.4282 0.1699 0.8014 0.2112 0.8062 0.4387 0.1821 0.6181 0.4378 0.1943 0.6105
Word Fluency** 0.2273 0.0551 0.5718 0.0000 0.5490 0.1643 0.0000 0.3785 0.1799 0.0000 0.3817
Boston Naming 0.3421 0.1412 0.6866 0.0723 0.7036 0.3270 0.0562 0.5002 0.3406 0.0810 0.5206
Mental Control 0.4569 0.0551 0.6579 0.0395 0.7649 0.2369 0.0153 0.4039 0.2375 0.0361 0.4135
Visual Retention (copy) 0.4856 0.0551 0.7877 0.0000 0.8868 0 0.0000 0.0905 0 0.0000 0.1039

BTP: The confidence interval based on bootstrap percentiles.

BTII: Confidence interval is computed by the BTII approach when normality for the data or the Box-Cox transformed data is not satisfied.

GI: The normality is satisfied. Confidence interval is computed by the generalized inference approach.

Box-Cox: The normality of Box-Cox transformed data is satisfied. Confidence interval is computed by the Box-Cox generalized inference approach.

2NP: The nonparametric estimate of P2 in Eq. (9).

2P: The parametric estimate of P2 under normality in Eq. (3).

2BoxP: The parametric estimate of P2 under normality after Box-Cox transformation.

*: Normality is satisfied. The generalized confidence interval is preferred.

**: Normality is satisfied for Box-Cox transformed data. The generalized confidence interval for transformed data is preferred.

Note: The most appropriate point estimates are highlighted.

From the point estimates and estimated confidence intervals, we can see that the global factor has the best diagnostic accuracy for identifying early stage dementia, followed by logical memory and visual retention, while word fluency and parietal factor have very poor ability for early stage diagnosis.

7. SUMMARY AND DISCUSSION

The probability of detecting early disease stage (P2), given sensitivity to the full disease (P3) and specificity (P1), can serve as another diagnostic measure when disease processes involve three ordinal disease stages. This article examines the performance of several parametric and nonparametric approaches for confidence interval estimation of P2 given P1 and P3 and makes recommendations about which procedures are most appropriate under different scenarios. These methods can be applied to identify important markers for the detection of early stage disease (e.g. preclinical AD), which is usually the most important stage of the disease for intervention. As the simulation results indicate, the parametric approaches generally perform satisfactorily. Among the non-parametric approaches, the bootstrap percentile approach generally slightly overestimates the coverage probabilities, while the BTII method is a good choice except in scenarios with large P2, for which the bootstrap percentile approach can provide reasonable confidence intervals.

Based on the simulation studies, the following recommendations are made. First, if normality is satisfied for either the original or the transformed data, we suggest the generalized inference approach. This approach is easy to use and has good coverage probability even for small and unbalanced sample sizes. Second, if the normality assumption is not met, the nonparametric BTII approach works well for most scenarios; however, if the estimated P2 is large and the sample sizes are small, we recommend the bootstrap percentile approach (BTP). Furthermore, when sample sizes are ≥ 50, the BTII approach is recommended for the normal distribution and the generalized inference approach is recommended for the Beta and Gamma distributions, for the following reasons: 1) BTII intervals have the shortest length and satisfactory coverage probabilities for the normal distribution; 2) the generalized inference approach has the best coverage probabilities and the shortest confidence intervals for the Beta and Gamma distributions.
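The first two recommendations can be written as a small decision rule. This is only an illustrative sketch: the text does not give numeric cut-offs for "large P2" or "small sample sizes", so the defaults `large_p2=0.9` and `small_n=10` are hypothetical values suggested by the simulation scenarios, and the function name is invented for this example.

```python
def recommend_method(normal_ok, boxcox_normal_ok, p2_hat, min_group_size,
                     large_p2=0.9, small_n=10):
    """Return a suggested interval method following the stated recommendations.

    normal_ok        : normality holds for the original data
    boxcox_normal_ok : normality holds after a Box-Cox transformation
    p2_hat           : estimated P2
    min_group_size   : smallest of the three group sample sizes
    large_p2, small_n: hypothetical thresholds, not taken from the paper
    """
    if normal_ok:
        return "GI"
    if boxcox_normal_ok:
        return "Box-Cox GI"
    # Normality not met: BTII unless P2 is large and samples are small.
    if p2_hat >= large_p2 and min_group_size <= small_n:
        return "BTP"
    return "BTII"
```

For instance, a marker that fails both normality checks, with an estimated P2 of 0.5 and 30 subjects per group, would be routed to BTII under these assumed thresholds.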

All of the proposed approaches are simulation-based. The generalized inference approach based on normality is easy to use, while the Box-Cox transformed generalized inference approach involves solving an equation. The nonparametric approaches are simple except that the variance is computed through bootstrap samples. An R program is available upon request from ltian@buffalo.edu.
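To make the simulation-based nature of the nonparametric approach concrete, the following is a rough sketch of a bootstrap percentile interval for P2. It is not the authors' R program and makes several assumptions that should be checked against the paper's equations: larger marker values indicate more advanced disease, P2 = F2(c2) − F2(c1) with c1 the P1-quantile of the non-diseased group and c2 the (1 − P3)-quantile of the fully diseased group, inverse empirical distribution functions are used for the quantiles, and negative differences are truncated at zero. All function names are hypothetical.

```python
import math
import random

def empirical_quantile(sample, p):
    """Smallest order statistic whose empirical CDF value is at least p."""
    xs = sorted(sample)
    rank = max(1, math.ceil(p * len(xs) - 1e-12))  # 1-based rank
    return xs[rank - 1]

def ecdf(sample, x):
    """Empirical CDF of sample evaluated at x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def p2_nonparametric(g1, g2, g3, p1, p3):
    """Empirical estimate of P2 under the assumptions stated above."""
    c1 = empirical_quantile(g1, p1)        # cut point fixing specificity
    c2 = empirical_quantile(g3, 1.0 - p3)  # cut point fixing sensitivity
    return max(0.0, ecdf(g2, c2) - ecdf(g2, c1))

def bootstrap_percentile_ci(g1, g2, g3, p1, p3,
                            alpha=0.05, n_boot=2000, seed=0):
    """Percentile interval from group-wise resampling with replacement."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        b1 = rng.choices(g1, k=len(g1))
        b2 = rng.choices(g2, k=len(g2))
        b3 = rng.choices(g3, k=len(g3))
        reps.append(p2_nonparametric(b1, b2, b3, p1, p3))
    reps.sort()
    lower = reps[int((alpha / 2.0) * n_boot)]
    upper = reps[min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))]
    return lower, upper
```

The three groups are resampled independently, mirroring the independent-samples design of the simulations; the BTII interval, which additionally needs a bootstrap variance estimate, is not sketched here.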

Acknowledgments

The work by Dr. Xiong was partly supported by grants NIH/NIA R01 AG029672, AG003991, AG005681, and AG026276 from the National Institute on Aging and grant NIRG-08-91082 from the Alzheimer’s Association.

Appendix

Generalized Pivots and Generalized Test variables

In the following, the basic concepts for generalized inference developed by Tsui & Weerahandi [17] and Weerahandi [18] are described.

Suppose that Y = (Y1, Y2, …, Yn)′ forms a random sample from a distribution which depends on the parameters θ = (ψ, ν), where ψ is the parameter of interest and ν is a vector of nuisance parameters. A generalized pivot R(Y; y, ψ, ν) for interval estimation, where y is an observed value of Y, as defined in Weerahandi [18], has the following two properties:

  1. R(Y; y, ψ, ν) has a distribution free of unknown parameters.

  2. The value of R(y; y, ψ, ν) is ψ.

Let Rα be the 100αth percentile of R. Then Rα is a 100(1 − α)% lower bound for ψ and (Rα/2, R1−α/2) is a 100(1 − α)% two-sided generalized confidence interval for ψ.
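As a simple illustration of these definitions (a textbook case, not the pivot used for P2 in this paper), consider a normal sample with ψ = μ and nuisance parameter σ. The quantity R = x̄ − [(X̄ − μ)/(S/√n)]·(s/√n), where x̄ and s are the observed mean and standard deviation, has a distribution free of unknown parameters and equals μ at Y = y. The sketch below draws from the distribution of R by Monte Carlo and reads off its percentiles; the function name and defaults are hypothetical.

```python
import math
import random

def generalized_ci_normal_mean(sample, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo generalized confidence interval for a normal mean."""
    rng = random.Random(seed)
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    pivots = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        # Chi-square with n-1 df is Gamma(shape=(n-1)/2, scale=2).
        v = rng.gammavariate((n - 1) / 2.0, 2.0)
        t = z / math.sqrt(v / (n - 1))  # Student t with n-1 df
        pivots.append(xbar - t * s / math.sqrt(n))
    pivots.sort()
    lower = pivots[int((alpha / 2.0) * draws)]
    upper = pivots[min(draws - 1, int((1.0 - alpha / 2.0) * draws))]
    return lower, upper
```

In this special case the generalized interval should agree, up to Monte Carlo error, with the classical t interval, which gives a convenient sanity check before applying the same recipe to more complicated functions of several means and variances.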

Now consider testing H0 : ψ = ψ0 vs. H1 : ψ > ψ0 where ψ0 is a specified quantity. A generalized test variable of the form T(Y; y, ψ, ν), where y is an observed value of Y, is chosen to satisfy the following three conditions (Tsui & Weerahandi [17]) :

  1. For fixed y, the distribution of T(Y; y, ψ, ν) is free of the vector of nuisance parameters ν.

  2. The value of T(Y; y, ψ, ν) at Y = y is free of any unknown parameters.

  3. For fixed y and ν, and for all t, Pr[T(Y; y, ψ, ν) > t] is either an increasing or a decreasing function of ψ.

A generalized extreme region is defined as C = [Y : T(Y; y, ψ, ν) ≥ T(y; y, ψ, ν)] if T(Y; y, ψ, ν) is stochastically increasing in ψ. If T(Y; y, ψ, ν) is stochastically decreasing in ψ, a generalized extreme region is defined as C = [Y : T(Y; y, ψ, ν) ≤ T(y; y, ψ, ν)]. Then the generalized P-value is defined as P(C | ψ0), i.e. the probability of C evaluated at ψ = ψ0.
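A short worked example may help fix ideas. It uses the textbook normal-mean case rather than the P2 problem of this paper, and the symbols x̄, s and t_{n−1} are introduced here only for illustration. For a normal sample with ψ = μ and ν = σ, the test variable below satisfies the three conditions, and the resulting generalized P-value for H0 : μ = μ0 vs. H1 : μ > μ0 reduces to the usual one-sided t-test P-value.

```latex
% Observed mean and standard deviation: \bar{x}, s; random counterparts: \bar{X}, S.
\begin{align*}
T(Y; y, \mu, \sigma)
  &= \bar{x} - \frac{\bar{X} - \mu}{S/\sqrt{n}} \cdot \frac{s}{\sqrt{n}} - \mu
   = \bar{x} - \mu - t_{n-1}\,\frac{s}{\sqrt{n}}, \\
T(y; y, \mu, \sigma) &= \bar{x} - (\bar{x} - \mu) - \mu = 0, \\
\Pr\left[ T(Y; y, \mu, \sigma) > t \right]
  &= \Pr\left[ t_{n-1} < \frac{\bar{x} - \mu - t}{s/\sqrt{n}} \right]
  \quad \text{(decreasing in } \mu\text{)}, \\
p = \Pr\left[ T \le 0 \mid \mu = \mu_0 \right]
  &= \Pr\left[ t_{n-1} \ge \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \right].
\end{align*}
```

Since T is stochastically decreasing in μ, the extreme region is C = [Y : T ≤ 0], which is why the last line uses the lower tail of T.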

References

  • 1. Shapiro D. The interpretation of diagnostic tests. Statistical Methods in Medical Research. 1999;8:113–134. doi: 10.1177/096228029900800203.
  • 2. Zhou X-H, Obuchowski N, McClish D. Statistical Methods in Diagnostic Medicine. New York: Wiley; 2002.
  • 3. Pepe MS. The statistical evaluation of medical tests for classification and prediction. Vol. 28. Oxford Statistical Science Series; 2003.
  • 4. Xiong CJ, Gerald VB, Philip M, John CM. Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Statistics in Medicine. 2006;25:1251–1273. doi: 10.1002/sim.2433.
  • 5. Mossman D. Three-way ROCs. Medical Decision Making. 1999;19:78–89. doi: 10.1177/0272989X9901900110.
  • 6. Dreiseitl S, Ohno-Machado L, Binder M. Comparing three-class diagnostic tests by three-way ROC analysis. Medical Decision Making. 2000;20:323–331. doi: 10.1177/0272989X0002000309.
  • 7. Heckerling PS. Parametric three-way receiver operating characteristic surface analysis using Mathematica. Medical Decision Making. 2001;21:409–417. doi: 10.1177/0272989X0102100507.
  • 8. Nakas C, Yiannoutsos C. Ordered multiple-class ROC analysis with continuous measurement. Statistics in Medicine. 2004;23:3437–3449. doi: 10.1002/sim.1917.
  • 9. Nakas C, Alonzo T. ROC graphs for assessing the ability of a diagnostic marker to detect three disease classes with an umbrella ordering. Biometrics. 2007;63:603–609. doi: 10.1111/j.1541-0420.2006.00715.x.
  • 10. Alonzo T, Nakas C. Comparison of ROC umbrella volumes with an application to the assessment of lung cancer diagnostic markers. Biometrical Journal. 2007;49:654–664. doi: 10.1002/bimj.200610363.
  • 11. Xiong C, van Belle G, Miller JP, Yan Y, Gao F, Feng S, Yu K, Morris JC. A parametric comparison of diagnostic accuracy with three ordinal diagnostic groups. Biometrical Journal. 2007;49(5):682–693. doi: 10.1002/bimj.200610359.
  • 12. Tian L, Xiong C, Lai C, Vexler A. Exact confidence interval estimation for the difference in diagnostic accuracy with three ordinal diagnostic groups. Journal of Statistical Planning and Inference. 2010;141:549–558. doi: 10.1016/j.jspi.2010.07.004.
  • 13. Linnet K. Comparison of quantitative diagnostic tests: type I error, power, and sample size. Statistics in Medicine. 1987;6:147–158. doi: 10.1002/sim.4780060207.
  • 14. Platt RW, Hanley JA, Yang H. Bootstrap confidence intervals for the sensitivity of a quantitative diagnostic test. Statistics in Medicine. 2000;19:313–322. doi: 10.1002/(sici)1097-0258(20000215)19:3<313::aid-sim370>3.0.co;2-k.
  • 15. Zhou X-H, Qin GS. Improved Confidence Intervals for the sensitivity to full disease at a fixed Level of specificity of a continuous-scale diagnostic test. Statistics in Medicine. 2005;24:465–477. doi: 10.1002/sim.1563.
  • 16. Morris JC. The clinical dementia rating (CDR): current version and scoring rules. Neurology. 1993;43:2412–2414. doi: 10.1212/wnl.43.11.2412-a.
  • 17. Tsui KW, Weerahandi S. Generalized P-values in significance testing of hypotheses in the presence of nuisance parameters. Journal of American Statistical Association. 1989;84:602–607.
  • 18. Weerahandi S. Generalized confidence intervals. Journal of American Statistical Association. 1993;88:899–905.
  • 19. Weerahandi S. Exact Statistical Methods for Data Analysis. New York: Springer; 2003.
  • 20. Weerahandi S. ANOVA under unequal error variances. Biometrics. 1995;51:589–599.
  • 21. Weerahandi S, Berger VW. Exact inference for growth curves with intraclass correlation structure. Biometrics. 1999;55:921–924. doi: 10.1111/j.0006-341x.1999.00921.x.
  • 22. Krishnamoorthy K, Lu Y. Inference on the common means of several normal populations based on the generalized variable method. Biometrics. 2003;59:237–247. doi: 10.1111/1541-0420.00030.
  • 23. Tian L, Cappelleri JC. A new approach for interval estimation and hypothesis testing of a certain intraclass correlation coefficient: the generalized variable method. Statistics in Medicine. 2004;23:2125–2135. doi: 10.1002/sim.1782.
  • 24. Tian L. Confidence intervals for P(Y1 > Y2) with normal outcomes in linear models. Statistics in Medicine. 2008;27:4221–4237. doi: 10.1002/sim.3290.
  • 25. Li C, Liao C, Liu J. On the exact interval estimation for the difference in paired areas under the ROC curves. Statistics in Medicine. 2008;27:224–242. doi: 10.1002/sim.2760.
  • 26. Li C, Liao C, Liu J. A non-inferiority test for diagnostic accuracy based on the paired partial areas under ROC curves. Statistics in Medicine. 2008;27:1762–1776. doi: 10.1002/sim.3121.
  • 27. Krishnamoorthy K, Mathew T. Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals. Journal of Statistical Planning and Inference. 2003;115:103–121.
  • 28. Zou KH, Hall WJ, Shapiro DE. Smooth Non-Parametric Receiver Operating Characteristic (ROC) Curves for Continuous Diagnostic Tests. Statistics in Medicine. 1997;16:2143–2156. doi: 10.1002/(sici)1097-0258(19971015)16:19<2143::aid-sim655>3.0.co;2-3.
  • 29. Faraggi D, Reiser B. Estimation of the Area Under the ROC Curve. Statistics in Medicine. 2002;21:3093–3106. doi: 10.1002/sim.1228.
  • 30. Zou KH, Hall WJ. Two transformation models for estimating an ROC curve derived from continuous data. Journal of Applied Statistics. 2000;27(5):621–631.
  • 31. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cut-off point. Biometrical Journal. 2005;47:458–472. doi: 10.1002/bimj.200410135.
  • 32. Schisterman E, Faraggi D, Reiser B. Adjusting the Generalized ROC Curve for Covariates. Statistics in Medicine. 2004;23:3319–3331. doi: 10.1002/sim.1908.
  • 33. Schisterman E, Reiser B, Faraggi D. ROC analysis for markers with mass at zero. Statistics in Medicine. 2006;25:623–638. doi: 10.1002/sim.2301.
  • 34. Molodianovitch K, Faraggi D, Reiser B. Comparing the Areas Under Two Correlated ROC Curves: Parametric and Non-Parametric Approaches. Biometrical Journal. 2007;5:745–757. doi: 10.1002/bimj.200610223.
  • 35. Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Statistical Science. 2001;16:101–117.
  • 36. Agresti A, Coull BA. Approximate is better than “exact” for interval estimation of Binomial proportions. The American Statistician. 1998;52:119–126.
