Parametric and non-parametric confidence intervals of the probability of identifying early disease stage given sensitivity to full disease and specificity with three ordinal diagnostic groups

Tuochuan Dong; Lili Tian; Alan Hutson; Chengjie Xiong

doi:10.1002/sim.4401

Author manuscript; available in PMC: 2014 Dec 11.

Published in final edited form as: Stat Med. 2011 Dec 5;30(30):3532–3545. doi: 10.1002/sim.4401


PMCID: PMC4263350 NIHMSID: NIHMS398250 PMID: 22139763

SUMMARY

In practice, there exist many disease processes with three ordinal disease classes, i.e., the non-diseased stage, the early disease stage and the fully diseased stage. Since the early disease stage is likely the best time window for treatment interventions, it is important to have diagnostic tests with good ability to discriminate the early disease stage from the other two stages. In this paper, we present both parametric and non-parametric approaches for confidence interval estimation of the probability of detecting the early disease stage given the true classification rates for the non-diseased group and the diseased group, namely, the specificity and the sensitivity to full disease. A data set on the clinical diagnosis of early stage Alzheimer’s disease (AD) from the neuropsychological database at the Washington University Alzheimer’s Disease Research Center (WU ADRC) is analyzed using the proposed approaches.

Keywords: Alzheimer’s disease (AD), generalized inference, Box-Cox transformation, bootstrap method

1. INTRODUCTION

The methods pertaining to statistical inference on diagnostic accuracy in the literature have largely focused on cases in which subjects are categorized in a binary fashion, i.e., the non-diseased and the diseased. The primary quantities of interest are the probabilities of an incorrect decision in the healthy population (1 − specificity) and of a correct decision in the diseased population (sensitivity), respectively. When a diagnostic test is based on an observed variable that lies on a continuous or graded scale, an assessment of the test can be made through the use of a Receiver Operating Characteristic (ROC) curve, which is a plot of sensitivity against 1 − specificity. For excellent reviews of statistical methods involving ROC curves, see Shapiro [1], Zhou et al. [2] and Pepe [3].

In reality, there exists a transitional stage (early disease stage) in many disease processes. In other words, a disease process might involve three ordinal diagnostic stages: the normal healthy stage without even the earliest subtle disease symptoms, the early stage of the disease, and the stage of full-blown development of the disease. For example, mild cognitive impairment (MCI) and/or early stage Alzheimer’s disease (AD) is a transitional stage between the cognitive changes of normal aging and the more serious problems. More details can be found in Xiong et al. [4]. To be specific, let Y₁, Y₂ and Y₃ denote the results of a diagnostic test and let F₁, F₂ and F₃ denote the corresponding cumulative distribution functions for healthy subjects, the subjects with early stage disease, and fully diseased subjects, respectively. Assume the results are measured on a continuous scale and that higher values indicate greater severity of the disease. Let P₁ = F₁(c₁), P₃ = 1 − F₃(c₃), where c₁ and c₃ are threshold values (c₁ < c₃) for classifying a subject into the non-diseased stage group and the fully diseased stage group, given that the subject is from these corresponding groups, respectively. Therefore, P₁ is specificity and P₃ is sensitivity to full disease. Then the probability that a randomly selected subject from the early disease stage group has a test result between c₁ and c₃, i.e. being correctly classified, is

P_{2} = F_{2} (c_{3}) - F_{2} (c_{1}) = F_{2} [F_{3}^{- 1} (1 - P_{3})] - F_{2} [F_{1}^{- 1} (P_{1})] .

(1)

P₂ can also be called the sensitivity to early disease. As a function of P₁ and P₃, P₂ = P₂(P₁, P₃) defines a surface in the three-dimensional space (P₁, P₃, P₂), i.e., the ROC surface. The point (P₁, P₃, P₂) = (1, 1, 1) indicates perfect discrimination ability of the marker between the three ordinal disease groups. The volume under the ROC surface (VUS) and the partial volume under the surface (PVUS) have been widely used as quantitative indexes of the discriminating ability of a biomarker measured on a continuous scale; e.g., see Mossman [5], Dreiseitl [6] and Heckerling [7]. Furthermore, Nakas and Yiannoutsos [8] proposed distribution-free approaches for hypothesis testing for a single VUS and paired VUSs; Xiong et al. [4] developed an asymptotic approach for confidence interval estimation of VUS and PVUS for normally distributed data; Nakas and Alonzo [9], and Alonzo and Nakas [10] proposed nonparametric inference procedures for diagnostic accuracy with three disease classes under umbrella ordering; and Xiong et al. [11] developed a large sample approach for comparing several VUSs for normally distributed data. Most recently, Tian et al. [12] proposed an approach based on generalized inference for confidence interval estimation of the difference between paired VUSs and PVUSs.

The probability associated with the detection of the early disease stage, P₂, is especially critical in a medical sense. First, in many disease processes such as AD, early detection often means the optimum time window for therapeutic treatment, because no pharmaceutical treatments to date are effective for late stage AD. Therefore, estimating the probability that a person is at the early disease stage has direct treatment implications. Second, there are well established and accepted criteria for differentiating normal aging (i.e., P₁) and fully developed AD (i.e., P₃). However, it is far more challenging for clinicians to diagnose subjects at the earliest disease stage because of the subtle clinical symptoms in the early stage of many complex disease processes. An accurate estimate of P₂ therefore helps clinicians to identify the best disease markers for early diagnosis. Finally, it is already standard practice for clinicians to diagnose subjects into 3 groups: normal aging, early stage/very mild AD, and fully developed AD. For disease processes with three disease stages, the specificity (P₁), the sensitivity to early disease (P₂), and the sensitivity to full disease (P₃) of a test or a biomarker depend on the cut-points c₁ and c₃, which can be chosen to be some quantile (typically the 80th, 90th, etc.) of the distribution of the test values of non-diseased subjects and fully diseased subjects, providing a fixed specificity (for example, the 80th percentile provides a specificity of 0.8) and a fixed sensitivity to full disease. In other words, the specification of P₁ and P₃ only serves to set up the cutoffs c₁ and c₃ for a disease marker that can be used to diagnose subjects into three groups. Therefore, the sensitivity to early disease (P₂) at a given specificity (P₁) and a given sensitivity to full disease (P₃) provides a measure of the ability of a biomarker for early disease detection and can be used as another diagnostic measure in addition to VUS and PVUS. Hence it is of paramount theoretical and practical importance to develop inference procedures for P₂. However, to the best of our knowledge, the problem of making inference about P₂ given P₁ and P₃ has not been addressed in the literature.

When the disease status is binary, the diagnostic accuracy of a test is usually described by its sensitivity and specificity. For a continuous-scale test or biomarker, it is often of interest to construct a confidence interval for the sensitivity at the cut-off that yields a predetermined level of specificity. Towards this end, some work on estimation of sensitivity given specificity has been done. Linnet [13] proposed both parametric and non-parametric methods for constructing confidence intervals for the sensitivity of a test at a fixed value of specificity, accounting for the random variation associated with the estimated cut-off point. Platt [14] pointed out several shortcomings in Linnet’s [13] methods and then proposed to use Efron’s bias-corrected and accelerated (BCa) bootstrap interval. Zhou and Qin [15] proposed two new intervals for the sensitivity of a diagnostic test at a fixed level of specificity.

The purpose of this paper is two-fold: 1) for disease processes with three disease categories, we propose to use the sensitivity to early disease (P₂) given specificity (P₁) and sensitivity to full disease (P₃) as a diagnostic measure which focuses on the ability of early disease detection; 2) we examine the performance of several parametric and nonparametric approaches for confidence interval estimation of P₂ given P₁ and P₃, and then make recommendations about which procedures are most appropriate under different scenarios. This paper is organized as follows. In Section 2, the motivating example from the Washington University (WU) Alzheimer’s Disease Research Center (ADRC) is described. In Section 3, the parametric confidence interval estimations for P₂ under either normality or normality of transformed data are discussed. In Section 4, nonparametric confidence intervals for P₂ are presented. In Section 5, we conduct simulation studies to assess the finite sample performance of the proposed confidence intervals. In Section 6, we analyze the data from the Alzheimer’s disease study. In Section 7, we give a summary and discussion.

2. THE DATA

Alzheimer’s disease (AD) is one of the most common degenerative dementias among aged people. As “baby boomers” reach retirement, AD is becoming even more prevalent, thus resulting in a major health care crisis in the United States. Because AD is irreversible, a major challenge is to identify individuals in its early phase. The sample studied here is from the longitudinal cohort of the Washington University (WU) Alzheimer’s Disease Research Center (ADRC). Only individuals with dementia of Alzheimer type (DAT) were included in the demented sample. For each subject, the severity of dementia was staged by the Clinical Dementia Rating (CDR) according to published rules [16]. This data set includes three diagnostic groups: non-demented (CDR 0 = group D−), very mildly demented (CDR 0.5 = group D₀), and mildly demented (CDR 1 = group D+). There are 45, 44, and 29 subjects in groups D−, D₀, and D+, respectively. Besides clinical evaluation, participants also completed standard psychometric tests. Episodic memory was assessed by the Logical Memory, Digit Span (both forward and backward), and Associate Learning sub-tests of the Wechsler Memory Scale (WMS), and the Visual Retention Test. Three measures of semantic memory included the Information subtest of the Wechsler Adult Intelligence Scale (WAIS), the Boston Naming Test, and word fluency for S and P. The other two tests in the psychometric battery were an attentional measure (WMS Mental Control) and an un-timed visuospatial measure (Visual Retention Test). The factor scores, including the primary factor (called the global factor), the mental control/frontal factor, the memory-verbal/temporal factor, and the visuospatial/parietal factor, were also computed from the database. These composite factor scores reflect the brain regions thought to contribute to performance on the measures that loaded highly on the factors. For more details about this data set and the description of the related psychometric tests, see Xiong et al. [4].

The data set has been analyzed by Xiong et al. [4] for confidence interval estimation of VUS and PVUS. The table with summary statistics of neuropsychometric tests from this sample is reproduced as Table 1. For Alzheimer’s disease, a major challenge lies in identifying affected, but not yet fully demented, individuals in the earliest phases of illness, when treatment can have a more profound impact on functional status and rate of cognitive decline. Therefore, the goal of this paper is to examine the accuracy of neuropsychological tests in the diagnosis of early stage AD given the sensitivity to full disease and specificity.

Table 1.

Means (standard deviations) of neuropsychometric tests from the WU ADRC sample

Variable	CDR 0 (n=45)	CDR 0.5 (n=44)	CDR 1 (n=29)
Global factor	0.569(0.888)	−1.622(1.722)	−4.199(1.699)
Frontal factor	2.866(1.777)	0.373(2.212)	−2.682(2.067)
Parietal factor	1.803(1.295)	−0.241(2.051)	−2.377(2.549)
Temporal factor	4.085(2.249)	−0.986(3.315)	−5.855(3.223)
Associate Learning	0.741(0.890)	−0.579(0.888)	−1.501(0.871)
Logical Memory	0.730(0.848)	−0.858(0.895)	−1.766(0.402)
Digit Span Forward	0.579(0.806)	−0.212(0.892)	−1.210(1.127)
Digit Span Backward	0.546(0.923)	−0.400(0.853)	−1.824(1.410)
Visual Retention (10 s)	0.636(0.879)	−0.821(1.099)	−1.658(0.773)
Information	0.631(0.844)	−0.607(1.080)	−2.302(1.139)
Word Fluency	0.729(1.178)	−0.255(0.981)	−1.438(0.883)
Mental Control	0.463(0.612)	−0.374(1.197)	−1.715(1.130)
Boston Naming	0.588(0.531)	−0.497(1.635)	−3.072(2.148)
Visual Retention (copy)	0.202(0.667)	−0.551(1.864)	−1.769(2.398)


3. PARAMETRIC CONFIDENCE INTERVAL ESTIMATION OF P₂

In this section, we first examine a generalized inference approach for confidence interval estimation of P₂ for normally distributed data. For non-normal data, we propose to apply a Box-Cox type power transformation to the data followed by a generalized inference approach. The generalized variables and generalized pivots were introduced by Tsui and Weerahandi [17] and Weerahandi [18]; see the book by Weerahandi [19] for a detailed discussion. A brief summary of the core concepts is included in the Appendix. The concepts of generalized confidence interval and generalized P-value have been successfully applied to a variety of practical settings where standard exact solutions do not exist for confidence intervals and hypothesis testing. It has been shown that generalized inference approaches typically have good performance, even at small sample sizes; e.g. Weerahandi [20], Weerahandi and Berger [21], Krishnamoorthy and Lu [22], Tian and Cappelleri [23], Tian [24], Li, Liao and Liu [25] and Li, Liao and Liu [26].

3.1 Under the Normal Assumptions

Let Y_1j (j = 1, 2, …, n₁), Y_2j (j = 1, 2, …, n₂), and Y_3j (j = 1, 2, …, n₃) denote the n₁, n₂, n₃ observations for the non-diseased, early stage, and diseased groups, respectively. Assume Y_ij (j = 1, 2, …, n_i) follows a normal distribution with mean μ_i and variance $σ_{i}^{2}$ for i = 1, 2, 3. Then P₂ defined in (1) can be expressed as follows:

P_{2} = Φ [\frac{μ_{3} - μ_{2} + Φ^{- 1} (1 - P_{3}) σ_{3}}{σ_{2}}] - Φ [\frac{μ_{1} - μ_{2} + Φ^{- 1} (P_{1}) σ_{1}}{σ_{2}}]

(2)

where Φ denotes the cumulative distribution function of the standard normal variable. For the ith group, let Ȳ_i and $S_{i}^{2}$ be the sample mean and the sample variance, and let ȳ_i and $s_{i}^{2}$ denote the corresponding observed values. Then P₂ can be estimated by substituting the observed values into (2):

{P̂}_{2} = Φ [\frac{ȳ_{3} - ȳ_{2} + Φ^{- 1} (1 - P_{3}) s_{3}}{s_{2}}] - Φ [\frac{ȳ_{1} - ȳ_{2} + Φ^{- 1} (P_{1}) s_{1}}{s_{2}}] .

(3)
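As a numerical illustration of equation (2), the following minimal sketch evaluates P₂ for given normal parameters. The function name `p2_normal` and its argument layout are hypothetical choices made for this example; the parameter values are those of the first normal scenario in Table 2, for which P₂ should be close to 0.5.

```python
from statistics import NormalDist

def p2_normal(mu, sigma, p1, p3):
    """Sensitivity to early disease under normality, as in equation (2).

    mu, sigma: length-3 sequences ordered as (non-diseased, early, fully diseased).
    p1: fixed specificity; p3: fixed sensitivity to full disease.
    """
    std = NormalDist()
    c1 = mu[0] + std.inv_cdf(p1) * sigma[0]      # cut-point c1 = F1^{-1}(P1)
    c3 = mu[2] + std.inv_cdf(1 - p3) * sigma[2]  # cut-point c3 = F3^{-1}(1 - P3)
    early = NormalDist(mu[1], sigma[1])
    return early.cdf(c3) - early.cdf(c1)         # F2(c3) - F2(c1)

# Hypothetical usage with the first parameter setting of Table 2.
p2 = p2_normal([0.0, 2.5, 3.69], [1.0, 1.1, 1.2], p1=0.8, p3=0.8)
print(round(p2, 3))
```

Replacing the parameters by the observed means and standard deviations gives the plug-in estimate of equation (3).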

It is well-known that

V_{i} = (n_{i} - 1) S_{i}^{2} / σ_{i}^{2} ~ χ_{n_{i} - 1}^{2} .

Therefore, the generalized pivotal quantity for $σ_{i}^{2}$ is

R_{σ_{i}^{2}} = \frac{(n_{i} - 1) s_{i}^{2}}{V_{i}} ~ \frac{(n_{i} - 1) s_{i}^{2}}{χ_{n_{i} - 1}^{2}}, i = 1, 2, 3 .

(4)

Furthermore,

Z_{i} = \frac{{Y̅}_{i} - μ_{i}}{\sqrt{σ_{i}^{2} / n_{i}}} ~ N (0, 1), i = 1, 2, 3.

The generalized pivotal quantity of μ_i is

R_{μ_{i}} = ȳ_{i} - Z_{i} \sqrt{R_{σ_{i}^{2}} / n_{i}}, i = 1, 2, 3 .

(5)

The generalized pivotal quantities for the normal mean and variance were first proposed in Krishnamoorthy and Mathew [27]. Finally, the generalized pivotal quantity for P₂ is

R_{P_{2}} = Φ [\frac{R_{μ_{3}} - R_{μ_{2}} + Φ^{- 1} (1 - P_{3}) R_{σ_{3}}}{R_{σ_{2}}}] - Φ [\frac{R_{μ_{1}} - R_{μ_{2}} + Φ^{- 1} (P_{1}) R_{σ_{1}}}{R_{σ_{2}}}]

(6)

where $R_{σ_{i}} = \sqrt{R_{σ_{i}^{2}}}$ for i = 1, 2, 3. One can easily check that R_P₂ is a bona fide generalized pivot because the following hold: 1) the distribution of R_P₂ is free of any unknown parameters; and 2) the observed value of R_P₂ equals P₂ as defined in equation (2) for given ȳ_i and $s_{i}^{2}$ (i = 1, 2, 3).

Computing Algorithm

Given a normally distributed data set with n₁ non-diseased subjects, n₂ subjects at the early stage of the disease, and n₃ diseased subjects, the confidence interval for P₂ using the generalized inference approach can be obtained via the following steps:

1. For i = 1, 2, 3, calculate ȳ_i and $s_{i}^{2}$ .
2. For i = 1, 2, 3, generate independent random numbers V_i from $χ_{n_{i} - 1}^{2}$ , then calculate $R_{σ_{i}^{2}}$ .
3. For i = 1, 2, 3, generate independent random numbers Z_i from standard normal distributions N(0, 1), and V_i from $χ_{n_{i} - 1}^{2}$ , then calculate R_{μ_i}.
4. Calculate R_P₂ as in equation (6).
5. Repeat Steps 2–4 a total of B = 2500 times to obtain a set of values of R_P₂.

Denote by R_P₂(α) the 100αth percentile of the R_P₂ values. Then (R_P₂(α/2), R_P₂(1 − α/2)) is a two-sided 100(1 − α)% generalized confidence interval for P₂.
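The computing algorithm can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch rather than the authors' implementation: the function name `generalized_ci_p2`, the seeding, and the simple order-statistic rule for the percentiles are assumptions. The sketch also assumes that R_{μ_i} is built from the same V_i draw as $R_{σ_{i}^{2}}$, which is how equation (5) is written. A χ² variate with k degrees of freedom is drawn as a gamma variate with shape k/2 and scale 2.

```python
import random
from statistics import NormalDist, mean, variance

def generalized_ci_p2(groups, p1, p3, alpha=0.05, B=2500, seed=0):
    """Two-sided generalized confidence interval for P2 under normality.

    groups: three lists (non-diseased, early stage, fully diseased).
    """
    rng = random.Random(seed)
    std = NormalDist()
    z1, z3 = std.inv_cdf(p1), std.inv_cdf(1 - p3)
    n = [len(g) for g in groups]
    ybar = [mean(g) for g in groups]          # Step 1: observed means
    s2 = [variance(g) for g in groups]        # Step 1: observed variances (n - 1 divisor)
    draws = []
    for _ in range(B):                        # Step 5: repeat Steps 2-4
        r_mu, r_sd = [], []
        for i in range(3):
            v = rng.gammavariate((n[i] - 1) / 2, 2.0)  # chi-square, n_i - 1 df
            r_var = (n[i] - 1) * s2[i] / v             # equation (4)
            z = rng.gauss(0.0, 1.0)
            r_mu.append(ybar[i] - z * (r_var / n[i]) ** 0.5)  # equation (5)
            r_sd.append(r_var ** 0.5)
        upper = std.cdf((r_mu[2] - r_mu[1] + z3 * r_sd[2]) / r_sd[1])
        lower = std.cdf((r_mu[0] - r_mu[1] + z1 * r_sd[0]) / r_sd[1])
        draws.append(upper - lower)                    # equation (6)
    draws.sort()
    lo = draws[int(alpha / 2 * B)]            # approximate 100(alpha/2)th percentile
    hi = draws[int((1 - alpha / 2) * B) - 1]  # approximate 100(1 - alpha/2)th percentile
    return lo, hi

# Hypothetical usage: simulated data from the first normal setting of Table 2.
data_rng = random.Random(1)
params = [(0.0, 1.0), (2.5, 1.1), (3.69, 1.2)]
data = [[data_rng.gauss(m, s) for _ in range(50)] for m, s in params]
ci = generalized_ci_p2(data, p1=0.8, p3=0.8)
print(ci)
```

Because the interval is obtained by Monte Carlo, its endpoints vary slightly with the seed and with B.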

3.2 Without the Normal Assumptions

Most of the time the normal assumptions given in Section 3.1 are not satisfied. For such data, we examine the use of the generalized inference approach by first applying a Box-Cox transformation to the data and then applying the generalized inference procedure proposed in Section 3.1. Because the ROC curve is invariant under monotonic transformations, this type of approach has been found useful in the context of ROC analysis for a wide variety of situations (e.g. Zou et al. [28]; Zou and Hall [30]; Faraggi and Reiser [29]; Fluss et al. [31]; Schisterman et al. [32], [33]; Molodianovitch et al. [34]). By employing a similar technique, we can also show that P₂ is invariant under monotonic transformations. Let

Y_{i}^{(λ)} = \begin{cases} \frac{Y_{i}^{λ} - 1}{λ}, & λ \neq 0 \\ \log (Y_{i}), & λ = 0 \end{cases}

(7)

for i = 1, 2, 3, where it is assumed that $Y_{i}^{(λ)} ~ N (μ_{i}, σ_{i}^{2})$ . Based on the observations of the three groups, the log-likelihood function can be readily obtained as follows:

\sum_{i = 1}^{3} \sum_{j = 1}^{n_{i}} [- \frac{1}{2} \log (2 π σ_{i}^{2}) - \frac{(Y_{i j}^{(λ)} - μ_{i})^{2}}{2 σ_{i}^{2}} + (λ - 1) \log Y_{i j}] .

(8)

The maximum likelihood estimate (MLE) of λ can be obtained by maximizing the above function. After applying the Box-Cox transformation, the generalized inference approach proposed in Section 3.1 is applied directly to the transformed data.
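One way to carry out the maximization is to profile out μ_i and $σ_{i}^{2}$ and search over λ. The sketch below is an assumption-laden illustration, not the authors' code: it uses a plain grid search over a hypothetical range [−2, 2], drops additive constants that do not depend on λ, and requires strictly positive data. The names `boxcox`, `profile_loglik` and `boxcox_mle` are introduced here for illustration only.

```python
import math
import random

def boxcox(y, lam):
    """Box-Cox transformation of equation (7) for positive data."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def profile_loglik(groups, lam):
    """Log-likelihood in lambda with each (mu_i, sigma_i^2) replaced by its MLE.

    Constants free of lambda are omitted.
    """
    total = 0.0
    for g in groups:
        t = boxcox(g, lam)
        m = sum(t) / len(t)
        var_mle = sum((v - m) ** 2 for v in t) / len(t)  # MLE uses divisor n_i
        total += -0.5 * len(g) * math.log(var_mle)
        total += (lam - 1) * sum(math.log(v) for v in g)  # Jacobian term
    return total

def boxcox_mle(groups, lo=-2.0, hi=2.0, steps=401):
    """Grid-search maximizer of the profile log-likelihood (a common lambda)."""
    grid = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda lam: profile_loglik(groups, lam))

# Hypothetical usage: log-normal data, for which the log transform (lambda = 0)
# restores normality, so the estimate should fall near zero.
rng = random.Random(1)
groups = [[math.exp(rng.gauss(m, 0.5)) for _ in range(200)] for m in (0.0, 1.0, 2.0)]
lam_hat = boxcox_mle(groups)
print(lam_hat)
```

The transformed groups `boxcox(g, lam_hat)` can then be passed to the generalized inference procedure of Section 3.1.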

4. NON-PARAMETRIC CONFIDENCE INTERVAL ESTIMATION OF P₂

The parametric approaches either rely on the normality assumption or require solving an equation for the Box-Cox transformation. Therefore, it is also of interest to examine the performance of non-parametric approaches for confidence interval estimation of P₂. Three nonparametric methods using bootstrap samples will be considered. The first is the bootstrap percentile confidence interval, and the other two are based on the intervals proposed by Agresti and Coull [36] with the variance estimated from bootstrap samples.

Assume the distributions for the non-diseased group (i.e. F₁) and the fully diseased group (i.e. F₃) are known. Define $A_{j} = I_{[F_{1}^{- 1} (P_{1}) \leq Y_{2 j} \leq F_{3}^{- 1} (1 - P_{3})]}$ for j = 1, 2, …, n₂. Then the A_j’s are Bernoulli random variables with success probability $P_{2} = P [F_{1}^{- 1} (P_{1}) \leq Y_{2 j} \leq F_{3}^{- 1} (1 - P_{3})]$ . Letting ${P̄}_{2} = \sum_{j = 1}^{n_{2}} A_{j} / n_{2}$ , the standard 100(1 − α)% Wald interval for P₂ is

({P̄}_{2} - z_{1 - α / 2} \sqrt{{P̄}_{2} (1 - {P̄}_{2}) / n_{2}}, {P̄}_{2} + z_{1 - α / 2} \sqrt{{P̄}_{2} (1 - {P̄}_{2}) / n_{2}})

where z_{1−α/2} denotes the 100(1 − α/2)th percentile of the standard normal distribution.

In reality, the true distributions F₁ and F₃ are unknown and therefore $F_{1}^{- 1} (P_{1})$ and $F_{3}^{- 1} (1 - P_{3})$ need to be replaced by their sample estimates ${F̂}_{1}^{- 1} (P_{1})$ and ${F̂}_{3}^{- 1} (1 - P_{3})$ . The resulting estimate of P̄₂ is given by

{\hat{\bar{P}}}_{2} = \frac{\sum_{i = 1}^{n_{2}} I_{[{F̂}_{1}^{- 1} (P_{1}) \leq Y_{2 i} \leq {F̂}_{3}^{- 1} (1 - P_{3})]}}{n_{2}} .

(9)

The estimated 100(1 − α)% Wald interval for P₂ is

({\hat{\bar{P}}}_{2} - z_{1 - α / 2} \sqrt{{\hat{\bar{P}}}_{2} (1 - {\hat{\bar{P}}}_{2}) / n_{2}}, {\hat{\bar{P}}}_{2} + z_{1 - α / 2} \sqrt{{\hat{\bar{P}}}_{2} (1 - {\hat{\bar{P}}}_{2}) / n_{2}}) .

The Wald interval is known to have poor performance, especially for small sample sizes [35].
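The empirical estimator in equation (9) and the associated Wald interval can be sketched as follows. The text does not specify which empirical quantile definition is used; the sketch assumes the left-continuous inverse of the empirical distribution function (the smallest observation y with F̂(y) ≥ p). The function names and the small data set are hypothetical.

```python
import math
from statistics import NormalDist

def empirical_quantile(sample, p):
    """Smallest observation y such that the empirical CDF at y is at least p."""
    ys = sorted(sample)
    k = max(1, math.ceil(len(ys) * p - 1e-12))  # small tolerance for rounding error
    return ys[k - 1]

def p2_empirical(y1, y2, y3, p1, p3):
    """Proportion of early-stage results between the estimated cut-points."""
    c1 = empirical_quantile(y1, p1)        # estimate of F1^{-1}(P1)
    c3 = empirical_quantile(y3, 1 - p3)    # estimate of F3^{-1}(1 - P3)
    return sum(c1 <= v <= c3 for v in y2) / len(y2)

def wald_interval(p_hat, n, alpha=0.05):
    """Standard Wald interval for a proportion based on n observations."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Hypothetical toy data: the cut-points are c1 = 8 and c3 = 11.
y1 = list(range(1, 11))
y3 = list(range(10, 20))
y2 = [5, 7, 8, 9, 10, 11, 12, 13, 9.5, 10.5]
p_hat = p2_empirical(y1, y2, y3, p1=0.8, p3=0.8)
print(p_hat, wald_interval(p_hat, len(y2)))
```

With only ten early-stage observations the Wald interval is very wide, which illustrates why the interval performs poorly for small samples.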

The 100(1 − α)% bootstrap percentile confidence interval (BTP) uses bootstrap samples to compute ${\hat{\bar{P}}}_{2}^{b}$ for b = 1 to 500 bootstrap iterations and is given by

({\hat{\bar{P}}}_{2}^{b} (α / 2), {\hat{\bar{P}}}_{2}^{b} (1 - α / 2))

(10)

where ${\hat{\bar{P}}}_{2}^{b} (α)$ is the 100α% percentile of the bootstrap distribution of ${\hat{\bar{P}}}_{2}$ .

The AC interval, proposed by Agresti and Coull [36], is known to have good performance for binomial proportions. Applying it to our setting, the 100(1 − α)% AC interval for P₂ is:

({P̃}_{2} - z_{1 - α / 2} \sqrt{{\hat{Var}}_{A C} ({P̃}_{2})}, {P̃}_{2} + z_{1 - α / 2} \sqrt{{\hat{Var}}_{A C} ({P̃}_{2})}),

(11)

where

{P̃}_{2} = \frac{\sum_{i = 1}^{n_{2}} I_{[F_{1}^{- 1} (P_{1}) \leq Y_{2 i} \leq F_{3}^{- 1} (1 - P_{3})]} + z_{1 - α / 2}^{2} / 2}{n_{2} + z_{1 - α / 2}^{2} / 2}

(12)

and

{Var}_{A C} ({P̃}_{2}) = \frac{{P̃}_{2} (1 - {P̃}_{2})}{n_{2} + z_{1 - α / 2}^{2} / 2} .

(13)

The estimated P̃₂ is given by

{\hat{\tilde{P}}}_{2} = \frac{\sum_{i = 1}^{n_{2}} I_{[{F̂}_{1}^{- 1} (P_{1}) \leq Y_{2 i} \leq {F̂}_{3}^{- 1} (1 - P_{3})]} + z_{1 - α / 2}^{2} / 2}{n_{2} + z_{1 - α / 2}^{2} / 2} .

(14)

The estimated variance Var_AC(P̃₂) can be obtained directly by substituting P̃₂ with ${\hat{\tilde{P}}}_{2}$ in equation (13).

Zhou and Qin [15] considered the non-parametric solution for estimating confidence intervals for the sensitivity at a fixed level of specificity of a diagnostic test with binary disease status, and proposed to use bootstrap methods to estimate the variance. This idea can be extended to construct non-parametric confidence intervals for P₂ given P₃ and P₁. In the same vein, we use bootstrap methods to estimate the variance of ${\hat{\tilde{P}}}_{2}$ . With the estimated variance, we then apply Agresti and Coull’s idea to derive confidence intervals for P₂.

Computing Algorithms

Given a data set with n₁ non-diseased subjects, n₂ subjects at the early stage of the disease, and n₃ diseased subjects, the three nonparametric confidence intervals for P₂ discussed in this section can be obtained by the following algorithm.

For b = 1 to B bootstrap iterations (B ≥ 200 is recommended, e.g. [15]; in this paper we use B = 500):

1. Draw resamples of sizes n₁, n₂, and n₃ with replacement from the non-diseased sample Y_1j’s, the early stage sample Y_2j’s, and the diseased sample Y_3j’s, respectively. Denote the bootstrap samples by ${Y_{i j}^{b}}$ , i = 1, 2, 3, j = 1, 2, …, n_i.
2. Calculate the bootstrap versions ${\hat{\bar{P}}}_{2}^{b}$ and ${\hat{\tilde{P}}}_{2}^{b}$ according to (9) and (14), respectively.

The bootstrap percentile confidence interval (BTP) in (10) can be obtained by using the array ${\hat{\bar{P}}}_{2}^{b}$ (b = 1, …, 500).

The proposed bootstrap variance estimator of ${\hat{\tilde{P}}}_{2}$ is defined as:

{\hat{Var}}^{boot} ({\hat{\tilde{P}}}_{2}) = \frac{1}{B - 1} \sum_{b = 1}^{B} {({\hat{\tilde{P}}}_{2}^{b} - {\bar{\hat{\tilde{P}}}}_{2}^{b})}^{2},

(15)

where ${\bar{\hat{\tilde{P}}}}_{2}^{b} = (1 / B) \sum_{b = 1}^{B} {\hat{\tilde{P}}}_{2}^{b}$ . Similar to Zhou and Qin [15], we propose two Agresti-Coull-type confidence intervals for P₂ given P₁ and P₃. The first 100(1 − α)% interval, called the BTI interval, is defined as

({\hat{\tilde{P}}}_{2} - z_{1 - α / 2} \sqrt{{\hat{Var}}^{boot} ({\hat{\tilde{P}}}_{2})}, {\hat{\tilde{P}}}_{2} + z_{1 - α / 2} \sqrt{{\hat{Var}}^{boot} ({\hat{\tilde{P}}}_{2})})

(16)

where ${\hat{\tilde{P}}}_{2}$ is defined in (14). The second 100(1 − α)% interval, called the BTII interval, is defined as

({\bar{\hat{\tilde{P}}}}_{2}^{b} - z_{1 - α / 2} \sqrt{{\hat{Var}}^{boot} ({\hat{\tilde{P}}}_{2})}, {\bar{\hat{\tilde{P}}}}_{2}^{b} + z_{1 - α / 2} \sqrt{{\hat{Var}}^{boot} ({\hat{\tilde{P}}}_{2})}) .

(17)
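The bootstrap algorithm and the three resulting intervals can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the empirical quantile is taken as the left-continuous inverse of the empirical distribution function, the intervals are treated as two-sided (so the α/2 and 1 − α/2 percentiles and z_{1−α/2} are used), and the adjusted proportion follows equation (14) exactly as printed in this section. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def quantile(sample, p):
    """Smallest observation y such that the empirical CDF at y is at least p."""
    ys = sorted(sample)
    return ys[max(1, math.ceil(len(ys) * p - 1e-12)) - 1]

def count_between(y1, y2, y3, p1, p3):
    """Number of early-stage results between the estimated cut-points."""
    c1, c3 = quantile(y1, p1), quantile(y3, 1 - p3)
    return sum(c1 <= v <= c3 for v in y2)

def bootstrap_intervals(y1, y2, y3, p1, p3, alpha=0.05, B=500, seed=0):
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n2, add = len(y2), z * z / 2

    def tilde(count):                      # adjusted proportion, equation (14) as printed
        return (count + add) / (n2 + add)

    bar_b, tilde_b = [], []
    for _ in range(B):
        # Resample each group with replacement, keeping the original sizes.
        b1, b2, b3 = (rng.choices(y, k=len(y)) for y in (y1, y2, y3))
        cnt = count_between(b1, b2, b3, p1, p3)
        bar_b.append(cnt / n2)             # bootstrap version of equation (9)
        tilde_b.append(tilde(cnt))         # bootstrap version of equation (14)
    bar_b.sort()
    btp = (bar_b[int(alpha / 2 * B)], bar_b[int((1 - alpha / 2) * B) - 1])
    sd = math.sqrt(variance(tilde_b))      # equation (15), divisor B - 1
    centre1 = tilde(count_between(y1, y2, y3, p1, p3))  # estimate from original data
    centre2 = mean(tilde_b)                              # mean of bootstrap estimates
    return {
        "BTP": btp,
        "BTI": (centre1 - z * sd, centre1 + z * sd),
        "BTII": (centre2 - z * sd, centre2 + z * sd),
    }

# Hypothetical usage with simulated normal data.
data_rng = random.Random(2)
params = [(0.0, 1.0), (2.5, 1.1), (3.69, 1.2)]
g1, g2, g3 = ([data_rng.gauss(m, s) for _ in range(30)] for m, s in params)
result = bootstrap_intervals(g1, g2, g3, p1=0.8, p3=0.8)
print(result)
```

BTI and BTII share the same half-width and differ only in their centre, which is why their lengths coincide while their coverage can differ.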

5. SIMULATION STUDIES

Simulation studies are carried out to assess the coverage probabilities of the proposed confidence intervals (the generalized inference approach, the generalized inference approach with Box-Cox transformation, the percentile bootstrap interval (BTP), and the two intervals based on Agresti and Coull’s paper, namely BTI and BTII) for P₂ under different distributional assumptions: normal, beta and gamma. The AC interval proposed in equation (11) using the estimated P̃₂ and variance Var_AC(P̃₂) has poor coverage accuracy and therefore is not considered here. Beta and gamma distributions are used as representatives of non-normal distributions because they are widely used in practical applications and because they come in a variety of shapes.

To represent a wide range of sample size settings, (n₁, n₂, n₃) is set as (10, 10, 10), (30, 30, 30), (20, 10, 10), (30, 20, 10), (50, 30, 30) and (50, 50, 50). With a fixed 80% specificity and a fixed 80% sensitivity to full disease, the parameters are chosen so that P₂ equals 50%, 70%, 80% and 90%, respectively. For each parameter setting, 5,000 random samples are generated and the parametric and non-parametric confidence intervals proposed in Sections 3 and 4 are obtained. The simulation results are presented in Tables 2–4 for the bootstrap percentile approach (BTP), the BTII approach and the generalized inference approach (without or with Box-Cox transformation). The BTI approach is not presented because it is consistently inferior to the BTII approach. The coverage probabilities, the coverage errors for the lower and upper tails (i.e., the proportion of runs in which the lower (or upper) limit of the confidence interval excludes the true P₂, with nominal level 0.025 per tail), and the average lengths of the proposed confidence intervals are presented.

Table 2.

Summary of approximate 95% two-sided confidence bounds of BTP, BTII and GI for P₂ under normal distributions (based on 5,000 simulations)

	Three Independent Normal Distributions
	(μ₁, σ₁) = (0, 1)′, (μ₂, σ₂) = (2.5, 1.1)′, (μ₃, σ₃) = (3.69, 1.2)′, P₂ = 0.5
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9766	0.9360	0.9606	0.0198	0.0502	0.0038	0.0036	0.0138	0.0356	0.8086	0.6364	0.6943
(30, 30, 30)	0.9792	0.9602	0.9572	0.0146	0.0276	0.0110	0.0062	0.0122	0.0318	0.5570	0.5117	0.4330
(20, 10, 10)	0.9774	0.9398	0.9632	0.0190	0.0494	0.0074	0.0036	0.0108	0.0294	0.8013	0.6280	0.6775
(50, 30, 30)	0.9740	0.9464	0.9490	0.0194	0.0382	0.0182	0.0066	0.0154	0.0328	0.5498	0.5032	0.4239
(50, 50, 50)	0.9752	0.9446	0.9564	0.0174	0.0338	0.0130	0.0074	0.0216	0.0306	0.4423	0.3121	0.3356
	(μ₁, σ₁) = (0, 1)′, (μ₂, σ₂) = (2.5, 1.1)′, (μ₃, σ₃) = (4.31, 1.2)′, P₂ = 0.7
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9834	0.9578	0.9476	0.0146	0.0242	0.0042	0.0020	0.0180	0.0482	0.7278	0.5694	0.6888
(30, 30, 30)	0.9772	0.9622	0.9536	0.0160	0.0236	0.0130	0.0068	0.0142	0.0334	0.4850	0.4445	0.3911
(20, 10, 10)	0.9826	0.9578	0.9506	0.0158	0.0238	0.0042	0.0016	0.0184	0.0452	0.7120	0.5531	0.6578
(50, 30, 30)	0.9786	0.9618	0.9546	0.0164	0.0256	0.0122	0.0050	0.0126	0.0332	0.4764	0.4352	0.3810
(50, 50, 50)	0.9752	0.9532	0.9498	0.0182	0.0254	0.0118	0.0066	0.0214	0.0384	0.3823	0.2681	0.3003
	(μ₁, σ₁) = (0, 1)′, (μ₂, σ₂) = (2.5, 1.1)′, (μ₃, σ₃) = (4.73, 1.2)′, P₂ = 0.8
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9720	0.9568	0.9358	0.0266	0.0206	0.0018	0.0014	0.0226	0.0624	0.6372	0.4999	0.6431
(30, 30, 30)	0.9848	0.9720	0.9464	0.0116	0.0142	0.0070	0.0036	0.0138	0.0466	0.4091	0.3704	0.3336
(20, 10, 10)	0.9684	0.9566	0.9360	0.0298	0.0212	0.0048	0.0018	0.0222	0.0592	0.6030	0.4771	0.6024
(50, 30, 30)	0.9812	0.9678	0.9468	0.0156	0.0146	0.0120	0.0032	0.0176	0.0412	0.3951	0.3608	0.3217
(50, 50, 50)	0.9800	0.9620	0.9478	0.0126	0.0214	0.0142	0.0074	0.0166	0.0380	0.3179	0.2188	0.2524
	(μ₁, σ₁) = (0, 1)′, (μ₂, σ₂) = (2.5, 1.1)′, (μ₃, σ₃) = (5.51, 1.2)′, P₂ = 0.9
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9468	0.8928	0.9310	0.0530	0.0000	0.0038	0.0002	0.1072	0.0652	0.4583	0.3639	0.5250
(30, 30, 30)	0.9820	0.9704	0.9430	0.0150	0.0114	0.0106	0.0030	0.0182	0.0464	0.2772	0.2591	0.2399
(20, 10, 10)	0.9244	0.8740	0.9216	0.0752	0.0000	0.0056	0.0004	0.1260	0.0728	0.4277	0.3454	0.4711
(50, 30, 30)	0.9828	0.9704	0.9420	0.0144	0.0076	0.0110	0.0028	0.0220	0.0470	0.2626	0.2474	0.2262
(50, 50, 50)	0.9856	0.9558	0.9512	0.0118	0.0192	0.0098	0.0026	0.0250	0.0390	0.2160	0.1464	0.1755


BTP: The confidence interval based on bootstrap percentiles.

BTII: The confidence interval presented in equation (17).

GI: The generalized confidence interval for Box-Cox transformed data.

Lower tail (upper tail): One-sided coverage errors, i.e. the proportion of runs in which the lower (or upper) limit of the confidence interval excluded the true P₂ at nominal level 0.025.

Length of CI: the average length of two-sided confidence intervals for P₂.

Table 4.

Summary of approximate 95% two-sided confidence bounds of BTP, BTII and transformed GI for P₂ under gamma distributions (based on 5,000 simulations).

	Three Independent Gamma Distributions
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (4, 6)′, (α₃, β₃) = (6.2, 6)′, P₂ = 0.5
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9674	0.9498	0.9628	0.0306	0.0358	0.0038	0.0020	0.0144	0.0334	0.8131	0.6409	0.7133
(30, 30, 30)	0.9762	0.9632	0.9628	0.0176	0.0216	0.0130	0.0062	0.0152	0.0242	0.5550	0.5021	0.4271
(20, 10, 10)	0.9656	0.9518	0.9644	0.0318	0.0350	0.0074	0.0026	0.0132	0.0282	0.7986	0.6246	0.6627
(50, 30, 30)	0.9696	0.9620	0.9586	0.0234	0.0254	0.0172	0.0070	0.0126	0.0242	0.5347	0.4876	0.3988
(50, 50, 50)	0.9734	0.9654	0.9548	0.0176	0.0206	0.0174	0.0090	0.0140	0.0278	0.4370	0.4138	0.3231
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (4, 6)′, (α₃, β₃) = (7.7, 6)′, P₂ = 0.7
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9634	0.9600	0.9578	0.0342	0.0228	0.0026	0.0024	0.0172	0.0396	0.7463	0.5823	0.7318
(30, 30, 30)	0.9726	0.9696	0.9638	0.0204	0.0136	0.0088	0.0070	0.0168	0.0274	0.4905	0.4476	0.3874
(20, 10, 10)	0.9640	0.9652	0.9536	0.0340	0.0184	0.0048	0.0020	0.0164	0.0416	0.7179	0.5565	0.6386
(50, 30, 30)	0.9712	0.9660	0.9594	0.0224	0.0186	0.0108	0.0064	0.0154	0.0298	0.4667	0.4216	0.3571
(50, 50, 50)	0.9714	0.9736	0.9594	0.0208	0.0148	0.0148	0.0078	0.0116	0.0258	0.3819	0.3622	0.2905
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (4, 6)′, (α₃, β₃) = (9.0, 6)′, P₂ = 0.8
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9680	0.9572	0.9570	0.0228	0.0190	0.0022	0.0092	0.0238	0.0408	0.6473	0.5053	0.6967
(30, 30, 30)	0.9776	0.9760	0.9582	0.0136	0.0104	0.0084	0.0088	0.0136	0.0334	0.4166	0.3777	0.3378
(20, 10, 10)	0.9742	0.9626	0.9490	0.0162	0.0136	0.0036	0.0096	0.0238	0.0474	0.6096	0.4795	0.5771
(50, 30, 30)	0.9740	0.9738	0.9508	0.0136	0.0116	0.0102	0.0124	0.0146	0.0390	0.3864	0.3529	0.3053
(50, 50, 50)	0.9746	0.9744	0.9554	0.0104	0.0122	0.0118	0.0150	0.0134	0.0328	0.3213	0.3055	0.2518
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (4, 6)′, (α₃, β₃) = (12.1, 6)′, P₂ = 0.9
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9292	0.8888	0.9526	0.0626	0.0000	0.0056	0.0082	0.1112	0.0418	0.4958	0.3872	0.6012
(30, 30, 30)	0.9754	0.9704	0.9568	0.0146	0.0100	0.0130	0.0100	0.0196	0.0302	0.3072	0.2840	0.2696
(20, 10, 10)	0.9264	0.8924	0.9502	0.0670	0.0000	0.0072	0.0066	0.1076	0.0426	0.4332	0.3544	0.4592
(50, 30, 30)	0.9726	0.9678	0.9598	0.0138	0.0122	0.0142	0.0136	0.0200	0.0260	0.2740	0.2561	0.2305
(50, 50, 50)	0.9678	0.9706	0.9520	0.0206	0.0120	0.0184	0.0116	0.0174	0.0296	0.2401	0.2295	0.1973

BTP: The confidence interval based on bootstrap percentiles.

BTII: The confidence interval presented in equation (15).

GI: The generalized confidence interval for Box-Cox transformed data.

Lower tail (upper tail): One-sided coverage errors, i.e. the proportion of runs in which the lower (or upper) limit of the confidence interval excluded the true P₂ at nominal level 0.025.

Length of CI: the average length of two-sided confidence intervals for P₂.
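The four summary columns defined in these footnotes can be computed from a list of simulated intervals along the following lines. This is a minimal sketch, not the authors' R program; the function name `summarize_intervals` and the toy intervals are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def summarize_intervals(
    intervals: Sequence[Tuple[float, float]], true_value: float
) -> Dict[str, float]:
    """Summarize simulated two-sided confidence intervals for one scenario."""
    runs = len(intervals)
    # Lower-tail error: the lower limit lies above the true value.
    lower_miss = sum(1 for lo, hi in intervals if lo > true_value)
    # Upper-tail error: the upper limit lies below the true value.
    upper_miss = sum(1 for lo, hi in intervals if hi < true_value)
    return {
        "coverage": 1.0 - (lower_miss + upper_miss) / runs,
        "lower_tail": lower_miss / runs,
        "upper_tail": upper_miss / runs,
        "length": sum(hi - lo for lo, hi in intervals) / runs,
    }


# Toy input (made-up numbers, not results from the paper).
toy = [(0.30, 0.70), (0.55, 0.90), (0.10, 0.45), (0.40, 0.80)]
summary = summarize_intervals(toy, true_value=0.5)
```

With these definitions the coverage probability and the two tail errors sum to one, which is also the case for the rows reported in the tables.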

Table 2 presents the simulation results under the normality assumption. The bootstrap percentile approach tends to yield coverage slightly above the nominal level, except that it falls below the nominal level in the small-sample unbalanced case when P₂ = 0.9. The confidence intervals from the BTII method are reasonably close to the nominal level in most scenarios, although they are sometimes liberal. The generalized confidence intervals are the most accurate, although their coverage tends to fall slightly below the nominal level when sample sizes are small and P₂ is large. Regardless of the sample sizes and the true value of P₂, the bootstrap percentile confidence interval is the longest.

In Tables 3 and 4, we present simulation results at the nominal level of 95% for the beta and gamma distributions, respectively. For both distributions, the simulation study shows that the Box-Cox transformed data generally satisfy normality. Generally speaking, the Box-Cox transformed generalized approach gives good coverage probabilities in all cases, except that it may be slightly conservative when the sample sizes are small. The bootstrap percentile confidence intervals are generally conservative except for small-sample scenarios when P₂ = 0.9. The BTII approach performs well except that it tends to be liberal for small-sample scenarios when P₂ = 0.9. As in the normal case, the bootstrap percentile confidence intervals are generally the longest, although the generalized intervals are longer in some small-sample scenarios with large P₂ (for example, 0.6605 versus 0.6234 at sample sizes (10, 10, 10) and P₂ = 0.8 in Table 3).

Table 3.

Summary of approximate 95% two-sided confidence bounds of BTP, BTII and transformed GI for P₂ under beta distributions (based on 5,000 simulations).

	Three Independent Beta Distributions
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (6, 6)′, (α₃, β₃) = (9.6, 6)′, P₂ = 0.5
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9722	0.9428	0.9664	0.0146	0.0392	0.0076	0.0132	0.0180	0.0260	0.7917	0.6168	0.6997
(30, 30, 30)	0.9744	0.9608	0.9484	0.0134	0.0260	0.0332	0.0122	0.0132	0.0184	0.5228	0.4725	0.3852
(20, 10, 10)	0.9728	0.9510	0.9680	0.0154	0.0350	0.0148	0.0118	0.0140	0.0172	0.7770	0.6055	0.6396
(50, 30, 30)	0.9728	0.9582	0.9436	0.0170	0.0264	0.0432	0.0102	0.0154	0.0132	0.5079	0.4611	0.3703
(50, 50, 50)	0.9672	0.9608	0.9324	0.0202	0.0240	0.0546	0.0126	0.0152	0.0130	0.4077	0.3851	0.2907
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (6, 6)′, (α₃, β₃) = (12.6, 6)′, P₂ = 0.7
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9706	0.9554	0.9740	0.0154	0.0212	0.0038	0.0140	0.0234	0.0222	0.7172	0.5571	0.7024
(30, 30, 30)	0.9710	0.9654	0.9692	0.0148	0.0164	0.0158	0.0142	0.0182	0.0150	0.4589	0.4178	0.3366
(20, 10, 10)	0.9810	0.9602	0.9744	0.0108	0.0214	0.0062	0.0082	0.0184	0.0194	0.6924	0.5361	0.5904
(50, 30, 30)	0.9720	0.9630	0.9622	0.0134	0.0188	0.0278	0.0146	0.0182	0.0100	0.4422	0.4035	0.3156
(50, 50, 50)	0.9696	0.9662	0.9580	0.0148	0.0188	0.0316	0.0156	0.0150	0.0104	0.3531	0.3360	0.2489
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (6, 6)′, (α₃, β₃) = (15.3, 6)′, P₂ = 0.8
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9672	0.9520	0.9708	0.0256	0.0170	0.0032	0.0072	0.0310	0.0260	0.6234	0.4873	0.6605
(30, 30, 30)	0.9788	0.9708	0.9732	0.0128	0.0112	0.0092	0.0084	0.0180	0.0176	0.3841	0.3520	0.2907
(20, 10, 10)	0.9648	0.9470	0.9714	0.0280	0.0202	0.0046	0.0072	0.0328	0.0240	0.5808	0.4568	0.5249
(50, 30, 30)	0.9726	0.9656	0.9756	0.0176	0.0152	0.0106	0.0098	0.0192	0.0138	0.3661	0.3370	0.2683
(50, 50, 50)	0.9736	0.9656	0.9760	0.0140	0.0132	0.0108	0.0124	0.0212	0.0132	0.2976	0.2812	0.2126
	(α₁, β₁) = (1, 6)′, (α₂, β₂) = (6, 6)′, (α₃, β₃) = (20.4, 6)′, P₂ = 0.9
	Coverage Probability			Lower Tail			Upper Tail			Length of CI
Sample Sizes	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI	BTP	BTII	GI
(10, 10, 10)	0.9334	0.8802	0.9482	0.0574	0.0000	0.0028	0.0092	0.1198	0.0490	0.4763	0.3805	0.6004
(30, 30, 30)	0.9710	0.9734	0.9554	0.0164	0.0092	0.0046	0.0126	0.0174	0.0400	0.2853	0.2622	0.2336
(20, 10, 10)	0.9268	0.8714	0.9528	0.0658	0.0000	0.0048	0.0074	0.1286	0.0424	0.4287	0.3458	0.4404
(50, 30, 30)	0.9774	0.9648	0.9590	0.0140	0.0100	0.0058	0.0086	0.0252	0.0352	0.2611	0.2435	0.2104
(50, 50, 50)	0.9716	0.9740	0.9556	0.0154	0.0098	0.0054	0.0130	0.0162	0.0390	0.2163	0.2071	0.1662

BTP: The confidence interval based on bootstrap percentiles.

BTII: The confidence interval presented in equation (15).

GI: The generalized confidence interval for Box-Cox transformed data.

Lower tail (upper tail): One-sided coverage errors, i.e. the proportion of runs in which the lower (or upper) limit of the confidence interval excluded the true P₂ at nominal level 0.025.

Length of CI: the average length of two-sided confidence intervals for P₂.

In summary, when normality holds for either the original data or the transformed data, the parametric approaches, i.e. the generalized approach or the Box-Cox transformed generalized approach, generally provide confidence intervals with satisfactory coverage probabilities. The generalized approach is simple to use, whereas the Box-Cox transformation involves solving an equation. On the other hand, when the normality assumption cannot be met, the BTII approach is a good choice except in scenarios with large P₂ and small sample sizes, for which the bootstrap percentile approach provides reasonable confidence intervals.

6. EXAMPLE: REVISITED

In this section, the confidence intervals of the probabilities of detecting early AD (P₂) for all neuropsychometric tests in the data set of Alzheimer’s disease from a study at the Washington University (WU) Alzheimer’s Disease Research Center (ADRC) are estimated by the proposed parametric and nonparametric approaches. The details of the data set are presented in Section 2 and the summary statistics of neuropsychometric tests by three diagnostic groups are presented in Table 1.

A close look at the data using the Shapiro-Wilk normality test shows that the frontal factor and temporal factor satisfy normality, while the parietal factor, associate learning and word fluency satisfy normality after a Box-Cox transformation. For these variables, the generalized inference approach or the Box-Cox transformed generalized inference approach is recommended. For the remaining variables, the confidence intervals of P₂ can be obtained by both the BTP and BTII methods. The specificity P₁ and the sensitivity to full disease P₃ are both fixed at 0.8. For comparison, the confidence intervals from all the proposed approaches are presented in Table 5. Furthermore, for each variable, the nonparametric point estimate in (9) and the parametric point estimates (with or without Box-Cox transformation) in (3) are calculated, and the most appropriate point estimate is highlighted in Table 5.
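To make the parametric point estimate concrete, the sketch below computes a plug-in estimate of P₂ under normality with P₁ = P₃ = 0.8. Equation (3) is not reproduced in this section, so the formula is an assumption: it takes larger test values to indicate more severe disease, sets the lower cut-off at the P₁ quantile of the non-diseased group and the upper cut-off at the (1 − P₃) quantile of the fully diseased group, and truncates the estimate at zero when the cut-offs cross. The function name `p2_normal_plugin` is hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def p2_normal_plugin(
    non_diseased: Sequence[float],
    early: Sequence[float],
    full: Sequence[float],
    p1: float = 0.8,
    p3: float = 0.8,
) -> float:
    """Plug-in estimate of P2 assuming three independent normal groups."""
    std_normal = NormalDist()
    m1, s1 = mean(non_diseased), stdev(non_diseased)
    m2, s2 = mean(early), stdev(early)
    m3, s3 = mean(full), stdev(full)
    # Lower cut-off: the p1 quantile of the non-diseased group (specificity).
    c1 = m1 + s1 * std_normal.inv_cdf(p1)
    # Upper cut-off: the (1 - p3) quantile of the fully diseased group.
    c2 = m3 + s3 * std_normal.inv_cdf(1.0 - p3)
    early_dist = NormalDist(m2, s2)
    # Assumption of this sketch: truncate at zero when the cut-offs cross.
    return max(0.0, early_dist.cdf(c2) - early_dist.cdf(c1))
```

For a test whose values decrease with disease severity, the data would first need to be sign-flipped (or the roles of the cut-offs reversed) before calling this function.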

Table 5.

Estimated confidence intervals for the probability of detecting early stage Alzheimer's disease using psychometric tests from WU ADRC (sensitivity to full disease and specificity are both assumed to equal 0.8).

	Confidence Intervals for the test covariates
	Non-parametric			Parametric
Variables	${P̂}_{2}^{N P}$	BTP	BTII	${P̂}_{2}^{P}$	Normal GI	${P̂}_{2}^{Box P}$	Box-Cox GI
Global factor	0.7727	(0.5718, 0.8875)	(0.5788, 0.9371)	0.6107	(0.3594, 0.7591)	0.6267	(0.3533, 0.7738)
Frontal factor*	0.5718	(0.1986, 0.7440)	(0.1977, 0.7842)	0.4340	(0.1997, 0.6058)	0.4375	(0.2189, 0.5987)
Parietal factor**	0.3708	(0.1412, 0.6866)	(0.1132, 0.6877)	0.2290	(0.0000, 0.4834)	0.2499	(0.0000, 0.4650)
Temporal factor*	0.7440	(0.5431, 0.8875)	(0.5527, 0.9343)	0.6330	(0.3926, 0.7862)	0.6337	(0.4128, 0.7952)
Associate Learning**	0.6005	(0.2123, 0.7440)	(0.2401, 0.7961)	0.3383	(0.0695, 0.5338)	0.3436	(0.0854, 0.5501)
Logical Memory	0.7153	(0.4856, 0.8875)	(0.5109, 0.9225)	0.6749	(0.4883, 0.7985)	0.6739	(0.4984, 0.8021)
Digit Span Forward	0.3708	(0.0551, 0.7153)	(0.1206, 0.7159)	0.1939	(0.0000, 0.4161)	0.2001	(0.0000, 0.4057)
Digit Span Backward	0.3421	(0.0838, 0.8588)	(0.1139, 0.8959)	0.2050	(0.0000, 0.4697)	0.2145	(0.0000, 0.4502)
Visual Retention (10 s)	0.7153	(0.2273, 0.8301)	(0.3120, 0.9332)	0.3336	(0.1198, 0.4961)	0.3466	(0.1231, 0.5112)
Information	0.4282	(0.1699, 0.8014)	(0.2112, 0.8062)	0.4387	(0.1821, 0.6181)	0.4378	(0.1943, 0.6105)
Word Fluency**	0.2273	(0.0551, 0.5718)	(0.0000, 0.5490)	0.1643	(0.0000, 0.3785)	0.1799	(0.0000, 0.3817)
Boston Naming	0.3421	(0.1412, 0.6866)	(0.0723, 0.7036)	0.3270	(0.0562, 0.5002)	0.3406	(0.0810, 0.5206)
Mental Control	0.4569	(0.0551, 0.6579)	(0.0395, 0.7649)	0.2369	(0.0153, 0.4039)	0.2375	(0.0361, 0.4135)
Visual Retention (copy)	0.4856	(0.0551, 0.7877)	(0.0000, 0.8868)		(0.0000, 0.0905)		(0.0000, 0.1039)

BTP: The confidence interval based on bootstrap percentiles.

BTII: Confidence interval is computed by the BTII approach when normality for the data or the Box-Cox transformed data is not satisfied.

GI: The normality is satisfied. Confidence interval is computed by the generalized inference approach.

Box-Cox: The normality of Box-Cox transformed data is satisfied. Confidence interval is computed by the Box-Cox generalized inference approach.

${P̂}_{2}^{N P}$ : The nonparametric estimate of P₂ in equation (9).

${P̂}_{2}^{P}$ : The parametric estimate of P₂ based on normality in equation (3).

${P̂}_{2}^{Box P}$ : The parametric estimate of P₂ based on normality after Box-Cox transformation.

*: Normality is satisfied. The generalized confidence interval is preferred.

**: Normality is satisfied for Box-Cox transformed data. The generalized confidence interval for transformed data is preferred.

Note: The most appropriate point estimates are highlighted.

From the point estimates and estimated confidence intervals, we can see that the global factor has the best diagnostic accuracy for identifying early stage dementia, followed by logical memory and visual retention, while word fluency and parietal factor have very poor ability for early stage diagnosis.

7. SUMMARY AND DISCUSSION

The probability of detecting the early disease stage (P₂), given the sensitivity to full disease (P₃) and the specificity (P₁), can serve as another diagnostic measure when a disease process involves three ordinal disease stages. This article examines the performance of several parametric and nonparametric approaches for confidence interval estimation of P₂ given P₁ and P₃ and makes recommendations about which procedures are most appropriate under different scenarios. These methods can be applied to identify important markers for the detection of early stage disease (e.g. preclinical AD), which is usually the most important stage of the disease for intervention. As the simulation results indicate, the parametric approaches generally perform satisfactorily. Among the nonparametric approaches, the bootstrap percentile approach generally yields coverage slightly above the nominal level, while the BTII method is a good choice except in scenarios with large P₂, for which the bootstrap percentile approach can provide reasonable confidence intervals.

Based on the simulation studies, the following recommendations are made. First, if normality is satisfied for either the original or the transformed data, we suggest the generalized inference approach. This approach is easy to use and has good coverage probability even for small and unbalanced sample sizes. Second, if the normality assumption is not met, the nonparametric BTII approach works well for most scenarios; however, if the estimated P₂ is large and the sample sizes are small, we recommend the bootstrap percentile approach (BTP). Furthermore, when sample sizes are ≥ 50, the BTII approach is recommended for normal distributions and the generalized inference approach is recommended for beta and gamma distributions, because 1) BTII intervals have the shortest length and satisfactory coverage probabilities for normal distributions; and 2) the generalized inference approach has the best coverage probabilities and the shortest confidence intervals for beta and gamma distributions.
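The first two recommendations can be written as a small decision rule. The sketch below is only an illustration: the text does not give numeric cut-offs for "small" sample sizes or "large" P₂, so the defaults `small_n=10` and `large_p2=0.9` are hypothetical values taken from the extreme simulation settings, and the function name `recommend_method` is made up. The additional guidance for sample sizes ≥ 50 is not encoded.

```python
from typing import Sequence


def recommend_method(
    normal_ok: bool,
    boxcox_normal_ok: bool,
    sample_sizes: Sequence[int],
    p2_estimate: float,
    small_n: int = 10,      # hypothetical threshold for "small sample sizes"
    large_p2: float = 0.9,  # hypothetical threshold for "large P2"
) -> str:
    """Return the label of the interval method suggested by the two rules."""
    # Rule 1: normality on the original or the Box-Cox transformed scale.
    if normal_ok:
        return "GI"
    if boxcox_normal_ok:
        return "Box-Cox GI"
    # Rule 2: otherwise BTII, unless P2 is large and the samples are small.
    if p2_estimate >= large_p2 and min(sample_sizes) <= small_n:
        return "BTP"
    return "BTII"
```

For example, `recommend_method(False, False, (20, 10, 10), 0.92)` returns `"BTP"` under these assumed thresholds.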

All of the proposed approaches are simulation-based. The generalized inference approach based on normality is easy to use, while the Box-Cox transformed generalized inference approach involves solving an equation. The nonparametric approaches are simple except that the variance is computed from bootstrap samples. An R program is available upon request from ltian@buffalo.edu.
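As an illustration of how simple the nonparametric route can be, the following sketch computes a bootstrap percentile interval for P₂. It is not the authors' R program. It assumes that larger values indicate more severe disease, uses order-statistic quantiles for the two cut-offs, and resamples each group independently; the paper's exact estimator in equation (9) may differ in detail. All function names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def empirical_quantile(sample: Sequence[float], prob: float) -> float:
    """Smallest order statistic whose empirical CDF is at least prob."""
    ordered = sorted(sample)
    index = max(0, math.ceil(prob * len(ordered)) - 1)
    return ordered[index]


def p2_nonparametric(g1, g2, g3, p1: float = 0.8, p3: float = 0.8) -> float:
    """Share of early-stage values falling between the two cut-offs."""
    c1 = empirical_quantile(g1, p1)        # cut-off fixing the specificity
    c2 = empirical_quantile(g3, 1.0 - p3)  # cut-off fixing the sensitivity
    return sum(1 for y in g2 if c1 < y <= c2) / len(g2)


def bootstrap_percentile_ci(
    g1, g2, g3, p1: float = 0.8, p3: float = 0.8,
    n_boot: int = 2000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile interval from independent resampling of the three groups."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        b1 = rng.choices(g1, k=len(g1))
        b2 = rng.choices(g2, k=len(g2))
        b3 = rng.choices(g3, k=len(g3))
        stats.append(p2_nonparametric(b1, b2, b3, p1, p3))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Because the estimate is a proportion of the early-stage sample, the bootstrap distribution is discrete, which is one reason percentile intervals can behave poorly for small samples and P₂ near one.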

Acknowledgments

The work by Dr. Xiong was partly supported by grants NIH/NIA R01 AG029672, AG003991, AG005681, and AG026276 from the National Institute on Aging and grant NIRG-08-91082 from the Alzheimer’s Association.

Appendix

Generalized Pivots and Generalized Test variables

In the following, the basic concepts for generalized inference developed by Tsui & Weerahandi [17] and Weerahandi [18] are described.

Suppose that Y = (Y₁, Y₂, …, Y_n)′ forms a random sample from a distribution which depends on the parameters θ = (ψ, ν), where ψ is the parameter of interest and ν is a vector of nuisance parameters. A generalized pivot R(Y; y, ψ, ν) for interval estimation, where y is an observed value of Y, as defined in Weerahandi [18], has the following two properties:

1. R(Y; y, ψ, ν) has a distribution free of unknown parameters.
2. The value of R(y; y, ψ, ν) is ψ.

Let R_α be the 100αth percentile of R. Then R_α is the 100(1 − α)% lower bound for ψ, and (R_{α/2}, R_{1−α/2}) is a 100(1 − α)% two-sided generalized confidence interval for ψ.
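A minimal sketch of this percentile construction, for the textbook case of a normal mean rather than for P₂ itself, is given below. It assumes the standard generalized pivots R_σ² = (n − 1)s²/V with V ~ χ²(n − 1) and R_μ = x̄ − Z·√(R_σ²/n) with Z ~ N(0, 1), and approximates the percentiles by Monte Carlo; the function name and defaults are hypothetical.

```python
import math
import random
from statistics import mean, stdev
from typing import Sequence, Tuple


def generalized_ci_normal_mean(
    sample: Sequence[float], alpha: float = 0.05,
    n_draws: int = 20000, seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo generalized confidence interval for a normal mean."""
    rng = random.Random(seed)
    n = len(sample)
    xbar, s = mean(sample), stdev(sample)
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        # Chi-square with n - 1 degrees of freedom, drawn as
        # Gamma(shape=(n - 1) / 2, scale=2).
        v = rng.gammavariate((n - 1) / 2.0, 2.0)
        # Generalized pivot for the variance, then for the mean.
        r_var = (n - 1) * s * s / v
        draws.append(xbar - z * math.sqrt(r_var / n))
    draws.sort()
    # Percentiles R_{alpha/2} and R_{1 - alpha/2} of the simulated pivot.
    lower = draws[int((alpha / 2) * n_draws)]
    upper = draws[min(n_draws - 1, int((1 - alpha / 2) * n_draws))]
    return lower, upper
```

In this simple case the pivot reduces to x̄ − T·s/√n with T following a t distribution with n − 1 degrees of freedom, so the result should agree with the usual t interval up to Monte Carlo error.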

Now consider testing H₀ : ψ = ψ₀ vs. H₁ : ψ > ψ₀ where ψ₀ is a specified quantity. A generalized test variable of the form T(Y; y, ψ, ν), where y is an observed value of Y, is chosen to satisfy the following three conditions (Tsui & Weerahandi [17]) :

1. For fixed y, the distribution of T(Y; y, ψ, ν) is free of the vector of nuisance parameters ν.
2. The value of T(Y; y, ψ, ν) at Y = y is free of any unknown parameters.
3. For fixed y and ν, and for all t, Pr[T(Y; y, ψ, ν) > t] is either an increasing or a decreasing function of ψ.

A generalized extreme region is defined as C = [Y : T(Y; y, ψ, ν) ≥ T(y; y, ψ, ν)] if T(Y; y, ψ, ν) is stochastically increasing in ψ. If T(Y; y, ψ, ν) is stochastically decreasing in ψ, a generalized extreme region is defined as C = [Y : T(Y; y, ψ, ν) ≤ T(y; y, ψ, ν)]. Then the generalized P-value is defined as P(C|ψ₀).
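As a worked illustration of these definitions (an assumed textbook case, not one taken from the paper), consider a normal sample of size n with the mean μ as the parameter of interest and σ as the nuisance parameter, and test H₀ : μ = μ₀ vs. H₁ : μ > μ₀. Write x̄ and s for the observed sample mean and standard deviation, X̄ and S for their random counterparts, and W = (X̄ − μ)/(S/√n), which follows a t distribution with n − 1 degrees of freedom. One possible generalized test variable is

$$
T(Y; y, \mu, \sigma) = \bar{x} - W \frac{s}{\sqrt{n}} - \mu, \qquad W = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.
$$

For fixed y, the distribution of T depends on μ but not on σ; at Y = y, W equals (x̄ − μ)/(s/√n), so T equals 0; and Pr[T > t] is decreasing in μ. The three conditions therefore hold, T is stochastically decreasing in μ, the generalized extreme region is C = [Y : T ≤ 0], and

$$
P(C \mid \mu_0) = \Pr\left[ \bar{x} - \mu_0 - W \frac{s}{\sqrt{n}} \le 0 \right] = \Pr\left[ W \ge \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} \right],
$$

which coincides with the p-value of the usual one-sided t test.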

References

1. Shapiro D. The interpretation of diagnostic tests. Statistical Methods in Medical Research. 1999;8:113–134. doi: 10.1177/096228029900800203.
2. Zhou X-H, Obuchowski N, McClish D. Statistical Methods in Diagnostic Medicine. New York: Wiley; 2002.
3. Pepe MS. The statistical evaluation of medical tests for classification and prediction. Vol. 28. Oxford Statistical Science Series; 2003.
4. Xiong CJ, Gerald VB, Philip M, John CM. Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Statistics in Medicine. 2006;25:1251–1273. doi: 10.1002/sim.2433.
5. Mossman D. Three-way ROCs. Medical Decision Making. 1999;19:78–89. doi: 10.1177/0272989X9901900110.
6. Dreiseitl S, Ohno-Machado L, Binder M. Comparing three-class diagnostic tests by three-way ROC analysis. Medical Decision Making. 2000;20:323–331. doi: 10.1177/0272989X0002000309.
7. Heckerling PS. Parametric three-way receiver operating characteristic surface analysis using Mathematica. Medical Decision Making. 2001;21:409–417. doi: 10.1177/0272989X0102100507.
8. Nakas C, Yiannoutsos C. Ordered multiple-class ROC analysis with continuous measurement. Statistics in Medicine. 2004;23:3437–3449. doi: 10.1002/sim.1917.
9. Nakas C, Alonzo T. ROC graphs for assessing the ability of a diagnostic marker to detect three disease classes with an umbrella ordering. Biometrics. 2007;63:603–609. doi: 10.1111/j.1541-0420.2006.00715.x.
10. Alonzo T, Nakas C. Comparison of ROC umbrella volumes with an application to the assessment of lung cancer diagnostic markers. Biometrical Journal. 2007;49:654–664. doi: 10.1002/bimj.200610363.
11. Xiong C, van Belle G, Miller JP, Yan Y, Gao F, Feng S, Yu K, Morris JC. A parametric comparison of diagnostic accuracy with three ordinal diagnostic groups. Biometrical Journal. 2007;49(5):682–693. doi: 10.1002/bimj.200610359.
12. Tian L, Xiong C, Lai C, Vexler A. Exact confidence interval estimation for the difference in diagnostic accuracy with three ordinal diagnostic groups. Journal of Statistical Planning and Inference. 2010;141:549–558. doi: 10.1016/j.jspi.2010.07.004.
13. Linnet K. Comparison of quantitative diagnostic tests: type I error, power, and sample size. Statistics in Medicine. 1987;6:147–158. doi: 10.1002/sim.4780060207.
14. Platt RW, Hanley JA, Yang H. Bootstrap confidence intervals for the sensitivity of a quantitative diagnostic test. Statistics in Medicine. 2000;19:313–322. doi: 10.1002/(sici)1097-0258(20000215)19:3<313::aid-sim370>3.0.co;2-k.
15. Zhou X-H, Qin GS. Improved Confidence Intervals for the sensitivity to full disease at a fixed Level of specificity of a continuous-scale diagnostic test. Statistics in Medicine. 2005;24:465–477. doi: 10.1002/sim.1563.
16. Morris JC. The clinical dementia rating (CDR): current version and scoring rules. Neurology. 1993;43:2412–2414. doi: 10.1212/wnl.43.11.2412-a.
17. Tsui KW, Weerahandi S. Generalized P-values in significance testing of hypotheses in the presence of nuisance parameters. Journal of the American Statistical Association. 1989;84:602–607.
18. Weerahandi S. Generalized confidence intervals. Journal of the American Statistical Association. 1993;88:899–905.
19. Weerahandi S. Exact Statistical Methods for Data Analysis. New York: Springer; 2003.
20. Weerahandi S. ANOVA under unequal error variances. Biometrics. 1995;51:589–599.
21. Weerahandi S, Berger VW. Exact inference for growth curves with intraclass correlation structure. Biometrics. 1999;55:921–924. doi: 10.1111/j.0006-341x.1999.00921.x.
22. Krishnamoorthy K, Lu Y. Inference on the common means of several normal populations based on the generalized variable method. Biometrics. 2003;59:237–247. doi: 10.1111/1541-0420.00030.
23. Tian L, Cappelleri JC. A new approach for interval estimation and hypothesis testing of a certain intraclass correlation coefficient: the generalized variable method. Statistics in Medicine. 2004;23:2125–2135. doi: 10.1002/sim.1782.
24. Tian L. Confidence intervals for P(Y1 > Y2) with normal outcomes in linear models. Statistics in Medicine. 2008;27:4221–4237. doi: 10.1002/sim.3290.
25. Li C, Liao C, Liu J. On the exact interval estimation for the difference in paired areas under the ROC curves. Statistics in Medicine. 2008;27:224–242. doi: 10.1002/sim.2760.
26. Li C, Liao C, Liu J. A non-inferiority test for diagnostic accuracy based on the paired partial areas under ROC curves. Statistics in Medicine. 2008;27:1762–1776. doi: 10.1002/sim.3121.
27. Krishnamoorthy K, Mathew T. Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals. Journal of Statistical Planning and Inference. 2003;115:103–121.
28. Zou KH, Hall WJ, Shapiro DE. Smooth Non-Parametric Receiver Operating Characteristic (ROC) Curves for Continuous Diagnostic Tests. Statistics in Medicine. 1997;16:2143–2156. doi: 10.1002/(sici)1097-0258(19971015)16:19<2143::aid-sim655>3.0.co;2-3.
29. Faraggi D, Reiser B. Estimation of the Area Under the ROC Curve. Statistics in Medicine. 2002;21:3093–3106. doi: 10.1002/sim.1228.
30. Zou KH, Hall WJ. Two transformation models for estimating an ROC curve derived from continuous data. Journal of Applied Statistics. 2000;27(5):621–631.
31. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cut-off point. Biometrical Journal. 2005;47:458–472. doi: 10.1002/bimj.200410135.
32. Schisterman E, Faraggi D, Reiser B. Adjusting the Generalized ROC Curve for Covariates. Statistics in Medicine. 2004;23:3319–3331. doi: 10.1002/sim.1908.
33. Schisterman E, Reiser B, Faraggi D. ROC analysis for markers with mass at zero. Statistics in Medicine. 2006;25:623–638. doi: 10.1002/sim.2301.
34. Molodianovitch K, Faraggi D, Reiser B. Comparing the Areas Under Two Correlated ROC Curves: Parametric and Non-Parametric Approaches. Biometrical Journal. 2007;5:745–757. doi: 10.1002/bimj.200610223.
35. Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Statistical Science. 2001;16:101–117.
36. Agresti A, Coull BA. Approximate is better than “exact” for interval estimation of Binomial proportions. The American Statistician. 1998;52:119–126.

