Abstract
Many disease processes can be divided into three stages: the non-diseased stage, the early diseased stage, and the fully diseased stage. To assess the accuracy of diagnostic tests for such diseases, various summary indexes have been proposed, such as the volume under the ROC surface (VUS), the partial volume under the surface (PVUS), and the sensitivity to the early diseased stage given specificity and the sensitivity to the fully diseased stage (P2). This paper focuses on confidence interval estimation for P2 based on empirical likelihood. Simulation studies are carried out to assess the performance of the new methods compared to the existing parametric and non-parametric ones. A real data set from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) is analyzed.
Keywords: Empirical likelihood; Diagnostic tests; Sensitivity to the early diseased stage
1. INTRODUCTION
A disease process is usually divided into two stages: the non-diseased and the diseased, and diagnostic tests are utilized to classify subjects into these stages. The probability that a non-diseased subject is correctly classified is defined as the specificity, and the probability that a diseased subject is correctly identified is called the sensitivity. When the outcome of a diagnostic test is continuous, both sensitivity and specificity are functions of the cut-off value, and as the cut-off value changes they vary inversely to each other. The Receiver Operating Characteristic (ROC) curve, a plot of sensitivity versus (1 − specificity) as the cut-off value runs through the whole range of possible outcome values, is a popular graphical assessment of the accuracy of a diagnostic test. For detailed reviews of statistical methods in ROC analysis, see Shapiro (1999), Zhou et al. (2002), Pepe (2003) and Zou et al. (2010).
For a continuous-scale test and a binary disease status, many diagnostic accuracy measures exist, such as the area under the ROC curve (AUC). The AUC summarizes the overall performance of a diagnostic test over all cut-off values. However, in medical practice, a cut-off value is often chosen by medical practitioners so that a fixed value of specificity is achieved (typically 80%, 90%, or 95%). Hence, the sensitivity at a given specificity serves as a meaningful diagnostic measure. Towards this end, several papers have discussed the estimation of sensitivity at a given specificity. For example, Greenhouse and Mantel (1950) presented inference procedures for a diagnostic test with a continuous range, either with or without normal distribution assumptions; McNeil and Hanley (1984) estimated the point-wise confidence interval for sensitivity at a fixed specificity in the bi-normal model; Linnet (1987) took into account the sampling variation of the discrimination limits and proposed both parametric and non-parametric methods to construct the confidence interval; Platt et al. (2000) recommended a confidence interval using Efron’s bias-corrected and accelerated (BCa) bootstrap; and Zhou and Qin (2005) introduced two non-parametric confidence intervals. Most recently, Qin et al. (2011) presented empirical likelihood-based confidence intervals for the sensitivity at a fixed level of specificity.
In practice, a disease process might involve three ordinal diagnostic stages: the normal healthy stage without even the earliest subtle disease symptoms, the early stage of the disease, and the stage of full-blown development of the disease. For example, mild cognitive impairment (MCI) and/or early-stage Alzheimer’s disease (AD) is a transitional stage between the cognitive changes of normal aging and the more serious AD. Recently, traditional ROC analysis has been extended to the three-stage case; see, e.g., Mossman (1999), Dreiseitl et al. (2000), Heckerling (2001), Nakas and Yiannoutsos (2004), Xiong et al. (2006), He and Frey (2008), Li and Zhou (2009), Nakas et al. (2010), Tian et al. (2010), He et al. (2010), Dong et al. (2011) and Li et al. (2012). For diseases such as AD, early detection is critical because it often represents the optimal time window for therapeutic treatment, given that no pharmaceutical treatments to date are effective for late-stage AD. However, it is far more challenging for clinicians to diagnose subjects at the earliest disease stage because of the subtle clinical symptoms in the early stage of many complex disease processes. Hence, the probability associated with the detection of the early diseased stage is critical in medical science and serves as a very important diagnostic accuracy measure for diseases with three ordinal stages.
To be more specific, let Y1, Y2 and Y3 denote the test results for the non-diseased, the early diseased, and the fully diseased group of a diagnostic test respectively, F1, F2 and F3 denote corresponding cumulative distribution functions, and n1, n2 and n3 denote sample sizes. Assume that the test results are measured on a continuous scale and that higher values indicate greater severity of the disease. Given a pair of threshold values c1 and c2 (c1 < c2), the subject is identified as non-diseased if the test result is smaller than c1, as fully diseased if the test result is larger than c2, and as early diseased if the test result is between c1 and c2. The specificity P1, which is the correct classification rate for the non-diseased stage, sensitivity to the early diseased stage P2, and the sensitivity to the fully diseased stage P3 are defined as
$$P_1 = P(Y_1 \le c_1) = F_1(c_1), \qquad P_2 = P(c_1 < Y_2 \le c_2) = F_2(c_2) - F_2(c_1), \qquad P_3 = P(Y_3 > c_2) = 1 - F_3(c_2). \tag{1}$$
Given P1 and P3, c1 and c2 can be determined. Consequently, P2, the sensitivity to the early diseased stage given the specificity P1 and the sensitivity to the fully diseased stage P3, can be formulated as a function of P1 and P3, i.e. P2 = P2(P1, P3) which also defines a surface in the three-dimensional space (P1, P3, P2), namely, the ROC surface. The point (P1, P3, P2) = (1, 1, 1) indicates the perfect discrimination ability.
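Written out explicitly with the quantile functions of F1 and F3, this relationship, which follows directly from (1), is

$$c_1 = F_1^{-1}(P_1), \qquad c_2 = F_3^{-1}(1-P_3), \qquad P_2(P_1, P_3) = F_2\!\left\{F_3^{-1}(1-P_3)\right\} - F_2\!\left\{F_1^{-1}(P_1)\right\}.$$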
To evaluate the diagnostic accuracy of the biomarkers for three-class diseases, various summary measures of the ROC surface have been proposed. Among them, the volume under the ROC surface (VUS), considered as the extension of AUC in the three-class disease paradigm, is a very popular one. The VUS denotes the probability that a randomly chosen subject from the non-diseased group, that from early diseased group and that from fully diseased group follow simple order, i.e., VUS = P(Y1 < Y2 < Y3). More details about VUS can be found in Nakas and Yiannoutsos (2004), Xiong et al. (2006), He and Frey (2008), Wan (2012) and Kang and Tian (2013).
In addition to the overall performance of a biomarker measured by VUS, an accurate estimate of P2 helps clinicians identify the best disease markers for early diagnosis, and therefore inference procedures for P2 are very useful. Dong et al. (2011) first provided parametric and non-parametric confidence interval estimation methods for P2. However, the recommended methods depend on either a normality assumption or a Box-Cox transformation to normality, and it is well known that not all non-normal distributions can be transformed to normality via the Box-Cox transformation. Therefore, alternative approaches for estimating the confidence interval of P2 that do not depend on distributional assumptions and still provide good coverage probabilities are worth exploring.
The goal of this paper is to present empirical likelihood-based confidence intervals for P2, i.e. the sensitivity to the early diseased stage given specificity and the sensitivity to the fully diseased stage. Empirical likelihood was introduced by Owen (1990, 2001) and has many advantages over normal approximation-based methods. For instance, empirical likelihood-based confidence regions are range preserving and transformation respecting, the regularity conditions for empirical likelihood-based methods are weak and natural, and the approach brings the power of likelihood-based methods to complex statistical problems. Empirical likelihood has been used widely in many applied areas, including diagnostic tests with binary outcomes; e.g., Claeskens et al. (2003) suggested a smoothed empirical likelihood-based method (SEL) to estimate the sensitivity, and Qin et al. (2011) proposed two empirical likelihood-based confidence intervals for the sensitivity at a fixed level of specificity. The rest of this paper is organized as follows. Section 2 presents a review of existing methods. In Section 3, the large sample properties of the estimator of P2 are derived and the empirical likelihood approaches are proposed. In Section 4, simulation studies are conducted to evaluate the proposed methods. In Section 5, a real data set from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database is analyzed. Section 6 provides a summary and discussion. The proofs of the variance formula for the estimator of P2 and of the empirical likelihood theorem are given in the Appendix.
2. EXISTING METHODS
This section presents a brief review of the existing methods including the generalized inference method and bootstrap approaches for confidence interval estimation of sensitivity to the early diseased stage by Dong et al. (2011).
2.1. Generalized Inference Method
Assume Yi follows a normal distribution with mean μi and variance σi² for i = 1, 2, 3, and let ȳi and si² denote the sample mean and sample variance of the ith group. The generalized pivotal quantity for P2 as given in (1) can be written as

$$R_{P_2} = \Phi\!\left(\frac{R_{\mu_3} + R_{\sigma_3}\Phi^{-1}(1-P_3) - R_{\mu_2}}{R_{\sigma_2}}\right) - \Phi\!\left(\frac{R_{\mu_1} + R_{\sigma_1}\Phi^{-1}(P_1) - R_{\mu_2}}{R_{\sigma_2}}\right),$$

where $R_{\mu_i} = \bar{y}_i - Z_i\sqrt{R_{\sigma_i}^2/n_i}$, $Z_i \sim N(0,1)$, and $R_{\sigma_i}^2 = (n_i-1)s_i^2/V_i$ with $V_i \sim \chi^2_{n_i-1}$, for i = 1, 2, 3. By generating Vi and Zi repeatedly, an array of RP2’s can be obtained. A two-sided 100(1 − α)% generalized inference confidence interval for P2, GI, is (RP2(α/2), RP2(1 − α/2)), where RP2(α) denotes the 100αth percentile of the RP2 values.
When the normality assumptions are violated, the Box-Cox transformation is utilized, as P2 is invariant under monotonic transformations. Assuming the transformed data satisfy the normality assumptions, the GI method can then be applied; such a confidence interval is denoted BCGI hereafter.
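To illustrate the GI computation, a minimal Monte Carlo sketch is given below. It assumes the standard generalized pivotal quantities for a normal mean and variance as written above, which may differ in minor details from the exact construction of Dong et al. (2011).

```python
import numpy as np
from scipy import stats

def gi_ci_p2(y1, y2, y3, p1=0.8, p3=0.8, alpha=0.05, n_rep=5000, rng=None):
    """Generalized-inference CI for P2 under normality (sketch).

    Uses the usual generalized pivotal quantities for normal means/variances;
    an illustration, not necessarily the exact construction of Dong et al. (2011).
    """
    rng = np.random.default_rng(rng)
    samples = [np.asarray(y) for y in (y1, y2, y3)]
    ns = [len(y) for y in samples]
    ybars = [y.mean() for y in samples]
    s2s = [y.var(ddof=1) for y in samples]

    rp2 = np.empty(n_rep)
    for b in range(n_rep):
        r_mu, r_sd = [], []
        for n, ybar, s2 in zip(ns, ybars, s2s):
            v = rng.chisquare(n - 1)                     # V_i ~ chi^2_{n_i - 1}
            r_var = (n - 1) * s2 / v                     # pivotal quantity for sigma_i^2
            z = rng.standard_normal()                    # Z_i ~ N(0, 1)
            r_mu.append(ybar - z * np.sqrt(r_var / n))   # pivotal quantity for mu_i
            r_sd.append(np.sqrt(r_var))
        # thresholds implied by the fixed specificity P1 and sensitivity P3
        r_c1 = r_mu[0] + r_sd[0] * stats.norm.ppf(p1)
        r_c2 = r_mu[2] + r_sd[2] * stats.norm.ppf(1 - p3)
        rp2[b] = stats.norm.cdf((r_c2 - r_mu[1]) / r_sd[1]) - \
                 stats.norm.cdf((r_c1 - r_mu[1]) / r_sd[1])
    return np.quantile(rp2, [alpha / 2, 1 - alpha / 2])
```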
2.2. Non-parametric Approaches
The P2 as given in (1) can be non-parametrically estimated as
$$\hat{P}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} I\!\left\{\hat{F}_1^{-1}(P_1) < Y_{2j} \le \hat{F}_3^{-1}(1-P_3)\right\}, \tag{2}$$

where F̂1 and F̂3 are the empirical distribution functions of the non-diseased and fully diseased samples, and I{·} is the indicator function.
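As a concrete illustration, a minimal sketch of the estimator (2) in Python; np.quantile's default interpolation stands in for the empirical quantile function, and the fixed levels P1 = P3 = 0.8 are simply example values.

```python
import numpy as np

def p2_hat(y1, y2, y3, p1=0.8, p3=0.8):
    """Non-parametric estimate of P2 in (2): fraction of the early diseased
    sample falling between the plug-in thresholds c1_hat and c2_hat."""
    c1_hat = np.quantile(y1, p1)        # P1-th sample quantile of the non-diseased group
    c2_hat = np.quantile(y3, 1 - p3)    # (1 - P3)-th sample quantile of the fully diseased group
    y2 = np.asarray(y2)
    return np.mean((y2 > c1_hat) & (y2 <= c2_hat))
```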
With bootstrap estimates $\hat{P}_2^{*b}$ (b = 1 to 500), the 100(1 − α)% bootstrap percentile confidence interval (BTP) can be obtained as

$$\left(\hat{P}_2^{*(\alpha/2)},\ \hat{P}_2^{*(1-\alpha/2)}\right),$$

where $\hat{P}_2^{*(\alpha)}$ is the 100α% percentile of the bootstrap estimates. An adjusted estimator of P2 proposed by Agresti and Coull (1998) is
$$\tilde{P}_2 = \frac{n_2\hat{P}_2 + z_{1-\alpha/2}^2/2}{n_2 + z_{1-\alpha/2}^2}, \tag{3}$$
where z1−α/2 stands for the 100(1 − α/2)% percentile of the standard normal distribution. The 100(1 − α)% BTI confidence interval is

$$\tilde{P}_2 \pm z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}^*(\hat{P}_2)},$$

where $\widehat{\mathrm{Var}}^*(\hat{P}_2)$ is the bootstrap estimate of the variance of $\hat{P}_2$ (more details can be found in Dong et al. (2011)). Replacing $\hat{P}_2$ in (3) with the mean of the bootstrap estimates, $\bar{P}_2^* = \frac{1}{500}\sum_b \hat{P}_2^{*b}$, gives an adjusted estimator $\tilde{P}_2^*$, and the 100(1 − α)% BTII confidence interval is given as

$$\tilde{P}_2^* \pm z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}^*(\hat{P}_2)}.$$
In Dong et al. (2011), through a simulation study, GI and BCGI were shown to provide accurate confidence intervals, given the corresponding normality assumptions were satisfied. Otherwise, BTII was recommended except in the scenarios with large P2 and small sample sizes where BTP was preferred.
3. TWO NEW APPROACHES
In this section, two new methods for confidence interval estimation of P2 are presented. Section 3.1 presents a method based on asymptotic normality and Section 3.2 presents two confidence intervals based on empirical likelihood.
3.1. Normal Approximation-Based Confidence Interval
For diagnostic tests with binary disease status, Linnet (1987) provided the parametric formula for the variance of the estimated sensitivity at a given specificity, based on which a normal approximation-based confidence interval was constructed. Further details can also be found in Zhou and Qin (2005) and Qin et al. (2011). In the same vein, the variance of P̂2 can be derived as (see Appendix 1)
$$\sigma^2_{\hat{P}_2} = \frac{P_2(1-P_2)}{n_2} + \frac{f_2^2(c_1)}{f_1^2(c_1)}\cdot\frac{P_1(1-P_1)}{n_1} + \frac{f_2^2(c_2)}{f_3^2(c_2)}\cdot\frac{P_3(1-P_3)}{n_3}, \tag{4}$$
where f1, f2 and f3 are the probability density functions of Y1, Y2 and Y3, respectively. It can be shown that when n1, n2 and n3 are large, P̂2 has an approximately normal distribution with mean P2 and variance σ²_{P̂2}. The σ²_{P̂2} can be estimated as
$$\hat{\sigma}^2_{\hat{P}_2} = \frac{\hat{P}_2(1-\hat{P}_2)}{n_2} + \frac{\hat{f}_2^2(\hat{c}_1)}{\hat{f}_1^2(\hat{c}_1)}\cdot\frac{P_1(1-P_1)}{n_1} + \frac{\hat{f}_2^2(\hat{c}_2)}{\hat{f}_3^2(\hat{c}_2)}\cdot\frac{P_3(1-P_3)}{n_3}, \tag{5}$$
where ĉ1 = F̂1^{-1}(P1) is the P1th sample quantile of the Y1’s, ĉ2 = F̂3^{-1}(1 − P3) is the (1 − P3)th sample quantile of the Y3’s, and f̂i is the kernel density estimate of fi, i = 1, 2, 3. We use the “over-smoothed bandwidth selector” of Wand and Jones (1995) to select the bandwidth for the Gaussian kernel function. The (1 − α)100% normal approximation-based confidence interval

$$\hat{P}_2 \pm z_{1-\alpha/2}\,\hat{\sigma}_{\hat{P}_2}$$

is referred to as the asymptotic parametric variance confidence interval (APV) hereafter.
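The following sketch illustrates how the APV interval can be computed from (2), (5) and the normal approximation. The over-smoothed bandwidth is taken here in its common Gaussian-kernel form h ≈ 1.144·s·n^{−1/5}, which is one variant of the Wand and Jones (1995) rule rather than necessarily the exact selector used in the paper.

```python
import numpy as np
from scipy import stats

def apv_ci_p2(y1, y2, y3, p1=0.8, p3=0.8, alpha=0.05):
    """Normal-approximation (APV) CI for P2 using the variance estimate (5)."""
    y1, y2, y3 = map(np.asarray, (y1, y2, y3))
    n1, n2, n3 = len(y1), len(y2), len(y3)
    c1 = np.quantile(y1, p1)             # plug-in threshold c1_hat
    c2 = np.quantile(y3, 1 - p3)         # plug-in threshold c2_hat
    p2_hat = np.mean((y2 > c1) & (y2 <= c2))

    def kde(y, x):
        # Gaussian kernel density estimate at x with an over-smoothed bandwidth
        h = 1.144 * np.std(y, ddof=1) * len(y) ** (-1 / 5)
        return np.mean(stats.norm.pdf((x - y) / h)) / h

    var_hat = (p2_hat * (1 - p2_hat) / n2
               + (kde(y2, c1) / kde(y1, c1)) ** 2 * p1 * (1 - p1) / n1
               + (kde(y2, c2) / kde(y3, c2)) ** 2 * p3 * (1 - p3) / n3)
    half = stats.norm.ppf(1 - alpha / 2) * np.sqrt(var_hat)
    return p2_hat - half, p2_hat + half
```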
3.2. Empirical Likelihood Confidence Interval
Define an indicator function ϕ as

$$\phi(y;\, c_1, c_2) = \begin{cases} 1, & c_1 < y \le c_2,\\ 0, & \text{otherwise.} \end{cases}$$

Given P1 and P3, for a test result Y of a subject from the early diseased group, define a random variable

$$U = \phi\!\left(Y;\ F_1^{-1}(P_1),\ F_3^{-1}(1-P_3)\right).$$

It is evident that

$$E(U) = P\!\left(F_1^{-1}(P_1) < Y \le F_3^{-1}(1-P_3)\right) = P_2.$$
Based on this relationship between P2 and U, we can develop an empirical likelihood procedure for making inference about P2. Let p = (p1,…,p_{n2}) be a probability vector for the early diseased group, with $\sum_{i=1}^{n_2} p_i = 1$ and pi ≥ 0 for all i. The empirical likelihood for P2 can be defined as

$$L(P_2) = \sup\left\{\prod_{i=1}^{n_2} p_i : \sum_{i=1}^{n_2} p_i = 1,\ \sum_{i=1}^{n_2} p_i U_i = P_2\right\},$$

where $U_i = \phi\!\left(Y_{2i};\, F_1^{-1}(P_1),\, F_3^{-1}(1-P_3)\right)$, i = 1, 2,…, n2. Since the Ui’s depend on the unknown distribution functions F1 and F3, we replace them by their empirical distributions F̂1 and F̂3, and obtain a profile empirical likelihood for P2

$$\hat{L}(P_2) = \sup\left\{\prod_{i=1}^{n_2} p_i : \sum_{i=1}^{n_2} p_i = 1,\ \sum_{i=1}^{n_2} p_i \hat{U}_i = P_2\right\},$$

where $\hat{U}_i = \phi\!\left(Y_{2i};\, \hat{F}_1^{-1}(P_1),\, \hat{F}_3^{-1}(1-P_3)\right)$, i = 1, 2,…, n2. By the Lagrange multiplier method, we can easily obtain the following expression for pi:

$$p_i = \frac{1}{n_2}\cdot\frac{1}{1 + \tilde{\lambda}(\hat{U}_i - P_2)},$$

where λ̃ is the solution of

$$\frac{1}{n_2}\sum_{i=1}^{n_2} \frac{\hat{U}_i - P_2}{1 + \tilde{\lambda}(\hat{U}_i - P_2)} = 0. \tag{6}$$
Note that $\prod_{i=1}^{n_2} p_i$, subject to $\sum_{i=1}^{n_2} p_i = 1$, attains its maximum $n_2^{-n_2}$ at $p_i = 1/n_2$. The profile empirical likelihood ratio for P2 is defined as

$$R(P_2) = \frac{\hat{L}(P_2)}{n_2^{-n_2}} = \prod_{i=1}^{n_2}\frac{1}{1 + \tilde{\lambda}(\hat{U}_i - P_2)}.$$

Hence the corresponding profile empirical log-likelihood ratio is

$$l(P_2) = -2\log R(P_2) = 2\sum_{i=1}^{n_2}\log\left\{1 + \tilde{\lambda}(\hat{U}_i - P_2)\right\}, \tag{7}$$
where λ̃ is the solution of (6).
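Computationally, (6) is a monotone one-dimensional equation in λ̃ and can be solved with any bracketed root finder; a minimal sketch of evaluating l(P2) in (7):

```python
import numpy as np
from scipy import optimize

def el_loglik_ratio(u, p2):
    """Profile empirical log-likelihood ratio l(P2) in (7) for binary U_i,
    obtained by solving (6) for the Lagrange multiplier."""
    u = np.asarray(u, dtype=float)
    d = u - p2
    if np.all(d > 0) or np.all(d < 0):
        return np.inf                      # P2 outside the convex hull of the U_i
    def g(lam):                            # left-hand side of (6)
        return np.mean(d / (1.0 + lam * d))
    # 1 + lam*(U_i - P2) must stay positive for every i
    lo = -1.0 / d.max() + 1e-8
    hi = -1.0 / d.min() - 1e-8
    lam = optimize.brentq(g, lo, hi)       # g is monotone decreasing in lam
    return 2.0 * np.sum(np.log(1.0 + lam * d))
```

Since the Û_i are binary, l(P2) reduces to the binomial log-likelihood ratio 2[k log(p̂/P2) + (n2 − k) log((1 − p̂)/(1 − P2))] with k = Σ Û_i and p̂ = k/n2, which provides a convenient check on the numerical solution.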
Since the profile empirical log-likelihood ratio l(P2) is a sum of dependent variables, its asymptotic distribution is no longer a standard chi-square distribution. In Appendix 2, it is proven that l(P2) asymptotically follows a scaled chi-square distribution. The asymptotic distribution of l(P2) is summarized in the following theorem.
Theorem
Assume that F1, F2 and F3 are continuous distribution functions, and that the density functions f1, f2 and f3 are positive and continuous at c1 and c2. If $0 < \rho_1 = \lim_{n_1, n_2 \to \infty} n_1/n_2 < \infty$, $0 < \rho_2 = \lim_{n_2, n_3 \to \infty} n_3/n_2 < \infty$, and P2 is the true value of the sensitivity to the early diseased stage given specificity and the sensitivity to the fully diseased stage, then the limiting distribution of l(P2), defined by (7), is a scaled chi-square distribution with one degree of freedom. That is,
$$l(P_2) \xrightarrow{d} r_{P_1,P_2,P_3}\,\chi^2_1,$$

where the scale constant rP1,P2,P3 is

$$r_{P_1,P_2,P_3} = \frac{\sigma^2_{\hat{P}_2}}{\sigma^2_U/n_2},$$

with $\sigma^2_U = P_2(1-P_2)$ and $\sigma^2_{\hat{P}_2}$ as given in (4).
In order to construct a confidence interval for P2 based on the above Theorem, we need to estimate σ²_U and σ²_{P̂2}. The σ²_U can be estimated as $\hat{\sigma}^2_U = \hat{P}_2(1-\hat{P}_2)$, and a Gaussian kernel is used to obtain the estimate $\hat{\sigma}^2_{\hat{P}_2}$, as shown in (5). The 100(1 − α)% ELP confidence interval for P2 is

$$\left\{P_2 : l(P_2) \le \hat{r}_{P_1,P_2,P_3}\,\chi^2_1(1-\alpha)\right\},$$

where $\hat{r}_{P_1,P_2,P_3} = n_2\hat{\sigma}^2_{\hat{P}_2}/\hat{\sigma}^2_U$ and $\chi^2_1(1-\alpha)$ is the (1 − α)th quantile of the $\chi^2_1$ distribution. The performance of this ELP method depends heavily on the density estimates from the Gaussian kernel, whose bandwidth is chosen without a well-recognized standard. Therefore, the following bootstrap approach is proposed to estimate $\sigma^2_{\hat{P}_2}$ instead:
For b = 1 to B = 500 bootstrap iterations,
Step 1
Draw re-samples of sizes n1, n2, and n3 with replacement from the non-diseased sample Y1j’s, the early diseased sample Y2j’s, and the fully diseased sample Y3j’s, respectively. Denote the bootstrap samples as $Y^{*b}_{ij}$, i = 1, 2, 3, j = 1, 2,…, ni.
Step 2
Calculate the bootstrap version $\hat{P}_2^{*b}$ of $\hat{P}_2$ according to (2).
Step 3
The proposed bootstrap variance estimator for $\hat{P}_2$ is defined as

$$\hat{\sigma}^{2*}_{\hat{P}_2} = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{P}_2^{*b} - \bar{P}_2^{*}\right)^2, \qquad \bar{P}_2^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat{P}_2^{*b},$$

where $\hat{P}_2^{*b}$ is the bootstrap version of the estimator defined in (2).
This leads to the second 100(1 − α)% empirical likelihood confidence interval (ELB) for P2:

$$\left\{P_2 : l(P_2) \le \hat{r}^{*}_{P_1,P_2,P_3}\,\chi^2_1(1-\alpha)\right\},$$

where $\hat{r}^{*}_{P_1,P_2,P_3} = n_2\hat{\sigma}^{2*}_{\hat{P}_2}/\hat{\sigma}^2_U$ and $\chi^2_1(1-\alpha)$ is the (1 − α)th quantile of the $\chi^2_1$ distribution.
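Putting the pieces together, the sketch below (reusing the el_loglik_ratio helper sketched in Section 3.2) bootstraps the variance of P̂2, forms the estimated scaling constant, and collects the P2 values whose log-likelihood ratio stays below the scaled χ²_1 quantile. Replacing the bootstrap variance with the kernel-based estimate (5) would give the ELP interval instead.

```python
import numpy as np
from scipy import stats

def elb_ci_p2(y1, y2, y3, p1=0.8, p3=0.8, alpha=0.05, n_boot=500, rng=None):
    """Empirical-likelihood CI for P2 with a bootstrap variance estimate (ELB sketch)."""
    rng = np.random.default_rng(rng)
    y1, y2, y3 = map(np.asarray, (y1, y2, y3))
    n2 = len(y2)

    c1, c2 = np.quantile(y1, p1), np.quantile(y3, 1 - p3)
    u = ((y2 > c1) & (y2 <= c2)).astype(float)   # U_i with plug-in thresholds
    p2_hat = u.mean()

    # Steps 1-3: bootstrap variance of the estimator P2_hat in (2)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        b1 = rng.choice(y1, len(y1), replace=True)
        b2 = rng.choice(y2, len(y2), replace=True)
        b3 = rng.choice(y3, len(y3), replace=True)
        bc1, bc2 = np.quantile(b1, p1), np.quantile(b3, 1 - p3)
        boot[b] = np.mean((b2 > bc1) & (b2 <= bc2))
    var_boot = boot.var(ddof=1)

    # scaling constant and scaled chi-square threshold
    var_u = p2_hat * (1 - p2_hat)
    cutoff = n2 * var_boot / var_u * stats.chi2.ppf(1 - alpha, df=1)

    # collect the P2 values whose profile EL ratio stays below the threshold
    grid = np.linspace(0.001, 0.999, 999)
    keep = [p for p in grid if el_loglik_ratio(u, p) <= cutoff]
    return (min(keep), max(keep)) if keep else (p2_hat, p2_hat)
```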
4. SIMULATION STUDIES
Simulation studies are carried out to compare the performance of the proposed empirical likelihood confidence intervals ELP and ELB, as well as the asymptotic confidence interval APV, with the existing ones, i.e. GI, BCGI, BTP and BTII proposed in Dong et al. (2011). As BTI is always inferior to BTII, it is not included in the tables.
We evaluate these approaches under the normal and beta distribution scenarios proposed in Dong et al. (2011), to check whether the new approaches give performance comparable to the recommended GI/BCGI parametric approaches when the normality assumptions are satisfied with or without the Box-Cox transformation. In addition, we also investigate a combined scenario where the normality assumptions cannot be met: gamma for the non-diseased, log-normal for the early diseased, and Weibull for the fully diseased group. The density functions for the combined distribution scenario are plotted in Figure 1. Sample sizes (n1, n2, n3) are set as (10, 10, 10), (30, 30, 30), (50, 30, 30), (50, 50, 50), (100, 100, 100), (100, 50, 50) and (100, 100, 50). With a fixed 80% specificity and a fixed 80% sensitivity to the fully diseased stage, the parameters of the distributions are chosen so that P2 equals 50% or 90%. Under each setting, 5,000 random samples are generated. The simulation results are presented in Tables 1–3.
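As a quick check on the simulation design, the normal parameters in the first panel of Table 1 indeed give P2 ≈ 0.5 at 80% specificity and 80% sensitivity to the fully diseased stage:

```python
from scipy.stats import norm

p1, p3 = 0.8, 0.8
mu1, sd1, mu2, sd2, mu3, sd3 = 0.0, 1.0, 2.5, 1.1, 3.69, 1.2

c1 = mu1 + sd1 * norm.ppf(p1)        # threshold giving 80% specificity
c2 = mu3 + sd3 * norm.ppf(1 - p3)    # threshold giving 80% sensitivity to the fully diseased stage
p2 = norm.cdf((c2 - mu2) / sd2) - norm.cdf((c1 - mu2) / sd2)
print(round(p2, 3))                   # approximately 0.5
```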
Figure 1.
Density functions for the non-diseased, early diseased and fully diseased group for the two simulation scenarios in Table 3.
Table 1.
Summary of approximate 95% two-sided confidence bounds of BTII, BTP, ELB, ELP, GI and APV for P2 under normal distributions (based on 5,000 simulations).
Three independent normal distributions; (μ1, σ1) = (0, 1), (μ2, σ2) = (2.5, 1.1), (μ3, σ3) = (3.69, 1.2), P2 = 0.5. The first six columns report the coverage probability and the last six the length of the confidence interval; BTII and BTP are non-parametric, ELB and ELP are empirical likelihood, and GI and APV are parametric methods.

Sample Sizes | BTII | BTP | ELB | ELP | GI | APV | BTII | BTP | ELB | ELP | GI | APV
---|---|---|---|---|---|---|---|---|---|---|---|---
(10, 10, 10) | 0.9376 | 0.9774 | 0.9782 | 0.9976 | 0.9632 | 0.9782 | 0.6372 | 0.8109 | 0.6990 | 0.7072 | 0.6930 | 0.6990 |
(30, 30, 30) | 0.9580 | 0.9756 | 0.9622 | 0.9468 | 0.9576 | 0.9622 | 0.5107 | 0.5571 | 0.5154 | 0.4927 | 0.4328 | 0.5154 |
(50, 30, 30) | 0.9538 | 0.9728 | 0.9584 | 0.9478 | 0.9518 | 0.9584 | 0.5026 | 0.5487 | 0.5112 | 0.4878 | 0.4223 | 0.5112 |
(50, 50, 50) | 0.9604 | 0.9724 | 0.9564 | 0.9440 | 0.9516 | 0.9564 | 0.4230 | 0.4441 | 0.4271 | 0.4035 | 0.3359 | 0.4271 |
(100, 100, 100) | 0.9532 | 0.9642 | 0.9554 | 0.9490 | 0.9488 | 0.9554 | 0.3121 | 0.3168 | 0.3140 | 0.2982 | 0.2383 | 0.3140 |
(100, 50, 50) | 0.9502 | 0.9710 | 0.9518 | 0.9414 | 0.9486 | 0.9518 | 0.4130 | 0.4346 | 0.4175 | 0.3963 | 0.3302 | 0.4175 |
(100, 100, 50) | 0.9416 | 0.9656 | 0.9486 | 0.9392 | 0.9524 | 0.9486 | 0.3813 | 0.3880 | 0.3764 | 0.3610 | 0.3063 | 0.3764 |
(μ1, σ1) = (0, 1), (μ2, σ2) = (2.5, 1.1), (μ3, σ3) = (5.51, 1.2), P2 = 0.9

Sample Sizes | BTII | BTP | ELB | ELP | GI | APV | BTII | BTP | ELB | ELP | GI | APV
---|---|---|---|---|---|---|---|---|---|---|---|---
(10, 10, 10) | 0.8956 | 0.9460 | 0.9600 | 0.9588 | 0.9350 | 0.9600 | 0.3639 | 0.4577 | 0.5525 | 0.5853 | 0.5243 | 0.5525 |
(30, 30, 30) | 0.9696 | 0.9836 | 0.9732 | 0.9794 | 0.9454 | 0.9732 | 0.2607 | 0.2763 | 0.3010 | 0.3258 | 0.2386 | 0.3010 |
(50, 30, 30) | 0.9636 | 0.9868 | 0.9690 | 0.9748 | 0.9458 | 0.9690 | 0.2458 | 0.2611 | 0.2854 | 0.3110 | 0.2249 | 0.2854 |
(50, 50, 50) | 0.9754 | 0.9816 | 0.9594 | 0.9798 | 0.9440 | 0.9594 | 0.2065 | 0.2160 | 0.2219 | 0.2341 | 0.1757 | 0.2219 |
(100, 100, 100) | 0.9670 | 0.9774 | 0.9556 | 0.9608 | 0.9478 | 0.9556 | 0.1470 | 0.1497 | 0.1489 | 0.1522 | 0.1194 | 0.1489 |
(100, 50, 50) | 0.9732 | 0.9806 | 0.9576 | 0.9758 | 0.9424 | 0.9576 | 0.1922 | 0.2011 | 0.2088 | 0.2211 | 0.1640 | 0.2088 |
(100, 100, 50) | 0.9716 | 0.9812 | 0.9582 | 0.9638 | 0.9516 | 0.9582 | 0.1605 | 0.1625 | 0.1623 | 0.1651 | 0.1304 | 0.1623 |
BTII: Confidence interval is computed by the BTII approach.
BTP: Confidence interval is computed by the BTP approach.
ELB: Confidence interval is computed by the ELB approach.
ELP: Confidence interval is computed by the ELP approach.
GI: Confidence interval is computed by the GI approach.
APV: Confidence interval is computed by the APV approach.
Table 3.
Summary of approximate 95% two-sided confidence bounds of BTII, BTP, ELB, ELP, BCGI and APV for P2 under the combined distributions (based on 5,000 simulations).
Independent gamma, log-normal and Weibull distributions; Gamma(α, β) = (6, 12), LN(μ, σ) = (1.5, 0.5), Weibull(a, b) = (4, 6.6), P2 = 0.5. The first six columns report the coverage probability and the last six the length of the confidence interval; BTII and BTP are non-parametric, ELB and ELP are empirical likelihood, and BCGI and APV are parametric methods.

Sample Sizes | BTII | BTP | ELB | ELP | BCGI | APV | BTII | BTP | ELB | ELP | BCGI | APV
---|---|---|---|---|---|---|---|---|---|---|---|---
(10, 10, 10) | 0.9242 | 0.9640 | 0.9716 | 0.9972 | 0.9374 | 0.8792 | 0.5512 | 0.7247 | 0.6301 | 0.6175 | 0.5895 | 0.6642 |
(30, 30, 30) | 0.9538 | 0.9646 | 0.9596 | 0.9460 | 0.9120 | 0.9254 | 0.4254 | 0.4701 | 0.4468 | 0.4159 | 0.3524 | 0.4427 |
(50, 30, 30) | 0.9570 | 0.9674 | 0.9568 | 0.9440 | 0.9098 | 0.9256 | 0.4281 | 0.4727 | 0.4468 | 0.4171 | 0.3528 | 0.4440 |
(50, 50, 50) | 0.9562 | 0.9600 | 0.9564 | 0.9432 | 0.8984 | 0.9288 | 0.3513 | 0.3716 | 0.3600 | 0.3370 | 0.2741 | 0.3500 |
(100, 100, 100) | 0.9586 | 0.9596 | 0.9530 | 0.9448 | 0.8702 | 0.9400 | 0.2591 | 0.2654 | 0.2619 | 0.2474 | 0.1944 | 0.2528 |
(100, 50, 50) | 0.9536 | 0.9620 | 0.9516 | 0.9404 | 0.9028 | 0.9246 | 0.3528 | 0.3733 | 0.3584 | 0.3366 | 0.2742 | 0.3504 |
(100, 100, 50) | 0.9530 | 0.9578 | 0.9436 | 0.9330 | 0.8698 | 0.9294 | 0.3085 | 0.3123 | 0.3080 | 0.2890 | 0.2255 | 0.2979 |
Gamma(α, β) = (6, 12), LN(μ, σ) = (1.5, 0.5), Weibull(a, b) = (4, 12.5), P2 = 0.9

Sample Sizes | BTII | BTP | ELB | ELP | BCGI | APV | BTII | BTP | ELB | ELP | BCGI | APV
---|---|---|---|---|---|---|---|---|---|---|---|---
(10, 10, 10) | 0.7848 | 0.8628 | 0.9682 | 0.9702 | 0.9422 | 0.6998 | 0.3174 | 0.4037 | 0.5712 | 0.5724 | 0.3895 | 0.3043 |
(30, 30, 30) | 0.9566 | 0.9628 | 0.9582 | 0.9824 | 0.9188 | 0.9238 | 0.2394 | 0.2534 | 0.2777 | 0.3180 | 0.2066 | 0.2266 |
(50, 30, 30) | 0.9520 | 0.9638 | 0.9620 | 0.9862 | 0.9202 | 0.9284 | 0.2371 | 0.2520 | 0.2797 | 0.3178 | 0.2063 | 0.2260 |
(50, 50, 50) | 0.9590 | 0.9620 | 0.9504 | 0.9822 | 0.9030 | 0.9312 | 0.1923 | 0.2002 | 0.2093 | 0.2276 | 0.1592 | 0.1910 |
(100, 100, 100) | 0.9580 | 0.9606 | 0.9582 | 0.9642 | 0.8884 | 0.9436 | 0.1393 | 0.1413 | 0.1414 | 0.1474 | 0.1122 | 0.1447 |
(100, 50, 50) | 0.9620 | 0.9606 | 0.9562 | 0.9876 | 0.9100 | 0.9248 | 0.1919 | 0.2004 | 0.2062 | 0.2265 | 0.1598 | 0.1916 |
(100, 100, 50) | 0.9728 | 0.9612 | 0.9508 | 0.9658 | 0.8878 | 0.9332 | 0.1622 | 0.1645 | 0.1661 | 0.1728 | 0.1274 | 0.1661 |
BTII: Confidence interval is computed by the BTII approach.
BTP: Confidence interval is computed by the BTP approach.
ELB: Confidence interval is computed by the ELB approach.
ELP: Confidence interval is computed by the ELP approach.
BCGI: Confidence interval is computed by the BCGI approach.
APV: Confidence interval is computed by the APV approach.
Table 1 presents simulation results under the normal distributions. The performance of the newly proposed empirical likelihood confidence interval ELB is satisfactory in terms of coverage probability, although ELB tends to be slightly conservative for small sample sizes. ELP performs well for P2 = 0.5 except at the sample size (10, 10, 10), but becomes conservative when P2 = 0.9. BTII gives good estimates at P2 = 0.5, but when P2 increases to 0.9, BTII attains a coverage probability of 0.8956 at the sample size (10, 10, 10), which is much lower than the 95% nominal level; in addition, as the sample size increases, BTII becomes conservative. The BTP interval is generally conservative. The normal approximation-based confidence interval APV is slightly conservative at small sample sizes. The generalized inference method GI performs best in terms of both the closeness of the coverage probability to the nominal level and the length of the confidence interval.
Table 2 presents simulation results for the beta distributions. The coverage probability of ELB remains conservative for small sample sizes at P2 = 0.5; however, when P2 = 0.9, for the small sample size (10, 10, 10), ELB attains a coverage probability that is very close to the nominal level and is even better than the BCGI approach. The other empirical likelihood method, ELP, yields satisfactory coverage probabilities when P2 = 0.5 except at the sample size (10, 10, 10), while it is conservative for medium sample sizes when P2 = 0.9. The non-parametric method BTII is satisfactory at P2 = 0.5, while at P2 = 0.9 it changes from being liberal to being conservative as sample sizes increase. The large-sample method APV is generally liberal when sample sizes are small. The generalized inference approach with the Box-Cox transformation is usually satisfactory, but it can be worse than ELB in a few scenarios, such as (100, 100, 50) at P2 = 0.5 or (10, 10, 10) at P2 = 0.9.
Table 2.
Summary of approximate 95% two-sided confidence bounds of BTII, BTP, ELB, ELP, BCGI and APV for P2 under beta distributions (based on 5,000 simulations).
Three independent beta distributions; (α1, β1) = (1, 6), (α2, β2) = (6, 6), (α3, β3) = (9.6, 6), P2 = 0.5. The first six columns report the coverage probability and the last six the length of the confidence interval; BTII and BTP are non-parametric, ELB and ELP are empirical likelihood, and BCGI and APV are parametric methods.

Sample Sizes | BTII | BTP | ELB | ELP | BCGI | APV | BTII | BTP | ELB | ELP | BCGI | APV
---|---|---|---|---|---|---|---|---|---|---|---|---
(10, 10, 10) | 0.9426 | 0.9752 | 0.9818 | 0.9988 | 0.9630 | 0.8980 | 0.6124 | 0.7938 | 0.6827 | 0.6688 | 0.6530 | 0.7104 |
(30, 30, 30) | 0.9632 | 0.9720 | 0.9724 | 0.9554 | 0.9484 | 0.9268 | 0.4755 | 0.5212 | 0.4892 | 0.4562 | 0.3798 | 0.4896 |
(50, 30, 30) | 0.9588 | 0.9692 | 0.9626 | 0.9468 | 0.9490 | 0.9192 | 0.4611 | 0.5086 | 0.4743 | 0.4479 | 0.3724 | 0.4808 |
(50, 50, 50) | 0.9580 | 0.9732 | 0.9626 | 0.9520 | 0.9524 | 0.9320 | 0.3850 | 0.4081 | 0.3930 | 0.3676 | 0.2930 | 0.3853 |
(100, 100, 100) | 0.9596 | 0.9692 | 0.9544 | 0.9442 | 0.9314 | 0.9334 | 0.2819 | 0.2881 | 0.2829 | 0.2686 | 0.2064 | 0.2748 |
(100, 50, 50) | 0.9598 | 0.9640 | 0.9636 | 0.9516 | 0.9400 | 0.9226 | 0.3760 | 0.3961 | 0.3839 | 0.3621 | 0.2881 | 0.3780 |
(100, 100, 50) | 0.9578 | 0.9586 | 0.9510 | 0.9412 | 0.9348 | 0.9328 | 0.3350 | 0.3403 | 0.3352 | 0.3173 | 0.2525 | 0.3281 |
(α1, β1) = (1, 6), (α2, β2) = (6, 6), (α3, β3) = (20.4, 6), P2 = 0.9

Sample Sizes | BTII | BTP | ELB | ELP | BCGI | APV | BTII | BTP | ELB | ELP | BCGI | APV
---|---|---|---|---|---|---|---|---|---|---|---|---
(10, 10, 10) | 0.8842 | 0.9398 | 0.9578 | 0.9528 | 0.9282 | 0.7494 | 0.3785 | 0.4839 | 0.5588 | 0.5596 | 0.4577 | 0.3129 |
(30, 30, 30) | 0.9696 | 0.9726 | 0.9648 | 0.9722 | 0.9358 | 0.9262 | 0.2629 | 0.2832 | 0.3054 | 0.3120 | 0.2157 | 0.2267 |
(50, 30, 30) | 0.9648 | 0.9742 | 0.9652 | 0.9712 | 0.9494 | 0.9246 | 0.2410 | 0.2594 | 0.2833 | 0.2983 | 0.2063 | 0.2185 |
(50, 50, 50) | 0.9740 | 0.9744 | 0.9634 | 0.9768 | 0.9404 | 0.9408 | 0.2072 | 0.2165 | 0.2248 | 0.2241 | 0.1598 | 0.1912 |
(100, 100, 100) | 0.9696 | 0.9706 | 0.9602 | 0.9582 | 0.9428 | 0.9438 | 0.1461 | 0.1488 | 0.1491 | 0.1461 | 0.1088 | 0.1419 |
(100, 50, 50) | 0.9692 | 0.9656 | 0.9588 | 0.9762 | 0.9536 | 0.9320 | 0.1910 | 0.1979 | 0.2045 | 0.2119 | 0.1517 | 0.1813 |
(100, 100, 50) | 0.9736 | 0.9752 | 0.9564 | 0.9582 | 0.9434 | 0.9372 | 0.1585 | 0.1615 | 0.1599 | 0.1560 | 0.1150 | 0.1519 |
BTII: Confidence interval is computed by the BTII approach.
BTP: Confidence interval is computed by the BTP approach.
ELB: Confidence interval is computed by the ELB approach.
ELP: Confidence interval is computed by the ELP approach.
BCGI: Confidence interval is computed by the BCGI approach.
APV: Confidence interval is computed by the APV approach.
In Table 3, the simulation results for the combined distributions are presented. For such cases, the Box-Cox transformation fails to transform the data to normality; therefore, as expected, the performance of BCGI is unsatisfactory. Generally speaking, the ELB method is close to the 95% nominal level except for being slightly conservative at the sample size (10, 10, 10). The ELP method provides reasonable coverage at P2 = 0.5 except for the sample size (10, 10, 10); however, it becomes conservative for P2 = 0.9. BTII maintains the nominal level for most cases except the sample size (10, 10, 10), where the coverage probability can be as low as 0.7848. In addition, for scenarios such as (100, 50, 50) and (100, 100, 50), BTII becomes more conservative than ELB. The BTP method is generally conservative except at the sample size (10, 10, 10) when P2 = 0.9. The asymptotic approach APV remains liberal for most of the cases; however, as the sample size increases to (100, 100, 100), the coverage probability is very close to the 95% nominal level.
In summary, the GI and BCGI methods work well for the normal and beta distributions, but become unusable for the combined distributions, where the Box-Cox transformation fails to work. The performance of APV is unstable: it is slightly conservative for the normal case and generally liberal for the non-normal ones. BTII, for large P2’s, is conservative under large unbalanced sample sizes and gives very liberal estimates under small sample sizes. BTP produces conservative confidence intervals for most of the cases. ELP performs well for scenarios with smaller P2, but turns out to be conservative for cases with higher P2. Finally, the proposed ELB method gives stable confidence interval estimation with coverage probability close to the nominal level in almost all cases, except that it can be slightly conservative under small sample sizes. Therefore, overall, the ELB method is highly recommended, especially when the normality assumptions are violated and the Box-Cox transformation fails to work.
5. EXAMPLE
Alzheimer’s disease (AD) is the most common form of dementia, and it is one of the most costly diseases for society in Europe and the United States. According to Wimo et al. (2013), the total estimated worldwide cost of dementia was US$604 billion in 2010, and about 70% of the costs occurred in western Europe and North America. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a research project designed to validate the use of biomarkers, including blood tests, tests of cerebrospinal fluid, and MRI/PET imaging, for Alzheimer’s disease clinical trials and diagnosis. It aims to define the rate of progression of mild cognitive impairment (MCI) and AD, to develop improved methods for clinical trials, and to provide a large database that will improve the design of clinical treatment trials.
In the ADNI database, there are many biomarkers measuring the disease progression of AD. Here we use a small subset which includes the ratio of levels of protein Tau and protein Aβ42 (TAU/ABETA), Fluoro Deoxy Glucose (FDG), and the Alzheimer’s Disease Assessment Scale (ADAS11) at the 24th-month visit. The clinical dementia rating (CDR) denotes the severity of dementia, and a global CDR is derived from individual ratings in multiple domains by an experienced clinician. CDR 0 indicates no dementia, and CDR 0.5, 1, 2 and 3 represent very mild, mild, moderate, and severe dementia, respectively. Since patients with large CDR such as 2 or 3 are rarely available, patients with CDR greater than or equal to 1 are referred to as the fully diseased group; CDR 0 and 0.5 define the non-diseased group and the early diseased group, respectively. This subset contains 194, 290 and 183 subjects for the non-diseased, the early diseased, and the fully diseased group, respectively. Due to missing values, the actual sample sizes for each variable may vary, as reported in Table 4. Figure 2 presents the estimated kernel densities of the three disease groups for TAU/ABETA, FDG and ADAS11, respectively. By the Shapiro-Wilk normality test, TAU/ABETA satisfies the normality assumptions after the Box-Cox transformation; for FDG, the original data meet the normality assumptions; and for ADAS11, the data either with or without the Box-Cox transformation cannot achieve the normality assumptions for all three groups simultaneously. Since the parametric assumptions are not met for ADAS11, GI/BCGI cannot be rationally applied, and only the other methods are used to analyze this variable. Table 5 presents the estimated confidence intervals of P2 for each variable. Under the recommended ELB approach, ADAS11 achieves (0.4660, 0.6657) as its 95% confidence interval for P2, suggesting a mediocre ability to diagnose early-stage AD patients.
Table 4.
Summary statistics for the ADNI data.
The three (N, Mean, Std) blocks correspond to the CDR 0 (non-diseased), CDR 0.5 (early diseased) and CDR ≥ 1 (fully diseased) groups.

Biomarker | N | Mean | Std | N | Mean | Std | N | Mean | Std | VUS
---|---|---|---|---|---|---|---|---|---|---
TAU/ABETA | 24 | 0.37 | 0.21 | 48 | 0.72 | 0.48 | 26 | 0.89 | 0.48 | 0.3890 |
FDG | 82 | 6.37 | 0.56 | 130 | 5.86 | 0.68 | 70 | 4.95 | 0.74 | 0.5560 |
ADAS11 | 193 | 5.44 | 2.83 | 288 | 12.26 | 5.84 | 180 | 26.23 | 11.70 | 0.7575 |
Figure 2.
Estimated kernel densities for TAU/ABETA, FDG and ADAS11 in the ADNI data.
Table 5.
Estimated confidence intervals for the probability of detecting early diseased individuals using TAU/ABETA, FDG and ADAS11 of the ADNI data (sensitivity to fully diseased stage and specificity are assumed to equal to 0.8).
Biomarker | P̃2 | BTII | BTP | ELB | GI | BCGI
---|---|---|---|---|---|---
TAU/ABETA | 0.1335 | (0.0052, 0.2614) | (0.0371, 0.2685) | (0.0073, 0.3712) | - | (0.0000, 0.2104)
FDG | 0.2011 | (0.0875, 0.3388) | (0.1001, 0.3620) | (0.0724, 0.3716) | (0.0349, 0.3152) | -
ADAS11 | 0.5754 | (0.4806, 0.6927) | (0.4829, 0.6834) | (0.4660, 0.6657) | - | -

Entries are the (lower bound, upper bound) of the 95% confidence intervals; a dash indicates that the method was not applied because its distributional assumptions were not met.
TAU/ABETA: Ratio of the CSF parameters: protein Tau and protein Aβ42.
FDG: Fluoro Deoxy Glucose.
ADAS11: Alzheimer’s Disease Assessment Scale.
BTII: Confidence interval is computed by the BTII approach.
BTP: Confidence interval is computed by the BTP approach.
ELB: Confidence interval is computed by the ELB approach.
GI: Confidence interval is computed by the GI approach.
BCGI: Confidence interval is computed by the BCGI approach.
P̃2: The non-parametric estimate of P2 as given in (3).
6. SUMMARY AND DISCUSSION
For disease processes with three ordinal stages, the sensitivity to the early diseased stage given specificity and sensitivity to the fully diseased stage, P2, is an important diagnostic accuracy index, especially for early disease detection. The higher the P2, the better the ability of the diagnostic test or biomarker to identify the early diseased stage. Therefore, accurate confidence interval estimation for P2 helps investigators identify good biomarkers. This article proposes the ELB approach and compares it with the existing confidence intervals. Simulation studies show that ELB not only is more robust than parametric methods, which rely heavily on normality assumptions, but also generally gives more accurate confidence intervals than non-parametric methods, especially for unbalanced data sets. Therefore, the ELB method is highly recommended in practice.
For future work, following the same vein of Dong et al. (2014), we would like to develop the semi-parametric inference procedure for the difference of two correlated P2’s, based on the empirical likelihood technique.
Acknowledgments
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimers Association; Alzheimers Drug Discovery Foundation; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. The ADNI research was also supported by NIH grants P30 AG010129 and K01 AG030514.
APPENDIX 1: PROOF OF THE VARIANCE OF P̂2 IN (4)
The asymptotic variance of P̂2 is shown in (4). The following is the proof.
Proof:
As $\hat{c}_1 = \hat{F}_1^{-1}(P_1)$ and $\hat{c}_2 = \hat{F}_3^{-1}(1-P_3)$, and we assume F2 is continuously differentiable, a first-order Taylor expansion of $\hat{P}_2 = \hat{F}_2(\hat{c}_2) - \hat{F}_2(\hat{c}_1)$ around (c1, c2) gives

$$\hat{P}_2 - P_2 \approx \left\{\hat{F}_2(c_2) - \hat{F}_2(c_1) - P_2\right\} + f_2(c_2)(\hat{c}_2 - c_2) - f_2(c_1)(\hat{c}_1 - c_1).$$

Furthermore, since ĉ1 ⊥ ĉ2 (they are computed from independent samples) and both are independent of the early diseased sample, we have

$$\mathrm{Var}(\hat{P}_2) \approx \mathrm{Var}\left\{\hat{F}_2(c_2) - \hat{F}_2(c_1)\right\} + f_2^2(c_2)\,\mathrm{Var}(\hat{c}_2) + f_2^2(c_1)\,\mathrm{Var}(\hat{c}_1).$$

Hence, using $\mathrm{Var}\{\hat{F}_2(c_2) - \hat{F}_2(c_1)\} = P_2(1-P_2)/n_2$ and the asymptotic variances of the sample quantiles, $\mathrm{Var}(\hat{c}_1) \approx P_1(1-P_1)/\{n_1 f_1^2(c_1)\}$ and $\mathrm{Var}(\hat{c}_2) \approx P_3(1-P_3)/\{n_3 f_3^2(c_2)\}$,

$$\sigma^2_{\hat{P}_2} = \frac{P_2(1-P_2)}{n_2} + \frac{f_2^2(c_1)}{f_1^2(c_1)}\cdot\frac{P_1(1-P_1)}{n_1} + \frac{f_2^2(c_2)}{f_3^2(c_2)}\cdot\frac{P_3(1-P_3)}{n_3},$$

which is (4).
APPENDIX 2: PROOF OF THEOREM IN SECTION 3
Proof:
By similar arguments to those used in Owen (1990), we can show that $\tilde{\lambda} = O_p(n_2^{-1/2})$ and $\max_{1\le i\le n_2}|\hat{U}_i - P_2| = O(1)$ a.s. Then, expanding (7), we have

$$l(P_2) = 2\sum_{i=1}^{n_2}\log\left\{1 + \tilde{\lambda}(\hat{U}_i - P_2)\right\} = 2\tilde{\lambda}\sum_{i=1}^{n_2}(\hat{U}_i - P_2) - \tilde{\lambda}^2\sum_{i=1}^{n_2}(\hat{U}_i - P_2)^2 + o_p(1).$$

From (6),

$$\tilde{\lambda} = \frac{\sum_{i=1}^{n_2}(\hat{U}_i - P_2)}{\sum_{i=1}^{n_2}(\hat{U}_i - P_2)^2} + o_p(n_2^{-1/2}).$$

Therefore,

$$l(P_2) = \frac{\left\{\sum_{i=1}^{n_2}(\hat{U}_i - P_2)\right\}^2}{\sum_{i=1}^{n_2}(\hat{U}_i - P_2)^2} + o_p(1) = \frac{n_2(\hat{P}_2 - P_2)^2}{\frac{1}{n_2}\sum_{i=1}^{n_2}(\hat{U}_i - P_2)^2} + o_p(1),$$

where ϕ is the indicator function defined in Section 3.2 and

$$\hat{P}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2}\hat{U}_i = \frac{1}{n_2}\sum_{i=1}^{n_2}\phi\!\left(Y_{2i};\ \hat{F}_1^{-1}(P_1),\ \hat{F}_3^{-1}(1-P_3)\right)$$

is the three-sample statistic given in (2). From the proof in Appendix 1 and the central limit theorem, we know that $\hat{P}_2 - P_2$ is asymptotically normal with variance $\sigma^2_{\hat{P}_2}$. From the law of large numbers, we have

$$\frac{1}{n_2}\sum_{i=1}^{n_2}(\hat{U}_i - P_2)^2 \xrightarrow{p} P_2(1-P_2) = \sigma^2_U.$$

It is easy to check that

$$\frac{(\hat{P}_2 - P_2)^2}{\sigma^2_{\hat{P}_2}} \xrightarrow{d} \chi^2_1.$$

Therefore, by the Slutsky theorem,

$$l(P_2) \xrightarrow{d} r_{P_1,P_2,P_3}\,\chi^2_1,$$

where the scale constant rP1,P2,P3 is

$$r_{P_1,P_2,P_3} = \frac{\sigma^2_{\hat{P}_2}}{\sigma^2_U/n_2}.$$
Footnotes
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
References
- Agresti A, Coull BA. Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician. 1998;52:119–126.
- Claeskens G, Jing BY, Peng L, Zhou W. An empirical likelihood confidence interval for an ROC curve. The Canadian Journal of Statistics. 2003;31:173–190.
- Dong T, Tian L, Hutson A, Xiong CJ. Parametric and non-parametric confidence intervals of the probability of identifying early disease stage given sensitivity to full disease and specificity with three ordinal diagnostic groups. Statistics in Medicine. 2011;30:3532–3545. doi: 10.1002/sim.4401.
- Dong T, Kang L, Hutson A, Xiong CJ, Tian L. Confidence interval estimation of the difference between two sensitivities to the early disease stage. Biometrical Journal. 2014;56:270–286. doi: 10.1002/bimj.201200012.
- Dreiseitl S, Ohno-Machado L, Binder M. Comparing three-class diagnostic tests by three-way ROC analysis. Medical Decision Making. 2000;20:323–331. doi: 10.1177/0272989X0002000309.
- Greenhouse SW, Mantel N. The evaluation of diagnostic tests. Biometrics. 1950;6:399–412.
- Heckerling PS. Parametric three-way receiver operating characteristic surface analysis using Mathematica. Medical Decision Making. 2001;21:409–417. doi: 10.1177/0272989X0102100507.
- He X, Frey EC. The meaning and use of the volume under a three-class ROC surface (VUS). IEEE Transactions on Medical Imaging. 2008;27:577–588. doi: 10.1109/TMI.2007.908687.
- He X, Gallas BD, Frey EC. Three-class ROC analysis—toward a general decision theoretic solution. IEEE Transactions on Medical Imaging. 2010;29:206–215. doi: 10.1109/TMI.2009.2034516.
- Kang L, Tian L. Estimation of the volume under the ROC surface with three ordinal diagnostic categories. Computational Statistics & Data Analysis. 2013;62:39–51. doi: 10.1016/j.csda.2013.07.007.
- Li J, Zhou XH. Nonparametric and semiparametric estimation of the three way receiver operating characteristic surface. Journal of Statistical Planning and Inference. 2009;139:4133–4142.
- Li J, Zhou XH, Fine JP. A regression approach to ROC surface, with applications to Alzheimer’s disease. Science China Mathematics. 2012;55:1583–1595. doi: 10.1007/s11425-012-4462-3.
- Linnet K. Comparison of quantitative diagnostic tests: type I error, power, and sample size. Statistics in Medicine. 1987;6:147–158. doi: 10.1002/sim.4780060207.
- McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making. 1984;4:137–150. doi: 10.1177/0272989X8400400203.
- Mossman D. Three-way ROCs. Medical Decision Making. 1999;19:78–89. doi: 10.1177/0272989X9901900110.
- Nakas CT, Alonzo TA, Yiannoutsos CT. Accuracy and cut-off point selection in three-class classification problems using a generalization of the Youden index. Statistics in Medicine. 2010;29:2946–2955. doi: 10.1002/sim.4044.
- Nakas CT, Yiannoutsos CT. Ordered multiple-class ROC analysis with continuous measurements. Statistics in Medicine. 2004;23:3437–3449. doi: 10.1002/sim.1917.
- Owen A. Empirical likelihood ratio confidence regions. Annals of Statistics. 1990;18:90–120.
- Owen A. Empirical Likelihood. New York: Chapman & Hall/CRC; 2001.
- Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series 28; 2003.
- Platt RW, Hanley JA, Yang H. Bootstrap confidence intervals for the sensitivity of a quantitative diagnostic test. Statistics in Medicine. 2000;19:313–322. doi: 10.1002/(sici)1097-0258(20000215)19:3<313::aid-sim370>3.0.co;2-k.
- Qin GS, Davis AE, Jing BY. Empirical likelihood-based confidence intervals for the sensitivity of a continuous-scale diagnostic test at a fixed level of specificity. Statistical Methods in Medical Research. 2011;20:217–231. doi: 10.1177/0962280209105512.
- Shapiro D. The interpretation of diagnostic tests. Statistical Methods in Medical Research. 1999;8:113–134. doi: 10.1177/096228029900800203.
- Tian L, Xiong C, Lai C, Vexler A. Exact confidence interval estimation for the difference in diagnostic accuracy with three ordinal diagnostic groups. Journal of Statistical Planning and Inference. 2010;141:549–558. doi: 10.1016/j.jspi.2010.07.004.
- Wan S. An empirical likelihood confidence interval for the volume under ROC surface. Statistics & Probability Letters. 2012;82:1463–1467.
- Wand MP, Jones MC. Kernel Smoothing. New York: Chapman & Hall/CRC; 1995.
- Wimo A, Jönsson L, Bond J, Prince M, Winblad B. The worldwide economic impact of dementia 2010. Alzheimer’s & Dementia. 2013;9:1–11. doi: 10.1016/j.jalz.2012.11.006.
- Xiong C, van Belle G, Miller JP, Morris JC. Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Statistics in Medicine. 2006;25:1251–1273. doi: 10.1002/sim.2433.
- Zhou XH, Obuchowski N, McClish D. Statistical Methods in Diagnostic Medicine. New York: Wiley; 2002.
- Zhou XH, Qin GS. Improved confidence intervals for the sensitivity to full disease at a fixed level of specificity of a continuous-scale diagnostic test. Statistics in Medicine. 2005;24:465–477. doi: 10.1002/sim.1563.
- Zou KH, Liu A, Bandos A, Ohno-Machado L, Rockette H. Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis. CRC Press; 2010.