Published in final edited form as: J Stat Plan Inference. 2016 May 1;172:23–25. doi: 10.1016/j.jspi.2015.12.006

A general approach to categorizing a continuous scale according to an ordinal outcome

Limin Peng a,*, Amita Manatunga a, Ming Wang b, Ying Guo a, AKM Fazlur Rahman a

Abstract

In practice, disease outcomes are often measured on a continuous scale, and classification of subjects into meaningful disease categories is of substantive interest. To address this problem, we propose a general analytic framework for determining cut-points of the continuous scale. We develop a unified approach to assessing optimal cut-points based on various criteria, including common agreement and association measures. We study the nonparametric estimation of optimal cut-points. Our investigation reveals that the proposed estimator, though often used in an ad hoc fashion in practice, is governed by nonstandard asymptotic theory and warrants modifications to traditional inferential procedures. The techniques developed in this work are generally adaptable to the study of other estimators that maximize nonsmooth objective functions but do not belong to the paradigm of M-estimation. We conduct extensive simulations to evaluate the proposed method and confirm the derived theoretical results. The new method is illustrated by an application to a mental health study.

Keywords: Agreement, Association, Empirical process, M-estimation, Non-smooth objective function, Subsampling

1. Introduction

In many biomedical and behavioral studies, different instruments or rating scales are used to identify a given disease. Typically, measurements are made on a continuous scale; however, researchers are often interested in dividing a continuous scale into ordered categories for reasons such as clinical interpretation of results and simplification of the instrument (O'Brien, 2004). For example, depression is a common problem in medically-ill patients with diabetes and other chronic diseases (Moussavi et al., 2007). Psychiatric diagnostic interview instruments, such as the Mini International Neuropsychiatric Interview (MINI) diagnostic interview (Sheehan et al., 1998), generally provide accurate psychiatric diagnoses in medically healthy individuals. However, the MINI interview is too time-consuming for sick patients and requires trained psychiatric interviewers, who are not always affordable or available. On the other hand, dimensional psychometric instruments designed to measure the same disease state, such as the 20-item Zung Depression rating scale (Zung, 1965), require less time and can be self-administered by patients. The established total observer-rated MINI score has been interpreted with well-accepted grades of depression severity, such as no depression, mild depression, and markedly severe depression (Sheehan et al., 1998). While the self-reported Zung scale has many advantages, no routine, reliable cut-points of the Zung scale are available to reflect the degree of severity of depression. Establishing such cut-points would enhance the utility of the convenient Zung rating scale in the screening or diagnosis of depression, particularly in large medically-ill patient populations.

Analytic methods for determining cut-points of a continuous scale based on validated categorical measurements, despite their practical importance, have not been well studied. For example, the Youden index (Youden, 1950) and its variants based on the receiver operating characteristic (ROC) curve (Pepe, 2003) have been studied for identifying the "best" cut-point to dichotomize a continuous scale (Kraemer, 1988; Schisterman et al., 2005; Perkins and Schisterman, 2006). However, this type of approach can only deal with a single cut-point and, moreover, lacks formal inference procedures for the estimated cut-point. For general cases possibly involving two or more cut-points, existing approaches are mostly based on ad hoc arguments and generally lack statistical rigor. For example, the following methods have been used in the literature: (a) considering arbitrary cut-points, a certain sample quantile such as the median, or a cut-point that corresponds to the highest proportion of correct classification against a gold standard (Altman et al., 1994; Mazumdar and Glassman, 2000); (b) finding cut-points that result in disease rates consistent with a known population disease prevalence (Altman, 1991); and (c) relying on clinicians' experience (Altman et al., 1994). The fundamental deficiency of all these methods is that the criterion levels or cut-points are generally decided subjectively, on the basis of the probabilities of Type I/II misclassification and of "judgment" or "experience". Several authors (James, 1978; Brownie and Habicht, 1984) proposed to minimize the variance estimator of the prevalence of a disease by assuming a mixture of normal distributions; however, this criterion is restricted by the distributional assumption and is valid only for cases with a single cut-point. Baughman et al. (2006) also used a mixture model, but the maximum likelihood estimators may not be accurate when the distributions are not well separated, the sample size is small, or the mixture model is misspecified.

In this work, we seek to develop a general and objective analytic framework for addressing the problem of determining cut-points in a continuous scale according to an established ordinal scale. To this end, a fundamental question is: what are the most desirable or optimal cut-points? A common viewpoint in practice is that meaningful cut-points in a continuous scale should produce high agreement or association between the newly categorized continuous scale and the established categories. Therefore, we propose to evaluate each set of cut-points by some criterion that reflects a desirable relationship between two ordinal scales (e.g. high agreement or association). In doing so, we obtain a function of the cut-points, which we shall refer to as a criterion function. We then define the optimal set of cut-points as the one that optimizes the criterion function.

We propose a general formulation of criterion functions, which are expressed as a smooth function of the cell probabilities of the contingency table formed by cross-tabulating an established categorical scale and the newly categorized continuous scale based on a given set of cut-points. Our definition encompasses many important special cases, including those where cut-points are evaluated by weighted kappa (Cohen, 1960; Agresti, 1990), Kendall's τb (Kendall, 1938; Agresti, 1990), the correct classification rate, or Youden's index in ROC analysis. The general specification of criterion functions forms the foundation of the proposed unified framework for investigating cut-points based on various criteria.

We study the estimation of optimal cut-points without imposing any parametric assumptions on the distributions of the data. We consider a natural approach, which is to optimize a nonparametric estimator of the criterion function, hereafter called the empirical criterion function. For instance, one may adopt the weighted kappa statistic when the criterion for optimal cut-points is the weighted kappa coefficient. While the basic idea is conceptually intuitive and has been adopted in practice in an ad hoc way, our detailed investigation indicates that such a method is subject to nonstandard theory and requires special attention to its inference procedures. For example, the resulting estimator has a convergence rate slower than the usual root-n rate and may not possess asymptotic normality. The main issue is that an empirical criterion function is usually not smooth and, more specifically, involves cut-points through indicator functions. The challenge resembles the difficulty in M-estimation with a non-smooth objective function (Chernoff, 1964, for example). Nevertheless, the proposed estimator is not an M-estimator. As a result, existing methods that deal with non-smooth M-estimation, for example, Kim and Pollard (1990), are not directly applicable.

In this work, we employ empirical process techniques and conduct rigorous asymptotic studies for the proposed nonparametric cut-point estimator. It is important to point out that our theoretical framework is quite general and may be adapted to many other estimation settings that involve maximization of non-smooth objective functions. Given the nonstandard asymptotic properties of the proposed estimator, the conventional bootstrap fails to work properly (Kosorok, 2008). We propose to use subsampling (Politis et al., 1999) as a well justified device for inference, including variance estimation and confidence intervals.

We elaborate in Section 2 the proposed method for determining cut-points in a continuous scale. We present the general problem formulation, the proposed nonparametric estimation, and the corresponding asymptotic results and inference. Extensive simulation studies reported in Section 3 demonstrate satisfactory finite-sample performance of our proposals, and also help confirm some of our theoretical results. In Section 4, we illustrate our method via an application to a mental health study. Our analysis suggests a refinement of current empirical rules for categorizing depression among diabetic subjects based on the Zung rating scale. A few concluding remarks are provided in Section 5.

2. The proposed method

2.1. A general formulation of optimal cut-points

Let Y denote an ordinal measurement that takes ordinal values 1 < ⋯ < L. Let X denote a continuous measurement bounded between $x_L$ and $x_U$. All possible cut-points for X according to the L categories of Y form a compact parameter space, denoted by $\Theta = \{(d_1, \ldots, d_{L-1}): d_0 = x_L < d_1 < \cdots < d_{L-1} < d_L = x_U\}$. For $d = (d_1, \ldots, d_{L-1}) \in \Theta$, we define $\tilde{X}(d) = \sum_{k=1}^{L} k\, I(d_{k-1} \le X < d_k)$, where I(·) is the indicator function.

It is clear that $\tilde{X}(d)$ and Y form an L × L contingency table, whose cell probabilities, arranged in a vector, are $P(d) = (p_{1,1}(d), \ldots, p_{1,L}(d), \ldots, p_{L,1}(d), \ldots, p_{L,L}(d))^{\rm T}$, where $p_{ij}(d) = \Pr(\tilde{X}(d) = i, Y = j)$. Let ϑ(·) denote a smooth function from $[0,1]^{L^2}$ to $\mathbb{R}$ that is at least twice differentiable almost everywhere. We define the optimal set of cut-points, $d_0$, as

$$d_0 = \arg\max_{d\in\Theta}\, \vartheta(P(d)). \quad (1)$$

Here $\vartheta(P(d))$ serves as a general form of the criterion function, which offers great flexibility in scientific applications. If the view taken to determine cut-points is that the newly categorized scale $\tilde{X}(d)$ should produce high agreement or association with the ordinal categories of Y, we can properly choose the function ϑ(·) so that $\vartheta(P(d))$ represents an agreement or association measure of interest. For instance, given the cell probabilities $p_{ij}(d)$ and the marginal probabilities $p_{i\cdot}(d) \equiv \sum_{j=1}^{L} p_{ij}(d)$ and $p_{\cdot j}(d) \equiv \sum_{i=1}^{L} p_{ij}(d)$ $(i, j = 1, \ldots, L)$, the weighted kappa coefficient, a popular agreement measure, can be expressed as

$$\vartheta_\kappa(P(d)) = \frac{\sum_{i=1}^{L}\sum_{j=1}^{L}\omega_{ij}\, p_{ij}(d) - \sum_{i=1}^{L}\sum_{j=1}^{L}\omega_{ij}\, p_{i\cdot}(d)\, p_{\cdot j}(d)}{1 - \sum_{i=1}^{L}\sum_{j=1}^{L}\omega_{ij}\, p_{i\cdot}(d)\, p_{\cdot j}(d)}, \quad (2)$$

where the weights $\{\omega_{ij}\}_{i,j=1}^{L}$ are specified and represent the degree of discrepancy between two categories. Two common choices for $\omega_{ij}$ are the linear weights $\omega_{ij} = 1 - |i - j|/(L-1)$ and the quadratic weights $\omega_{ij} = 1 - (i-j)^2/(L-1)^2$. In addition, a simpler, uncorrected agreement measure is the correct classification rate, which corresponds to

$$\vartheta_{CC}(P(d)) = \sum_{i=1}^{L} p_{ii}(d). \quad (3)$$
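To make the mapping from cell probabilities to a criterion value concrete, here is a minimal sketch (in Python with NumPy; the function names are ours, not from the paper) that evaluates the weighted kappa in (2) and the correct classification rate in (3) for an arbitrary L × L cell-probability matrix.

```python
import numpy as np

def weighted_kappa(P, weights="linear"):
    """Weighted kappa (2) for an L x L matrix P of cell probabilities (entries sum to 1)."""
    L = P.shape[0]
    i, j = np.indices((L, L))
    if weights == "linear":
        w = 1.0 - np.abs(i - j) / (L - 1)          # linear weights
    else:
        w = 1.0 - (i - j) ** 2 / (L - 1) ** 2      # quadratic weights
    p_row = P.sum(axis=1)                          # p_{i.}
    p_col = P.sum(axis=0)                          # p_{.j}
    chance = np.sum(w * np.outer(p_row, p_col))    # sum_ij w_ij p_{i.} p_{.j}
    return (np.sum(w * P) - chance) / (1.0 - chance)

def correct_classification_rate(P):
    """Correct classification rate (3): sum of the diagonal cell probabilities."""
    return float(np.trace(P))

# toy 3 x 3 cell-probability table
P = np.array([[0.25, 0.05, 0.00],
              [0.05, 0.20, 0.05],
              [0.00, 0.05, 0.35]])
print(weighted_kappa(P), correct_classification_rate(P))
```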

Kendall's τb (Agresti, 1990) is a well-known association measure for ordinal measurements, and can be written as

$$\vartheta_\tau(P(d)) = \frac{\sum_{i=1}^{L-1}\sum_{j=1}^{L-1} p_{ij}(d)\Bigl(\sum_{t=i+1}^{L}\sum_{m=j+1}^{L} p_{tm}(d)\Bigr) - \sum_{i=1}^{L-1}\sum_{j=2}^{L} p_{ij}(d)\Bigl(\sum_{t=i+1}^{L}\sum_{m=1}^{j-1} p_{tm}(d)\Bigr)}{\frac{1}{2}\Bigl\{\bigl(1 - \sum_{i=1}^{L} p_{i\cdot}(d)^2\bigr)\bigl(1 - \sum_{j=1}^{L} p_{\cdot j}(d)^2\bigr)\Bigr\}^{1/2}}. \quad (4)$$

In the special case with L = 2, we can also show that the specification of ϑτ(P(d)) can encompass Youden's index, a measure for determining the cut-point in the context of ROC, which equals,

$$J(P(d)) = \frac{p_{11}(d)}{p_{\cdot 1}(d)} + \frac{p_{22}(d)}{p_{\cdot 2}(d)} - 1.$$
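Kendall's τb in (4) and Youden's index are likewise smooth functions of the same cell probabilities. The sketch below (again NumPy-based, with helper names of our own) computes (4) directly from a cell-probability matrix and, for a 2 × 2 table, also evaluates Youden's index from the column margins.

```python
import numpy as np

def kendall_tau_b(P):
    """Kendall's tau-b (4) computed from an L x L cell-probability matrix P."""
    L = P.shape[0]
    concordant = 0.0
    discordant = 0.0
    for i in range(L - 1):
        for j in range(L):
            if j < L - 1:
                concordant += P[i, j] * P[i + 1:, j + 1:].sum()
            if j > 0:
                discordant += P[i, j] * P[i + 1:, :j].sum()
    p_row = P.sum(axis=1)
    p_col = P.sum(axis=0)
    denom = 0.5 * np.sqrt((1.0 - np.sum(p_row ** 2)) * (1.0 - np.sum(p_col ** 2)))
    return (concordant - discordant) / denom

def youden_index(P):
    """Youden's index for a 2 x 2 table, conditioning on the columns (the Y margins)."""
    p_col = P.sum(axis=0)
    return P[0, 0] / p_col[0] + P[1, 1] / p_col[1] - 1.0

P2 = np.array([[0.40, 0.10],
               [0.05, 0.45]])
print(kendall_tau_b(P2), youden_index(P2))
```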

These examples illustrate the general applicability of the proposed framework for determining cut-points in a continuous scale. Our general formulation of optimal cut-points in (1) accommodates many different ways that researchers decide and interpret cut-points in practice.

2.2. Nonparametric estimation of optimal cut-points

We study the nonparametric estimation of $d_0$ without requiring parametric assumptions about the data distributions. The basic idea is to estimate the criterion function $\vartheta(P(d))$ by its empirical counterpart, $\vartheta(\hat{P}_n(d))$, and then find the maximizer of the empirical criterion function $\vartheta(\hat{P}_n(d))$. Here $\hat{P}_n(d) = (\hat{p}_{11}(d), \ldots, \hat{p}_{1L}(d), \ldots, \hat{p}_{L1}(d), \ldots, \hat{p}_{LL}(d))^{\rm T}$ with

$$\hat{p}_{ij}(d) = \frac{1}{n}\sum_{k=1}^{n} I(\tilde{X}_k(d) = i, Y_k = j).$$

More specifically, we propose to estimate d0 by

$$\hat{d} = \arg\max_{d\in\Theta}\, \vartheta(\hat{P}_n(d)).$$

Suppose that the observable data consist of n i.i.d. replicates of (X, Y), denoted by $\{(X_t, Y_t)\}_{t=1}^{n}$. We can obtain $\hat{d}$ through the following steps:

  1. For $d\in\Theta$, transform the continuous measurements $X_t$ into the ordinal scale $\tilde{X}_t(d)$;

  2. For each $d\in\Theta$, calculate the empirical criterion function $\vartheta(\hat{P}_n(d))$ using the data $\{(\tilde{X}_t(d), Y_t)\}_{t=1}^{n}$;

  3. Search in Θ for the d that maximizes $\vartheta(\hat{P}_n(d))$.

It is worth noting that, for a given dataset, $\vartheta(\hat{P}_n(d))$ is a piecewise constant function of d, which jumps only at the observed values of X. As a result, the search for the maximizer of $\vartheta(\hat{P}_n(d))$ only requires evaluating the empirical criterion function at a finite number of points $d \in \{x_1, \ldots, x_n\}^{L-1} \subset \mathbb{R}^{L-1}$, where $x_j$ denotes the observed value of $X_j$ (j = 1, . . . , n). We can also see that the set of all maximizers of $\vartheta(\hat{P}_n(d))$ takes the form of either a product of L − 1 left-open and right-closed intervals or a union of multiple such disjoint product intervals. In the large-sample sense, one may choose any value in the maximizer set as the estimator of $d_0$. In our numerical studies, we define $\hat{d}$ as the midpoint of the leftmost maximizer product interval (e.g. the first interval when L = 2). The number of solutions is defined as the number of disjoint product intervals, and we regard the case where the maximizer set contains two or more disjoint product intervals as a case with multiple solutions.
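The three steps above amount to a grid search over cut-point vectors built from the observed values of X. Below is a minimal sketch of that search (Python/NumPy), intended only as an illustration for small n and L: it performs an exhaustive search and, for brevity, returns a single maximizing candidate rather than the midpoint of the leftmost maximizer interval described above. Any empirical criterion can be plugged in; here np.trace plays the role of the correct classification rate statistic.

```python
import itertools
import numpy as np

def cell_probs(x, y, cuts, levels):
    """Empirical cell probabilities p_hat_ij(d) for the cut-point vector `cuts` (Steps 1-2)."""
    L = len(levels)
    x_cat = np.digitize(x, cuts)                 # 0, ..., L-1: category of X~(d) minus one
    P = np.zeros((L, L))
    for i in range(L):
        for j, level in enumerate(levels):
            P[i, j] = np.mean((x_cat == i) & (y == level))
    return P

def estimate_cutpoints(x, y, criterion, L):
    """Step 3: maximize the empirical criterion over (L-1)-tuples of observed X values."""
    levels = np.sort(np.unique(y))
    candidates = np.sort(np.unique(x))
    best_value, best_d = -np.inf, None
    for d in itertools.combinations(candidates, L - 1):   # increasing tuples only
        value = criterion(cell_probs(x, y, np.array(d), levels))
        if value > best_value:
            best_value, best_d = value, np.array(d)
    return best_d, best_value

# data generated as in the simulation study of Section 3 (L = 2, true cut-point 7)
rng = np.random.default_rng(0)
y = rng.choice([3, 4], size=100)
x = 2 * y + rng.normal(0.0, 0.6, size=100)
d_hat, _ = estimate_cutpoints(x, y, np.trace, L=2)
print(d_hat)
```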

2.3. Asymptotic properties and inference

While the proposed estimator $\hat{d}$ is conceptually simple, studying its asymptotic properties is not trivial. The main challenge comes from the fact that $\hat{P}_n(d)$ is not a smooth function of d, and neither is $\vartheta(\hat{P}_n(d))$. The nature of the difficulty mimics that in M-estimation when the objective function is not smooth, so the standard linearization technique (van der Vaart and Wellner, 1996) does not work. On the other hand, $\hat{d}$ is not an M-estimator, because $\vartheta(\hat{P}_n(d))$ is, in general, not an empirical measure of any known function. As a result, existing results on irregular M-estimation (Kim and Pollard, 1990; van der Vaart and Wellner, 1996) are not directly applicable.

To address these challenges, we take the following steps. First, we "linearize" $\vartheta(\hat{P}_n(d))$ based on the smoothness of ϑ(·). Next, we examine $\bar{d} = \arg\max_{d\in\Theta}[\vartheta\{P(d)\} + \vartheta^{(1)}\{P(d)\}\{\hat{P}_n(d) - P(d)\}]$, the maximizer of the linear approximation of $\vartheta(\hat{P}_n(d))$. Here $\vartheta^{(1)}(\cdot)$ denotes the first derivative of ϑ(·). From the definition of $\hat{P}_n(d)$, we see that $\bar{d}$ is an M-estimator, for which we can use empirical process techniques to tackle its asymptotic behavior. We find that the non-smoothness of $\vartheta\{P(d)\} + \vartheta^{(1)}\{P(d)\}\{\hat{P}_n(d) - P(d)\}$ in d causes a "sharp-edge effect" (Kim and Pollard, 1990; Delgado et al., 2001). As a result, the convergence rate of $\bar{d}$ slows to $n^{1/3}$. The limiting distribution of $\bar{d}$ is not necessarily normal but is that of a random vector that maximizes a Gaussian process. Finally, we are able to show the asymptotic equivalence between $n^{1/3}(\hat{d} - d_0)$ and $n^{1/3}(\bar{d} - d_0)$, and therefore the large-sample properties of $\hat{d}$ follow those derived for $\bar{d}$.

We first introduce necessary notation and regularity conditions. Define

$$\vartheta^{(1)}(P) = \frac{\partial \vartheta(P)}{\partial P^{\rm T}}, \qquad D(d) = \frac{\partial \vartheta^{(1)}(P(d))}{\partial d}, \qquad V(d) = \frac{\partial^2 \vartheta(P(d))}{\partial d\, \partial d^{\rm T}}, \qquad \psi(\delta) = \Bigl\{x = (x_1, \ldots, x_{L^2})^{\rm T}: \inf_{d\in\Theta} \|x - P(d)\| \le \delta\Bigr\},$$

where ∥ · ∥ denotes the Euclidean norm. The regularity conditions include:

  • C1.

    (i) ϑ{P(d)} is twice differentiable with respect to d; (ii) d0 is the unique maximizer of ϑ{P(d)} with bounded nonsingular second-derivative matrix V(d0);

  • C2.

    There exists $\delta_0$ such that $\vartheta^{(1)}(P)$ exists and is bounded for $P \in \psi(\delta_0)$;

  • C3.

    The conditional density of X given Y = l (l = 1, . . . , L), denoted by fX|Y=l(x), is uniformly bounded in x.

These regularity conditions pose rather mild assumptions on ϑ(·) and the conditional distribution of X given Y = l (l = 1, . . . , L). More specifically, C1(i) and C2 require ϑ(·) and the conditional distribution of X given Y to be sufficiently smooth, and C3 assumes bounded conditional density functions for X given Y. These assumptions are expected to hold for many choices of ϑ(·), such as those corresponding to weighted kappa, Kendall's tau, and the correct classification rate, and for common continuous distributions, such as the normal distribution. Note that C1(ii) is warranted by our formulation of the optimal cut-points; that is, $d_0$ is not well defined unless ϑ{P(d)} has a unique maximizer. By C1(ii), $d_0$ is further assumed to be an interior point of Θ. Figs. 1–3 in our empirical studies suggest the plausibility of this assumption.

Fig. 1. The plots of criterion functions (left column) and empirical criterion functions (right column) based on weighted kappa (solid lines), Kendall's τb (dashed lines) and correct classification rate (dotted lines).

Fig. 3. The plots of empirical criterion functions based on weighted kappa ($\hat{\vartheta}_\kappa$, top left), Kendall's τb ($\hat{\vartheta}_\tau$, top right) and correct classification rate ($\hat{\vartheta}_{CC}$, bottom) versus all possible two-dimensional cut-points for the Diabetes and Depression study dataset (n = 1430).

Note that ϑ(·) is a pre-specified function in the proposed framework. When the joint distribution of X and Y is known (as in Monte-Carlo simulations), we can derive the analytic forms of P(d) and ϑ{P(d)}, and thus verify conditions C1 and C2 analytically. In practice, the joint distribution of X and Y is typically unknown, so we do not expect the regularity conditions to be fully verifiable. A practical recommendation is to first derive reasonable parametric estimates of P(d) and then plug them into ϑ{P(d)} and $\vartheta^{(1)}\{P(d)\}$. This helps evaluate whether a selected ϑ(·) meets the required technical assumptions.

We state the asymptotic properties of $\hat{d}$ in the following theorems. Detailed proofs are provided in the Appendix.

Theorem 1

Under the regularity conditions C1–C2, the proposed cut-point estimator, $\hat{d}$, is consistent. That is, $\hat{d} \rightarrow_P d_0$ as n → ∞.

Theorem 2

Under the regularity conditions C1–C3, $n^{1/3}(\hat{d} - d_0)$ converges in distribution to the unique maximizer of the stochastic process $G(h) + \frac{1}{2} h^{\rm T} V(d_0) h$. Here G(h) is a zero-mean Gaussian process with continuous sample paths that satisfies (A.3) in Appendix A.2.

Our theoretical investigation shows that $n^{1/3}(\hat{d} - d_0)$ converges to a tight limiting distribution, which, however, does not have a closed form. Given the non-standard asymptotics, using the conventional bootstrap for inference is questionable (Abrevaya and Huang, 2005; Kosorok, 2008). We propose to adopt random subsampling (Politis et al., 1999), which uses without-replacement subsamples instead of with-replacement bootstrap samples, to approximate the distribution of $n^{1/3}(\hat{d} - d_0)$. More specifically, let the superscript (j) denote the jth component of a vector. The subsampling-based inference can be carried out as follows.

  • Step 1: Choose a subsample size b. Set s = 1.

  • Step 2: Randomly draw a subsample of size b without replacement from $\{(X_t, Y_t)\}_{t=1}^{n}$. Apply the proposed method to estimate $d_0$ based on this random subsample. Denote the resulting estimator by $\hat{d}_{(s)}$.

  • Step 3: Increase s by 1. If s is less than a prespecified large number S, then go back to Step 2.

  • Step 4: Compute the empirical variance of $\{\hat{d}_{(s)}^{(j)}\}_{s=1}^{S}$, which provides a variance estimator for $\hat{d}^{(j)}$.

  • Step 5: Compute the empirical 100(1 − α)th percentile of $\{|\hat{d}_{(s)}^{(j)} - \hat{d}^{(j)}|\}_{s=1}^{S}$, denoted by $\nu_\alpha^{(j)}$. The 100(1 − α)% confidence interval for $d_0^{(j)}$ can be constructed as $[\hat{d}^{(j)} - \nu_\alpha^{(j)},\ \hat{d}^{(j)} + \nu_\alpha^{(j)}]$.

The validity of the above subsampling procedure follows from the results of Politis and Romano (1994), coupled with our Theorem 2, which implies that $n^{1/3}(\hat{d} - d_0)$ converges weakly to a limiting distribution. Note that the subsample size b is subject to the theoretical constraints b → ∞ as n → ∞ and b = o(n). Discussions about the practical selection of b can be found in Politis and Romano (1994) and Delgado et al. (2001).
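The following is a minimal sketch of Steps 1–5 for a scalar cut-point (L = 2), written in Python/NumPy. It assumes an estimator(x, y) callable that returns the cut-point estimate on a given sample (for instance, a wrapper around the grid-search sketch in Section 2.2); the function name and interface are ours, not part of the paper.

```python
import numpy as np

def subsampling_inference(x, y, d_hat, estimator, b, S=100, alpha=0.05, seed=0):
    """Subsampling variance estimate and symmetric confidence interval (Steps 1-5).

    d_hat is the cut-point estimate from the full sample of size n;
    b is the subsample size (b -> infinity with b/n -> 0), e.g. b ~ n**0.7 as in Section 3.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    replicates = np.empty(S)
    for s in range(S):
        idx = rng.choice(n, size=b, replace=False)   # Step 2: draw without replacement
        replicates[s] = estimator(x[idx], y[idx])    # re-estimate d0 on the subsample
    variance = replicates.var(ddof=1)                # Step 4: empirical variance
    nu = np.quantile(np.abs(replicates - d_hat), 1 - alpha)   # Step 5: percentile of |deviations|
    return variance, (d_hat - nu, d_hat + nu)
```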

Other modifications of the conventional bootstrap method, such as m out of n bootstrap and smooth bootstrap, have also been investigated in various estimation settings with cubic root convergence (Lee and Pun, 2006; Léger and MacGibbon, 2006; Sen et al., 2010; Sen and Xu, 2015, among others). These methods can be adapted to make inference about d0 under stronger assumptions. For example, by the results of Léger and MacGibbon (2006), resampling without replacement from a smooth and symmetric estimator of fX|Y=l(x) (l = 1, . . . , L) can lead to a consistent bootstrap procedure when fX|Y=l(x) is a symmetric function of x. Such a smooth bootstrap procedure is described in detail in Appendix A.3. In practice, it can serve as a useful alternative inference procedure for d0 when real data suggest symmetric conditional distributions of X given Y = l (l = 1, . . . , L).

3. Simulation study

We conducted extensive simulations to evaluate the proposed method for determining cut-points in a continuous scale. We considered criterion functions constructed based on weighted kappa with linear weight (ϑκ), Kendall's τb (ϑτ), and correct classification rate (ϑCC). The cut-point estimators corresponding to these criterion functions are denoted by d^κ, d^τ and d^CC respectively.

We first examined the situation with L = 2, where the cut-point to be estimated is a scalar. We generated Y as a binary random variable taking values 3 and 4 with equal probabilities. Conditional on Y, the continuous X was obtained as X = 2Y + ε, where ε follows a N(0, σ²) distribution. We set σ = 0.6 and 1.0 to reflect high-to-moderate and moderate-to-low separation in X between different Y groups. In Fig. 1, we show the three criterion functions as well as the corresponding empirical criterion functions computed from one simulated dataset with sample size n = 100. The optimal cut-point based on ϑκ, ϑτ, or ϑCC is d0 = 7, and all criterion functions are unimodal. The maximum of a criterion function is less "prominent" with a larger σ, which corresponds to less separation in X among different Y categories. The criterion functions ϑκ and ϑτ have steeper curvatures around their maxima than ϑCC. It is also observed that the empirical criterion functions approximate the true criterion functions quite well with a moderate sample size of n = 100.
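For this set-up, the true criterion functions shown in the left column of Fig. 1 can be evaluated directly, since P(d) involves only normal probabilities. A minimal sketch (using SciPy's normal CDF; the helper name is ours) that evaluates the correct classification rate criterion on a grid and recovers the optimal cut-point d0 = 7 is given below.

```python
import numpy as np
from scipy.stats import norm

def true_cell_probs(d, sigma):
    """Population cell probabilities P(d) for the L = 2 set-up:
    Y = 3 or 4 with probability 1/2 each, and X | Y ~ N(2Y, sigma^2)."""
    P = np.zeros((2, 2))
    for j, y in enumerate((3, 4)):
        below = norm.cdf(d, loc=2 * y, scale=sigma)   # Pr(X < d | Y = y)
        P[0, j] = 0.5 * below                         # X~(d) = 1, Y = y
        P[1, j] = 0.5 * (1.0 - below)                 # X~(d) = 2, Y = y
    return P

grid = np.linspace(5.0, 9.0, 801)
cc = np.array([np.trace(true_cell_probs(d, sigma=0.6)) for d in grid])
print(grid[cc.argmax()])   # maximized at the true cut-point d0 = 7 (up to grid resolution)
```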

We conducted 1000 Monte-Carlo simulations under each set-up with sample sizes n = 50, 100, and 200. For subsampling, we set S = 100 and selected b roughly as $n^{0.7}$; that is, b = 15, 25, and 40 for n = 50, 100, and 200, respectively. We evaluated the empirical bias, empirical standard deviations, and average estimated standard deviations of the proposed estimators, and the empirical coverage probabilities of the proposed 95% confidence intervals. In the case of multiple solutions, we took the estimator to be the midpoint of the leftmost maximizer interval (e.g. the first interval when L = 2). From Table 1, we observe that cut-point estimation based on weighted kappa or Kendall's τb may perform better than that based on the correct classification rate. The cut-point estimators $\hat{d}_\kappa$ and $\hat{d}_\tau$ have small bias even with a sample size as small as n = 50. In comparison, $\hat{d}_{CC}$ has considerably larger bias. For all three cut-point estimators, the estimated standard deviations are fairly close to the empirical standard deviations, and their agreement improves with the sample size n. As expected, the standard deviations increase with σ, reflecting the elevated estimation variability with more overlap in X between Y categories. The 95% confidence intervals have quite accurate coverage probabilities.

Table 1.

Simulation results when L = 2 (Bias: Empirical bias × 103; SE: empirical standard deviations × 103; ASE: average of estimated standard deviation × 103; CP: empirical coverage probabilities of 95% confidence intervals × 102).

σ   n  |  ϑκ: Bias  SE  ASE  CP  |  ϑτ: Bias  SE  ASE  CP  |  ϑCC: Bias  SE  ASE  CP
0.6 50 6 207 188 93.4 5 219 191 93.2 47 214 191 92.5
100 4 166 155 95.0 4 180 162 95.7 41 166 157 93.5
200 1 133 129 97.7 2 149 137 96.9 30 131 130 95.2
1.0 50 8 358 370 95.6 2 415 370 95.5 91 366 358 94.7
100 7 276 272 98.1 9 326 314 98.0 65 280 291 97.7
200 5 227 220 97.6 3 271 257 97.4 28 234 235 97.0

We also examined cases with L = 3, where $d_0$ is a 2 × 1 vector. The ordinal Y was generated as a multinomial random variable taking values 3, 4, and 5 with equal probabilities. The continuous X was obtained from the same equation used for the cases with L = 2. Based on 1000 simulations, we summarize in Table 2 the results in the same format as in Table 1. Under each set-up, the first row corresponds to the smaller cut-point in X and the second row to the larger cut-point. In Table 2, we observe larger empirical bias and standard deviations compared to those in Table 1. The estimator $\hat{d}_{CC}$ again exhibits the largest bias. For all three estimators, the estimated standard deviations and empirical standard deviations still match quite well, and the empirical coverage probabilities are close to the nominal level.

Table 2.

Simulation results when L = 3 (Bias: Empirical bias × 103; SE: empirical standard deviations × 103; ASE: average of estimated standard deviation × 103; CP: empirical coverage probabilities of 95% confidence intervals × 102).

σ   n  |  ϑκ: Bias  SE  ASE  CP  |  ϑτ: Bias  SE  ASE  CP  |  ϑCC: Bias  SE  ASE  CP
0.6 50 32 246 215 92.1 15 244 227 93.0 50 239 215 92.9
42 236 217 94.0 0 240 226 94.2 52 236 222 94.3
100 36 196 181 95.8 5 211 185 95.3 60 189 180 94.7
33 188 176 94.6 2 199 178 94.9 42 188 176 94.8
200 29 152 146 96.8 4 166 151 97.2 39 153 147 95.8
26 151 147 95.8 5 163 151 96.2 35 151 147 95.4
1.0 50 34 396 384 96.9 58 463 434 97.0 96 416 400 94.8
12 400 387 96.0 56 466 432 96.8 98 415 427 95.4
100 7 311 321 96.6 29 399 367 96.5 79 322 335 96.8
3 327 322 96.9 51 405 370 96.2 69 342 343 94.9
200 5 251 258 98.4 31 323 301 97.3 51 257 265 96.8
3 252 255 98.5 21 316 301 98.0 53 262 270 96.9

It is important to note from Tables 1–2 that the standard deviations of each estimator decrease with sample size at a rate that conforms to the theoretical convergence rate of $n^{1/3}$ stated in Theorem 2. More specifically, by assessing the ratios of empirical standard deviations between two different sample sizes (e.g. 100 versus 200 or 50 versus 200), we find that these ratios are generally around the cube root of the corresponding sample size ratios.
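For instance, using the Table 1 entries for $\hat{d}_\kappa$ with σ = 1.0, the empirical standard deviations at n = 50 and n = 200 give
$$\frac{\mathrm{SE}(n=50)}{\mathrm{SE}(n=200)} = \frac{358}{227} \approx 1.58, \qquad \Bigl(\frac{200}{50}\Bigr)^{1/3} \approx 1.59, \qquad \Bigl(\frac{200}{50}\Bigr)^{1/2} = 2,$$
so the observed decay of the standard deviations is in line with the cube-root rate and clearly slower than the usual root-n rate.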

We also evaluated the performance of the alternative smooth bootstrap inference procedure (Léger and MacGibbon, 2006) described in Appendix A.3. Note that the conditional distributions of X given Y = l are normal in all simulation set-ups, and hence the smooth bootstrap approach is expected to yield consistent inference. We present these additional simulation results in Tables 3 and 4, parallel to Tables 1 and 2, respectively.

Table 3.

Simulation results when L = 2 (Bias: Empirical bias × 103; SE: empirical standard deviations × 103; ASEs: average of estimated standard deviation × 103 based on smooth bootstrap; CPs: empirical coverage probabilities of 95% confidence intervals × 102 based on smooth bootstrap).

σ   n  |  ϑκ: Bias  SE  ASEs  CPs  |  ϑτ: Bias  SE  ASEs  CPs  |  ϑCC: Bias  SE  ASEs  CPs
0.6 50 4 201 210 93.7 5 219 225 93.0 49 205 215 94.0
100 1 161 171 94.0 2 180 191 94.8 38 164 175 93.9
200 4 129 137 93.9 3 146 152 92.8 32 133 139 92.6
1.0 50 9 346 368 93.4 1 411 433 92.8 81 358 394 94.2
100 12 282 292 92.5 20 345 353 92.2 72 301 306 92.2
200 5 211 230 94.6 5 261 282 93.4 42 220 237 93.8

Table 4.

Simulation results when L = 3 (Bias: Empirical bias × 103; SE: empirical standard deviations × 103; ASEs: average of estimated standard deviation based on smooth bootstrap × 103; CPs: empirical coverage probabilities of 95% confidence intervals × 102 based on smooth bootstrap).

σ   n  |  ϑκ: Bias  SE  ASEs  CPs  |  ϑτ: Bias  SE  ASEs  CPs  |  ϑCC: Bias  SE  ASEs  CPs
0.6 50 −22 227 242 94.3 −83 232 252 93.4 −49 231 244 94.2
22 235 238 92.5 75 240 249 92.5 −47 235 241 92.1
100 −29 187 197 93.4 −82 202 213 91.3 −51 184 198 91.8
21 196 199 92.6 72 211 213 91.0 −47 196 199 92.3
200 −38 152 159 92.6 −82 167 173 90.0 −34 153 159 92.6
42.2 148 159 94.0 85 159 173 92.1 −27 148 159 94.3
1.0 50 31 408 416 92.0 −41 474 511 92.5 −89 433 445 91.2
−20 415 419 90.6 64 487 509 93.3 −94 435 449 92.8
100 8 320 335 92.7 −37 408 430 93.3 −77 339 351 92.2
−7 340 336 94.3 28 394 434 94.3 −76 321 353 93.4
200 9 250 263 93.1 −10 326 358 94.5 −43 258 276 94.0
8 252 264 93.8 31 321 357 95.1 −40 259 275 93.7

Our simulation results demonstrate satisfactory performance of the smooth bootstrap procedure. It gives standard error estimates quite close to the empirical standard errors. The resulting confidence intervals can be slightly under-covered when the sample size is small, and the coverage probabilities become closer to the nominal level as the sample size increases. We further compared the lengths of the 95% confidence intervals between the subsampling method and the smooth bootstrap approach. Our simulation results (not reported here) show that subsampling-based confidence intervals tend to be wider than smooth-bootstrap-based confidence intervals. This is consistent with the observation from Tables 1–4 that the smooth bootstrap approach often yields coverage probabilities lower than the nominal level while the opposite trend holds for the subsampling method. In addition, our numerical experience suggests that the subsampling procedure is computationally faster. For example, for a set-up with σ = 0.6 and n = 100, the computation time ratio between the smooth bootstrap procedure and the subsampling procedure is 4.2 when L = 2, and increases to 9.5 when L = 3. The shorter computation time and the wider confidence intervals with the subsampling approach may be explained by the smaller sample size used in each resampling step of the subsampling procedure.

As discussed in Section 2.2, an empirical criterion function usually has more than one maximizer; the maximizers form either a connected interval (i.e. one solution) or multiple disjoint intervals (i.e. multiple solutions). While all criterion functions considered in the simulations are unimodal, multiple solutions can occur in the finite-sample estimation of optimal cut-points. In our simulation studies, we also investigated the extent of the multiple solution problem. We report in Table 5 the average number of solutions (ANS) as well as the proportion of simulations that produced multiple solutions (PMS). Adopting the correct classification rate as the criterion for determining cut-points appears most prone to multiple solutions, compared to the other two criteria, weighted kappa and Kendall's τb. For example, when L = 2, the percentages of obtaining multiple solutions from using ϑCC range from 30% to 40%, much higher than those from ϑκ, around 3%–4%, or those from ϑτ, about 1%–3%. While the frequency of encountering multiple solutions seems to decrease as L increases from 2 to 3 when weighted kappa or Kendall's τb is used as the criterion, the situation appears to be reversed for ϑCC. When L = 3, multiple solutions were encountered in over 50% of simulations in most set-ups with ϑCC; in contrast, using ϑκ or ϑτ resulted in multiple solutions in fewer than 1% of simulations. One possible explanation for these observations may relate to the flatter curvature around d0 in ϑCC(d) compared with that in ϑκ(d) or ϑτ(d). The flatter curvature of ϑCC around its maximum may imply more ambiguity in the identification of optimal cut-points with finite sample sizes, and consequently lead to the larger bias of $\hat{d}_{CC}$ observed in Tables 1–2 (or 3–4) and the more frequent occurrences of multiple solutions observed in Table 5.
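For completeness, the sketch below (Python/NumPy; the helper name is ours) shows one way to count the disjoint maximizer intervals behind the ANS and PMS summaries in Table 5 for the scalar cut-point case, assuming the empirical criterion has already been evaluated at every candidate cut-point (e.g. at every observed value of X): consecutive maximizing candidates are treated as belonging to the same maximizer interval.

```python
import numpy as np

def count_solution_intervals(candidates, criterion_values, tol=1e-12):
    """Number of disjoint runs of candidate cut-points attaining the maximum criterion value."""
    order = np.argsort(candidates)
    is_max = np.asarray(criterion_values)[order] >= np.max(criterion_values) - tol
    # a new maximizer interval starts wherever a maximizing candidate follows a non-maximizing one
    starts = is_max & ~np.r_[False, is_max[:-1]]
    return int(starts.sum())

# example: a flat empirical criterion with two separated plateaus at its maximum
d_grid = np.linspace(0, 1, 11)
values = np.array([0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
print(count_solution_intervals(d_grid, values))   # 2 -> a "multiple solutions" case
```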

Table 5.

Summary of multiple solutions in simulations (ANS: average number of solutions; PMS: percent of multiple solutions).

σ   n  |  d̂κ: ANS  PMS  |  d̂τ: ANS  PMS  |  d̂CC: ANS  PMS
L = 2
0.6 50 1.052 4.0% 1.023 2.3% 1.374 30.2%
100 1.048 3.7% 1.014 1.4% 1.524 38.7%
200 1.024 2.9% 1.003 1.1% 1.731 39.7%
1.0 50 1.056 3.8% 1.009 0.9% 1.656 41.6%
100 1.030 2.6% 1.002 0.4% 1.751 45.1%
200 1.045 2.1% 1.003 1.8% 1.749 46.1%

L = 3
0.6 50 1.004 0.4% 1.015 1.5% 1.651 42.5%
100 1.000 0.0% 1.005 0.5% 2.216 58.8%
200 1.000 0.0% 1.001 0.1% 2.602 66.1%
1.0 50 1.004 0.3% 1.002 0.2% 2.491 64.1%
100 1.000 0.0% 1.000 0.0% 2.900 68.1%
200 1.000 1.0% 1.000 0.0% 3.397 71.4%

4. Diabetes and depression study

The Diabetes and Depression study was conducted to determine the prevalence of depression among African-Americans with type II diabetes and its association with socioeconomic determinants, adherence to treatment, and glycemic control. There are 1430 African-American diabetes patients in the study. Two psychometric instruments were administered to each patient: the MINI diagnostic instrument (Sheehan et al., 1998) and the Zung rating scale (Zung, 1965). In order to develop an efficient diagnostic technique to identify major depression in a large population of minority diabetic patients, we seek to determine the cut-points of the Zung rating scale for measuring the severity of depression. Of note is that the MINI, a structured psychiatric interview to diagnose the syndrome of major depression with well-established graded severity guidelines, is designed to be used by licensed professionals or well-trained interviewers (Sheehan et al., 1998) and is time-consuming. In contrast, the Zung rating scale, a short self-administered survey with 20 items to quantify depressive symptoms and total scores ranging from 25 to 100, requires only 5–10 min to complete and does not require a high level of literacy. In this dataset, the average and median Zung scores are 45.4 and 43.8, respectively, and the inter-quartile range of Zung is (35.0, 55.0).

It has been recognized that the physical illness of diabetes patients may falsely elevate the scores of certain scale items, especially those related to physical symptoms such as fatigue and cognitive dysfunction. First, we consider the classification of patients into two categories by combining mildly depressed patients with non-depressed patients into a non-major depression group, with the remaining patients forming a major depression group. According to the MINI, 1112 (77.7%) subjects have no or mild depression, and 318 (22.3%) subjects are moderately or severely depressed. Fig. 2 displays the three empirical criterion functions based on weighted kappa, Kendall's τb, and correct classification rate versus all possible cut-points for dichotomizing the Zung rating scale according to the MINI depression status. It can be seen that the empirical criterion functions can be approximated by a smooth concave curve whose maximum is achieved when the cut-point is around 57 on the Zung rating scale.

Fig. 2. The plots of empirical criterion functions based on weighted kappa ($\hat{\vartheta}_\kappa$), Kendall's τb ($\hat{\vartheta}_\tau$) and correct classification rate ($\hat{\vartheta}_{CC}$) versus all possible one-dimensional cut-points for the Diabetes and Depression study data (n = 1430).

When weighted kappa (with linear weights) is adopted as the criterion function, the estimated cut-point for the Zung scale is 57.0 (SE = 1.43) with 95% CI (53.94, 60.13). Here and hereafter, SE and CI are abbreviations for standard error and confidence interval, respectively. The corresponding maximum weighted kappa is 0.64. The criterion function based on Kendall's τb provides the same estimate of 57.0, with a slightly larger SE = 1.60 and a wider 95% CI, (53.49, 60.58). Using the empirical criterion function based on the correct classification rate, the estimated cut-point is 57.5 (SE = 1.63) with 95% CI (53.94, 60.13). Overall, the three different criterion functions consistently suggest using a cut-point around 57 on the Zung rating scale to differentiate no or minor depression from major depression.

Next, we consider the classification of the Zung scale into three categories: no depression, mild depression, and moderate or severe depression. According to the MINI, 1043 (72.9%) subjects have no depression, 69 (4.8%) have mild depression, and 318 (22.3%) are moderately or severely depressed. In Fig. 3, the three three-dimensional plots depict the empirical criterion functions based on weighted kappa, Kendall's τb, and correct classification rate, respectively. It is clear that the empirical criterion function based on weighted kappa demonstrates the most desirable profile for identifying maximizers. Using weighted kappa (with linear weights) as the criterion for determining cut-points, we estimate the first cut-point in Zung, separating no depression from mild depression, as 55.7 (SE = 1.42) with 95% CI (52.67, 58.84), and the second one, separating mild from moderate or severe depression, as 57.0 (SE = 1.15) with 95% CI (54.54, 59.53). Based on Kendall's τb, the first cut-point estimate is 55.7 (SE = 1.99) with 95% CI (50.86, 60.65) and the second one is 63.1 (SE = 6.07) with 95% CI (44.74, 69.33). Adopting the criterion function based on the correct classification rate, we obtain a first cut-point estimate of 56.4 (SE = 1.18) with 95% CI (53.28, 58.23), and a second of 57.0 (SE = 1.20) with 95% CI (54.54, 59.53).

We note that the second cut-point estimate based on Kendall's τb is quite different from those based on weighted kappa or correct classification rate, and also has a rather large standard error. This may be explained by the bi-modal trend in the second cut-point exhibited by the empirical criterion function based on Kendall's τb (see Fig. 3). Such a bi-modal feature could have caused instability in the estimation of the second cut-point. The estimated second cut-point based on weighted kappa or correct classification rate is in close proximity to our finding from the case where only one cut-point is considered for separating no/minor depression from major depression. We also notice that, by all methods, the estimated second cut-point is quite close to the estimated first cut-point. Such a result is consistent with the low prevalence rate of mild depression in diabetic subjects, around 4.8%, suggested by the MINI scale in this dataset. The relatively small sample size for the mild depression category, 69, may explain the observed overlaps between the confidence intervals for the first cut-point and those for the second cut-point.

Based on a national survey, Zung (1973) presented the following guidelines: patients with no depression yield score indices between 25 and 49, patients with global ratings of mild to moderate depression have indices of 50 to 59, and those with moderate to severe depression have indices of 60 and over. Such rules imply two cut-points on the continuous Zung scale, 49 and 59. These values are roughly in the range of our estimated cut-point values. Using the dataset from the Diabetes and Depression study, the weighted kappa, Kendall's τb, and correct classification rate resulting from applying these empirical cut-points, d = (49, 59), are 0.61, 0.65, and 0.76, respectively. Adopting the proposed cut-point estimator obtained from maximizing weighted kappa, $\hat{d}_\kappa$ = (55.7, 57), these agreement or association statistics are higher, equal to 0.65, 0.66, and 0.84, respectively. When comparing $\hat{d}_\kappa$ with the current empirical cut-points, we find that our results do not indicate a significant modification to the Zung cut-point for differentiating minor depression from major depression among diabetic subjects. However, our analysis may suggest shifting up the first cut-point that separates no depression from mild depression. This probably reflects a reasonable adjustment in diagnosing minor depression in subjects with chronic disease, as some mild self-reported depressive symptoms may be due to medical illness. Such a finding may contribute to a refinement of existing diagnosis rules for mental health disorders in medically unhealthy populations.

5. Discussion

In this work, we propose a general and rigorous framework for determining cut-points in a continuous scale according to an ordinal outcome. We formulate our approach as a problem of maximizing a predetermined criterion function of cut-points. Our general specification of criterion functions can accommodate many common agreement or association measures. It also allows for considerations about the relative costs of different types of classification errors and the chance of making these different errors, for example, via formulating the criterion function based on weighted correct classification rate.

Our theoretical studies uncover the nonstandard asymptotics attached to the proposed method. Similar nonstandard results also arise in other problem settings, such as classification trees. We adapt the inference procedures accordingly, identifying subsampling as a valid technique for obtaining variance estimates and confidence intervals. Our simulation studies demonstrate appreciable finite-sample performance of the proposed method. Via an application to a real mental health dataset, we demonstrate the high potential of the proposed method to become a sensible, flexible, and well-justified analytic tool for categorizing a continuous scale in practice.

Acknowledgments

This research project was supported by grants from the National Institutes of Health (R01MH079448 and R01HL113548). We thank Dr. Musselman for discussions related to the Diabetes and Depression study.

Appendix. Regularity conditions and proofs of Theorems 1–2

A.1. Proof of Theorem 1

Applying Taylor expansion, we have

$$\vartheta\{\hat{P}_n(d)\} - \vartheta\{P(d)\} = \vartheta^{(1)}\{\check{P}(d)\}\{\hat{P}_n(d) - P(d)\}, \quad (A.1)$$

where $\check{P}(d)$ is some vector on the line segment between $\hat{P}_n(d)$ and $P(d)$. By the definition of $\hat{P}_n(d)$ and the Glivenko–Cantelli theorem (van der Vaart and Wellner, 1996), we have $\sup_{d\in\Theta}\|\hat{P}_n(d) - P(d)\| \rightarrow_P 0$. Under condition C2, $\sup_{d\in\Theta}\|\vartheta^{(1)}(\check{P}(d))\|$ is bounded when n is large enough, and thus we have

$$\sup_{d\in\Theta}\bigl|\vartheta\{\hat{P}_n(d)\} - \vartheta\{P(d)\}\bigr| \rightarrow_P 0. \quad (A.2)$$

By the argmax continuous mapping theorem (Theorem 3.2.2 of van der Vaart and Wellner (1996)), (A.2) and condition C1 imply that $\hat{d} \rightarrow_P d_0$. That is, $\hat{d}$ is a consistent estimator of $d_0$.

A.2. Proof of Theorem 2

For a vector d, let $d^{(k)}$ denote its kth component. Define
$$m_d(x,y) = \vartheta\{P(d)\} + \vartheta^{(1)}\{P(d)\}\bigl(I(x < d^{(1)}, y=1), \ldots, I(x < d^{(1)}, y=L),\ I(d^{(1)} \le x < d^{(2)}, y=1), \ldots, I(d^{(1)} \le x < d^{(2)}, y=L),\ \ldots,\ I(x \ge d^{(L-1)}, y=1), \ldots, I(x \ge d^{(L-1)}, y=L-1)\bigr)^{\rm T} - \vartheta^{(1)}\{P(d)\}P(d).$$
Note that, although the function $m_d(\cdot)$ is not Lipschitz in the parameter d, according to condition C1, $E\{m_d(X, Y)\}$, which equals $\vartheta\{P(d)\}$, is twice differentiable at the unique maximizer $d_0$ with nonsingular second-derivative matrix $V(d_0)$.

Let $\bar{d} = \arg\max_{d\in\Theta}[\vartheta\{P(d)\} + \vartheta^{(1)}\{P(d)\}\{\hat{P}_n(d) - P(d)\}]$. By the definitions of $\hat{P}_n$ and $m_d(\cdot)$, it is equivalent to define $\bar{d}$ as $\bar{d} = \arg\max_{d\in\Theta}\sum_{i=1}^{n} m_d(X_i, Y_i)$. Therefore, we can view $\bar{d}$ as a standard M-estimator while treating P(·) as a known function. Our basic idea for proving Theorem 2 is to first study the convergence rate and the asymptotic distribution of $\bar{d}$ and then establish the large-sample properties of $\hat{d}$ based on the connection between $\bar{d}$ and $\hat{d}$.

For a given δ > 0, define the function class $\mathcal{M}_\delta = \{m_d - m_{d_0}: \|d - d_0\| < \delta\}$. It can be shown that $\mathcal{M}_\delta$ is a VC class (van der Vaart and Wellner, 1996) with an envelope function that, under conditions C1–C3, takes the form

$$V_\delta(x,y) \le C_1\delta + C_2\Bigl[I(x\in A_{1,\delta}, y=1) + \cdots + I(x\in A_{1,\delta}, y=L) + \{I(x\in A_{1,\delta}, y=1) + I(x\in A_{2,\delta}, y=1)\} + \cdots + \{I(x\in A_{1,\delta}, y=L) + I(x\in A_{2,\delta}, y=L)\} + \cdots + I(x\in A_{L-1,\delta}, y=1) + \cdots + I(x\in A_{L-1,\delta}, y=L-1)\Bigr],$$

where $C_1$ and $C_2$ are positive constants, and $A_{k,\delta} = [d_0^{(k)} - \delta,\ d_0^{(k)} + \delta]$, $k = 1, \ldots, L-1$. As δ approaches zero, by condition C3, $E\{V_\delta(X, Y)^2\}$ is bounded above by a constant times δ. Thus, the conditions of Theorem 3.2.10 of van der Vaart and Wellner (1996) are satisfied with $\phi(\delta) = C_3\sqrt{\delta}$ for a constant $C_3$. This leads to a rate of convergence of $n^{1/3}$ for $\bar{d} - d_0$. Moreover, Theorem 3.2.10 of van der Vaart and Wellner (1996) implies that $n^{1/3}(\bar{d} - d_0)$ converges in distribution to the maximizer of the process $M(h) \equiv G(h) + \frac{1}{2} h^{\rm T} V(d_0) h$, where G(h) is a zero-mean Gaussian process, which has continuous sample paths and satisfies

$$E\bigl[\{G(g) - G(h)\}^2\bigr] = \sum_{k=1}^{L^2-1}\{a_k(d_0)\}^2\, b_k(d_0, g, h). \quad (A.3)$$

Here $a_k(d_0)$ is the kth component of $\vartheta^{(1)}\{P(d_0)\}$; $b_{(s-1)L+t}(d_0, g, h) = f_{X|Y=t}(d_0^{(s)})\,|g^{(s)} - h^{(s)}|$ when s = 1 and t ∈ {1, . . . , L}; $b_{(s-1)L+t}(d_0, g, h) = f_{X|Y=t}(d_0^{(s-1)})\,|g^{(s-1)} - h^{(s-1)}| + f_{X|Y=t}(d_0^{(s)})\,|g^{(s)} - h^{(s)}|$ when s ∈ {2, . . . , L − 1} and t ∈ {1, . . . , L}; and $b_{(s-1)L+t}(d_0, g, h) = f_{X|Y=t}(d_0^{(s-1)})\,|g^{(s-1)} - h^{(s-1)}|$ when s = L and t ∈ {1, . . . , L − 1}.

Next, we shall complete the proof by showing the asymptotic equivalence between $n^{1/3}(\hat{d} - d_0)$ and $n^{1/3}(\bar{d} - d_0)$. Define

$$M_n(h) = \vartheta\{P(d_0 + n^{-1/3}h)\} + \vartheta^{(1)}\{P(d_0 + n^{-1/3}h)\}\{\hat{P}_n(d_0 + n^{-1/3}h) - P(d_0 + n^{-1/3}h)\}.$$

Let $\bar{h}_n = n^{1/3}(\bar{d} - d_0)$. By the definition of $\bar{d}$, $\bar{h}_n$ is the maximizer of the stochastic process $M_n(h) - M_n(0)$. The fact that $m_d$ satisfies the conditions of Theorem 3.2.10 of van der Vaart and Wellner (1996) implies that $n^{2/3}\{M_n(h) - M_n(0)\}$ is asymptotically tight in $\{h: \|h\| \le K\}$ and converges weakly to the process M(h) for every K > 0.

Define $\tilde{M}_n(h) = \vartheta\{\hat{P}_n(d_0 + n^{-1/3}h)\}$. Let $\hat{h}_n = n^{1/3}(\hat{d} - d_0)$, $\epsilon_n = n^{2/3}[\{\tilde{M}_n(\hat{h}_n) - \tilde{M}_n(0)\} - \{M_n(\hat{h}_n) - M_n(0)\}]$, and $\bar{\epsilon}_n = n^{2/3}[\{\tilde{M}_n(\bar{h}_n) - \tilde{M}_n(0)\} - \{M_n(\bar{h}_n) - M_n(0)\}]$. Applying Taylor's expansion to $\vartheta\{\hat{P}_n(d_0 + n^{-1/3}h)\}$ around $P(d_0 + n^{-1/3}h)$, we get

$$\tilde{M}_n(h) = M_n(h) + \tfrac{1}{2}\{\hat{P}_n(d_0 + n^{-1/3}h) - P(d_0 + n^{-1/3}h)\}^{\rm T}\,\vartheta^{(2)}(P^{\dagger})\,\{\hat{P}_n(d_0 + n^{-1/3}h) - P(d_0 + n^{-1/3}h)\},$$

where $\vartheta^{(2)}(P) = \partial^2\vartheta(P)/\partial P\,\partial P^{\rm T}$, and $P^{\dagger}$ is a vector between $\hat{P}_n(d_0 + n^{-1/3}h)$ and $P(d_0 + n^{-1/3}h)$. By the Donsker theorem (van der Vaart and Wellner, 1996), we have $\sup_{d\in\Theta} n^{1/3}\|\hat{P}_n(d) - P(d)\| \rightarrow_p 0$. Therefore, under condition C1(ii), $\sup_{h\in D_\Theta} n^{2/3}|\tilde{M}_n(h) - M_n(h)| = o_P(1)$, where $D_\Theta = \{h: d_0 + n^{-1/3}h \in \Theta\}$. This immediately implies that $\epsilon_n = o_P(1)$ and $\bar{\epsilon}_n = o_P(1)$. Therefore,

$$o_P(1) = \bar{\epsilon}_n - \epsilon_n = n^{2/3}\bigl[M_n(\hat{h}_n) - M_n(\bar{h}_n) + \vartheta\{\hat{P}_n(\bar{d})\} - \vartheta\{\hat{P}_n(\hat{d})\}\bigr].$$

This implies $n^{2/3}\{M_n(\hat{h}_n) - M_n(0)\} \ge n^{2/3}\{M_n(\bar{h}_n) - M_n(0)\} + o_P(1)$, because $\vartheta\{\hat{P}_n(\bar{d})\} - \vartheta\{\hat{P}_n(\hat{d})\} \le 0$ and $M_n(\hat{h}_n) - M_n(\bar{h}_n) \le 0$. It then follows from the argmax continuous mapping theorem (van der Vaart and Wellner, 1996) that $\hat{h}_n$, like $\bar{h}_n$, converges in distribution to the unique maximizer of the stochastic process M(h). This completes the proof of Theorem 2.

A.3. A smooth bootstrap inference procedure

A smooth bootstrap inference procedure adapted from Léger and MacGibbon's (2006) work is described as follows.

  • Step 1: Let $\tilde{X}_{l,k}$ denote the kth observation of X given Y = l (k = 1, . . . , $n_l$). Estimate the conditional density functions, $f_{X|Y=l}$ (l = 1, 2, . . . , L), by the following smooth and symmetric kernel-based estimators,
    $$\hat{f}_{X|Y=l}(x) = \frac{1}{2 n_l h_l}\left[\sum_{k=1}^{n_l} K\!\left(\frac{x - \tilde{X}_{l,k}}{h_l}\right) + \sum_{k=1}^{n_l} K\!\left(\frac{x + \tilde{X}_{l,k} - 2\hat{\theta}_l}{h_l}\right)\right],$$
    where K(·) is the Gaussian kernel function, $h_l$ is a smoothing parameter (the bandwidth), and $\hat{\theta}_l$ is the sample median of X given Y = l. We select the bandwidth as follows:
    1. Following Sen et al.'s (2010) rule of thumb, we start with an initial bandwidth, $h_0 = 0.9 A n_l^{-1/5}$, with $A = \min\{s, \mathrm{IQR}/1.34\}$, where s and IQR are the sample standard deviation and inter-quartile range of $\{\tilde{X}_{l,k}, k = 1, 2, \ldots, n_l\}$.
    2. We evaluate a sequence of candidate bandwidth values in the neighborhood of $h_0$, say $h_0 - 0.1$, $h_0 - 0.05$, $h_0$, $h_0 + 0.05$, $h_0 + 0.1$, based on the integrated least squares cross-validation criterion (Sheather, 2004),
      $$\mathrm{LSCV}(h) = \int\{\hat{f}_{X|Y=l}(x)\}^2\,dx - \frac{2}{n_l}\sum_{k=1}^{n_l}\hat{f}^{(l,k)}_{X|Y=l}(\tilde{X}_{l,k}),$$
      where $\hat{f}^{(l,k)}_{X|Y=l}(\cdot)$ denotes the kernel density estimator constructed from the data given Y = l excluding $\tilde{X}_{l,k}$.
    3. We set the bandwidth as $h_{l,\mathrm{opt}} = \arg\min_{h_l} \mathrm{LSCV}(h_l)$.
  • Step 2: Set s = 1.

  • Step 3: Randomly select Y = l from {Y1, . . . , Yn}. Given Y = l, resample X from the estimated density function $\hat{f}_{X|Y=l}(x)$ using the acceptance–rejection method. Repeat this procedure n times to obtain a bootstrap sample of size n.

  • Step 4: Apply the proposed method to estimate d0 based on the bootstrap sample obtained from Step 3. Denote the resulting estimator by d^(s) and increase s by 1. Go back to Step 3 unless s > S.

  • Step 5: Compute the empirical variance of $\{\hat{d}_{(s)}^{(j)}\}_{s=1}^{S}$, which provides a variance estimate for $\hat{d}^{(j)}$.

  • Step 6: Compute $\tilde{d}^{(j)} = \sum_{s=1}^{S}\hat{d}_{(s)}^{(j)}/S$ and then obtain the empirical 100(α/2)th and 100(1 − α/2)th percentiles of $\{n^{1/3}(\hat{d}_{(s)}^{(j)} - \tilde{d}^{(j)})\}_{s=1}^{S}$, denoted by $\nu_{\alpha/2}^{(j)}$ and $\nu_{1-\alpha/2}^{(j)}$, respectively. The 100(1 − α)% (basic) confidence interval for $d_0^{(j)}$ can be constructed as
    $$\bigl[\hat{d}^{(j)} - n^{-1/3}\nu_{1-\alpha/2}^{(j)},\ \hat{d}^{(j)} - n^{-1/3}\nu_{\alpha/2}^{(j)}\bigr].$$
    The consistency of the above smooth bootstrap procedure follows from Theorem 1 and Corollary 1 of Léger and MacGibbon (2006).
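As a companion to the subsampling sketch in Section 2.3, the following is a minimal sketch of the two ingredients specific to this smooth bootstrap: the symmetrized Gaussian-kernel density estimate of Step 1 (for a user-supplied bandwidth h and symmetry point theta, the sample median) and acceptance–rejection sampling from it for Step 3. Bandwidth selection by LSCV and the remaining steps are omitted, and the function names are ours.

```python
import numpy as np

def symmetric_kde(x_l, h, theta):
    """Smooth density estimate of f_{X|Y=l}, symmetric about theta, built from the
    observations x_l with a Gaussian kernel, as in Step 1."""
    x_l = np.asarray(x_l, dtype=float)
    const = 2.0 * len(x_l) * h * np.sqrt(2.0 * np.pi)
    def f(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
        k1 = np.exp(-0.5 * ((x - x_l) / h) ** 2)                  # usual kernel terms
        k2 = np.exp(-0.5 * ((x + x_l - 2.0 * theta) / h) ** 2)    # reflected (symmetrizing) terms
        return (k1.sum(axis=1) + k2.sum(axis=1)) / const
    return f

def sample_from_density(f, low, high, size, rng):
    """Acceptance-rejection sampling from f, assuming its mass is essentially on [low, high]."""
    grid = np.linspace(low, high, 512)
    f_max = f(grid).max() * 1.1                    # envelope constant for the uniform proposal
    draws = []
    while len(draws) < size:
        cand = rng.uniform(low, high, size)
        accept = rng.uniform(0.0, f_max, size) < f(cand)
        draws.extend(cand[accept].tolist())
    return np.array(draws[:size])

# example: estimate f_{X|Y=l} from toy data and draw a bootstrap sample of X values
rng = np.random.default_rng(1)
x_l = rng.normal(8.0, 1.0, size=200)
f_hat = symmetric_kde(x_l, h=0.5, theta=np.median(x_l))
x_star = sample_from_density(f_hat, low=4.0, high=12.0, size=200, rng=rng)
```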

References

  1. Abrevaya J, Huang J. On the bootstrap of the maximum score estimator. Econometrica. 2005;73:1175–1204.
  2. Agresti A. Categorical Data Analysis. Wiley; New York: 1990.
  3. Altman DG. Categorizing continuous variables. Br. J. Cancer. 1991;64:975. doi: 10.1038/bjc.1991.441.
  4. Altman DG, Lausen B, Sauerbrei W. Dangers of using "optimal" cutpoints in the evaluation of prognostic factors. J. Natl. Cancer Inst. 1994;86(11):829–835. doi: 10.1093/jnci/86.11.829.
  5. Baughman AL, Bisgard KM, Lynn F, Meade BD. Mixture model analysis for establishing a diagnostic cut-off point for pertussis antibody levels. Stat. Med. 2006;25(17):2994–3010. doi: 10.1002/sim.2442.
  6. Brownie C, Habicht JP. Selecting a screening cut-off point or diagnostic criterion for comparing prevalences of disease. Biometrics. 1984;40(3):675–684.
  7. Chernoff H. Estimation of the mode. Ann. Inst. Statist. Math. 1964;16:31–41.
  8. Cohen J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960;20(1):37–46.
  9. Delgado M, Rodriguez-Poo J, Wolf M. Subsampling inference in cube-root asymptotics with an application to Manski's maximum score estimator. Econom. Lett. 2001;73:241–250.
  10. James IR. Estimation of the mixing proportion in a mixture of two normal distributions from simple, rapid measurements. Biometrics. 1978;34:265–275.
  11. Kendall MG. A new measure of rank correlation. Biometrika. 1938;30(1–2):81–93.
  12. Kim J, Pollard D. Cube root asymptotics. Ann. Statist. 1990;18:191–210.
  13. Kosorok M. Introduction to Empirical Processes and Semiparametric Inference. Springer; New York: 2008.
  14. Kraemer H. Assessment of 2 × 2 associations: Generalization of signal-detection methodology. Amer. Statist. 1988;42:37–49.
  15. Lee SMS, Pun MC. On m out of n bootstrapping for nonstandard M-estimation with nuisance parameters. J. Amer. Statist. Assoc. 2006;101:1185–1197.
  16. Léger C, MacGibbon B. On the bootstrap in cube root asymptotics. Canad. J. Statist. 2006;34:29–44.
  17. Mazumdar M, Glassman JR. Categorizing a prognostic variable: Review of methods, code for easy implementation and applications to decision-making about cancer treatments. Stat. Med. 2000;19:113–132. doi: 10.1002/(sici)1097-0258(20000115)19:1<113::aid-sim245>3.0.co;2-o.
  18. Moussavi S, Chatterji S, Verdes E, Tandon A, Patel V, Ustun B. Depression, chronic disease, and decrements in health: results from the World Health Surveys. The Lancet. 2007;370:851–858. doi: 10.1016/S0140-6736(07)61415-9.
  19. O'Brien SM. Cutpoint selection for categorizing a continuous predictor. Biometrics. 2004;60:504–509. doi: 10.1111/j.0006-341X.2004.00196.x.
  20. Pepe M. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press; New York: 2003.
  21. Perkins N, Schisterman E. The inconsistency of optimal cut-points using two ROC based criteria. Amer. J. Epidemiol. 2006;163:670–675. doi: 10.1093/aje/kwj063.
  22. Politis D, Romano J. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 1994;22:2031–2050.
  23. Politis D, Romano J, Wolf M. Subsampling. Springer; New York: 1999.
  24. Schisterman E, Perkins N, Liu A, Bondell H. Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology. 2005;16:73–81. doi: 10.1097/01.ede.0000147512.81966.ba.
  25. Sen B, Banerjee M, Woodroofe M. Inconsistency of bootstrap: the Grenander estimator. Ann. Statist. 2010;38:1953–1977.
  26. Sen B, Xu G. Model based bootstrap methods for interval censored data. Comput. Statist. Data Anal. 2015;81:121–129.
  27. Sheather SJ. Density estimation. Statist. Sci. 2004;19:588–597.
  28. Sheehan D, Janavs J, Baker R, Harnett-Sheehan K, Knapp E, Sheehan M, Lecrubier Y, Weiller E, Hergueta T, Amorim P, Bonora LI, Lépine JP. M.I.N.I.–Mini International Neuropsychiatric Interview–English Version 5.0.0–DSM-IV. J. Clinical Psychiatry. 1998;59:34–57.
  29. Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, Hergueta T, Baker R, Dunbar GC. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J. Clinical Psychiatry. 1998;59(S20):22–33.
  30. van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag; New York: 1996.
  31. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–35. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
  32. Zung WWK. A self-rating depression scale. Arch. Gen. Psychiatry. 1965;12:63–70. doi: 10.1001/archpsyc.1965.01720310065008.
  33. Zung WWK. From art to science: The diagnosis and treatment of depression. Arch. Gen. Psychiatry. 1973;29:328–337. doi: 10.1001/archpsyc.1973.04200030026004.
