Author manuscript; available in PMC 2011 Jan 13.
Published in final edited form as: Statistics (Ber). 2010 Apr 1;44(2):145–153. doi: 10.1080/02331880902986984

Probabilities for separating sets of order statistics

D H Glueck a,*, A Karimpour-Fard b, J Mandel c,d,e, K E Muller f
PMCID: PMC3020799  NIHMSID: NIHMS184247  PMID: 21243084

Abstract

Consider a set of order statistics that arise from sorting samples from two different populations, each with its own, possibly different distribution function. The probability that these order statistics fall in disjoint, ordered intervals and that, of the smallest statistics, a certain number come from the first population is given in terms of the two distribution functions. The result is applied to computing the joint probability of the number of rejections and the number of false rejections for the Benjamini–Hochberg false discovery rate procedure.

Keywords: Benjamini and Hochberg procedure, block matrix, permanent, multiple comparison

1. Introduction

Glueck et al. [1] gave explicit expressions for the probability that arbitrary subsets of order statistics fall in disjoint, ordered intervals on the set of real numbers. In this paper, we extend this work and consider two sets of real-valued, independent but not necessarily identically distributed random variables. We give expressions in terms of cumulative distribution functions for the probability that arbitrary subsets of order statistics fall in disjoint, ordered intervals and that, of the smallest statistics, a certain number come from one set. We have been unable to find any previous papers on this topic. This problem is of interest in calculating probabilities for the Benjamini and Hochberg [2] multiple comparisons procedure.

2. A simple example

Consider the following simple example. Let X1, X2 ∈ [0, 1] be independent random variables. Denote by FX1(x1) and FX2(x2) the marginal cumulative distribution functions and by FX1,X2(x1, x2) the joint cumulative distribution function of X1 and X2. Assume that the cumulative distribution functions are continuous. Let Y1 = min{X1, X2} and let Y2 = max{X1, X2} be the order statistics. For i = 1, 2, write the marginal cumulative distribution function of Yi as FYi(yi) and the joint cumulative distribution function as FY1,Y2(y1, y2), for y1 ≤ y2. This joint cumulative distribution function is also continuous [3, p. 10].

Choose numbers b1 < b2, b1, b2 ∈ (0, 1). We wish to find the probabilities

𝒜 = Pr{(y1 < b1) ∩ (y2 > b2)},  (1)
β = Pr{(y1 < b1) ∩ (y2 > b2) ∩ (x1 < b1)}  (2)

and

γ = Pr{(y1 < b1) ∩ (y2 > b2) ∩ ¬(x1 < b1)}.  (3)

and express them in terms of the distribution functions FX1 and FX2. First, we will find the probabilities directly. So,

β = Pr{(x1 < b1) ∩ (x2 > b2)} = FX1(b1)[1 − FX2(b2)]  (4)

and

γ = Pr{(x1 > b2) ∩ (x2 < b1)} = [1 − FX1(b2)]FX2(b1).  (5)

Equations (4) and (5) follow directly from the independence of the random variables and the definition of the cumulative distribution functions. Since

{(y1 < b1) ∩ (y2 > b2)} = {(y1 < b1) ∩ (y2 > b2) ∩ (x1 < b1)} ∪ {(y1 < b1) ∩ (y2 > b2) ∩ ¬(x1 < b1)}  (6)

and the union is disjoint, it follows that

𝒜=β+γ. (7)

For a problem with more than two order statistics, the number of cases one needs to consider and the number of possible combinations of statistics, subsets, and bounds make a direct approach impractical. An algorithmic approach to obtaining β and γ allows generalization to an arbitrary number of order statistics.

Using the assumption that the distribution functions are continuous, simple set operations, and the definition of distribution function, we obtain that the probability of the union (6) is

𝒜 = Pr{(y1 < b1) ∩ ¬(y2 < b2)}  (8)
 = Pr{y1 < b1} − Pr{(y1 < b1) ∩ (y2 < b2)}  (9)
 = FY1(b1) − FY1,Y2(b1, b2).  (10)

The cumulative distributions of the order statistics can be written [4],

FY1(b1) = FX1(b1)FX2(b1) + FX1(b1)[1 − FX2(b1)] + [1 − FX1(b1)]FX2(b1)  (11)
FY1,Y2(b1, b2) = FX1(b1)[FX2(b2) − FX2(b1)] + [FX1(b2) − FX1(b1)]FX2(b1) + FX1(b1)FX2(b1).  (12)

Then, substituting Equations (11) and (12) into Equation (10), we can write 𝒜 in terms of the distribution functions of X1 and X2,

𝒜 = FX1(b1)[1 − FX2(b1)] + [1 − FX1(b1)]FX2(b1) − FX1(b1)[FX2(b2) − FX2(b1)] − [FX1(b2) − FX1(b1)]FX2(b1)  (13)
 = FX1(b1)[1 − FX2(b2)] + [1 − FX1(b2)]FX2(b1).  (14)

We now interpret the terms in the sum in Equation (14). The term that includes FX1(b1) as a factor is the probability of an event in which x1 < b1 occurs, and the term that includes 1 − FX1(b2) as a factor is the probability of an event in which x1 > b2. Since b1 < b2, the two events are disjoint, and, consequently, Equation (7) follows again.

To summarize, we have expressed the probability in terms of the joint distribution of the order statistics, which was in turn written in terms of the distribution functions of the random variables. Finally, by recognizing terms that corresponded to a partition, we decomposed 𝒜 into a sum of β and γ, the two probabilities of interest.
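These identities are easy to check numerically. The following minimal Python sketch (the Beta distribution for X2 is an arbitrary illustrative choice, not from the paper) evaluates β, γ, and 𝒜 from Equations (4), (5), and (14), checks Equation (7), and confirms 𝒜 by Monte Carlo.

    # Numerical check of Equations (4), (5), (7), and (14).
    # The choice of distributions for X1 and X2 is illustrative only.
    import numpy as np
    from scipy.stats import uniform, beta

    F1, F2 = uniform(0, 1), beta(2, 5)   # continuous distributions on [0, 1]
    b1, b2 = 0.3, 0.6                    # bounds with b1 < b2

    beta_prob = F1.cdf(b1) * (1 - F2.cdf(b2))    # Equation (4)
    gamma_prob = (1 - F1.cdf(b2)) * F2.cdf(b1)   # Equation (5)
    a_prob = F1.cdf(b1) * (1 - F2.cdf(b2)) + (1 - F1.cdf(b2)) * F2.cdf(b1)  # Equation (14)
    assert abs(a_prob - (beta_prob + gamma_prob)) < 1e-12  # Equation (7)

    # Monte Carlo check of A = Pr{(y1 < b1) and (y2 > b2)}.
    rng = np.random.default_rng(0)
    x1 = F1.rvs(10**6, random_state=rng)
    x2 = F2.rvs(10**6, random_state=rng)
    y1, y2 = np.minimum(x1, x2), np.maximum(x1, x2)
    print(a_prob, np.mean((y1 < b1) & (y2 > b2)))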

3. General case

The logic used in this simple two-variable example generalizes to an arbitrary number of random variables. Consider a set of order statistics that arise from sorting samples from two different populations, each with its own, possibly different distribution function. We wish to find the probability that these order statistics fall in a given union of intervals and that, of the smallest statistics, a certain number come from one population.

For this general case, we need to introduce some notation and definitions. Let Xi, i = 1, …, m, be independent but not necessarily identically distributed real-valued random variables with values in the interval [0, 1] and continuous cumulative distribution functions FXi(xi). Partition the set {X1, X2,…,Xm} into two subsets,

S1 = {X1, X2, …, Xn},  S2 = {Xn+1, Xn+2, …, Xm}.  (15)

For example, one can consider measurements for males and females, or for two different populations of breast cancers, slow and fast growing. The order statistics Y1, Y2,…,Ym are random variables defined by sorting the values of the Xi, so that Y1 ≤ Y2 ≤ ⋯ ≤ Ym. Denote the realizations of the order statistics by y1 ≤ y2 ≤ ⋯ ≤ ym.

The arguments of the joint cumulative distribution function of order statistics are customarily written omitting redundant arguments; thus, for 1 ≤ e ≤ m, let 1 ≤ n1 < n2 < ⋯ < ne ≤ m denote the indices of the order statistics of interest. The joint cumulative distribution function of the set {Yn1, Yn2,…,Yne}, which is a subset of the complete set of order statistics, is defined as

FYn1,…,Yne(y1, …, ye) = Pr({Yn1 ≤ y1} ∩ {Yn2 ≤ y2} ∩ ⋯ ∩ {Yne ≤ ye}).  (16)

Suppose we are given s ≤ m disjoint intervals

(cq, dq),  0 = c1 < d1 < c2 < d2 < ⋯ < cs < ds = 1,  (17)

and integers

kq ≥ 0,  ∑_{q=1}^{s} kq = m,  (18)

where k0 = 0 and kq is the number of order statistics that fall in the qth interval. Define wq,1 = 1 + ∑_{i=1}^{q−1} ki and wq,kq = ∑_{i=1}^{q} ki to be the subscripts of the smallest and largest order statistics, respectively, that fall in the qth interval. In the case when kq = 1, we have wq,1 = wq,kq. Using this notation, the event that exactly kq of the order statistics fall in the qth interval is

{c1 < Yw1,1 < ⋯ < Yw1,k1 < d1} ∩ ⋯ ∩ {cs < Yws,1 < ⋯ < Yws,ks < ds},  (19)

or in a more compact notation (21) given below. Now let B be another random event. The following theorem gives the probability of this event intersected with the event (19), in terms of the cumulative distribution functions of the order statistics relative to the event B. This distribution function is defined by

FYn1,…,Yne;B(y1, …, ye) = Pr({Yn1 ≤ y1} ∩ {Yn2 ≤ y2} ∩ ⋯ ∩ {Yne ≤ ye} ∩ B).  (20)

Contrary to the usual convention, we do not require that the indices of the order statistics in the cumulative distribution function (20) be sorted, because that would complicate the notation in the next theorem (it would require additional renumbering of the arguments).
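To make the indexing concrete, the following small helper (a hypothetical illustration, not part of the paper) computes the subscripts wq,1 and wq,kq from the counts k1, …, ks.

    # Hypothetical helper: subscripts of the smallest and largest order
    # statistics falling in each of the s intervals, from the counts k_q.
    def interval_subscripts(k):
        """k[q-1] is the number of order statistics in the qth interval."""
        w_first, w_last, running = [], [], 0
        for kq in k:
            w_first.append(running + 1)   # w_{q,1} = 1 + k_1 + ... + k_{q-1}
            running += kq
            w_last.append(running)        # w_{q,kq} = k_1 + ... + k_q
        return w_first, w_last

    # Example: k = (2, 1, 3) gives w_first = [1, 3, 4] and w_last = [2, 3, 6].
    print(interval_subscripts([2, 1, 3]))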

THEOREM 1 Denote the event

E = ∩_{q=1}^{s} ({cq < Ywq,1} ∩ {Ywq,kq < dq}).  (21)

Then

Pr(E ∩ B) = FYw1,k1,Yw2,k2,…,Yws,ks;B(d1, d2, …, ds)
 − ∑_{q=1}^{s} FYw1,k1,Yw2,k2,…,Yws,ks,Ywq,1;B(d1, d2, …, ds, cq)
 + ∑_{1≤r<t≤s} FYw1,k1,Yw2,k2,…,Yws,ks,Ywr,1,Ywt,1;B(d1, d2, …, ds, cr, ct)
 − ⋯ + (−1)^s FYw1,1,Yw1,k1,Yw2,1,Yw2,k2,…,Yws,1,Yws,ks;B(c1, d1, c2, d2, …, cs, ds).  (22)

Proof By standard set operations,

E = ∩_{q=1}^{s} {cq < Ywq,1} ∩ ∩_{q=1}^{s} {Ywq,kq < dq}  (23)

and

∩_{q=1}^{s} {cq < Ywq,1} = ∩_{q=1}^{s} {Ywq,1 ≤ cq}^C = (∪_{q=1}^{s} {Ywq,1 ≤ cq})^C,  (24)

where C denotes the complement. Therefore,

E ∩ B = (∪_{q=1}^{s} {Ywq,1 ≤ cq})^C ∩ F,  (25)

where the event F is defined by

F = ∩_{q=1}^{s} {Ywq,kq < dq} ∩ B.  (26)

By the additivity of probability, it follows from Equation (25) that

Pr(E ∩ B) = Pr(F) − Pr(∪_{q=1}^{s} {Ywq,1 ≤ cq} ∩ F) = Pr(F) − Pr(∪_{q=1}^{s} Aq),  (27)

where Aq = {Ywq,1 ≤ cq} ∩ F. Using the additivity of probability again, we have

Pr(∪_{q=1}^{s} Aq) = ∑_{q=1}^{s} Pr(Aq) − ∑_{1≤r<t≤s} Pr(Ar ∩ At) + ⋯  (28)
 + (−1)^{s+1} Pr(∩_{q=1}^{s} Aq).  (29)

Now, putting Equations (26)–(29) together and using the continuity of the cumulative distribution functions, we obtain

Pr(E ∩ B) = Pr(∩_{q=1}^{s} {Ywq,kq ≤ dq} ∩ B)
 − ∑_{r=1}^{s} Pr({Ywr,1 ≤ cr} ∩ ∩_{q=1}^{s} {Ywq,kq ≤ dq} ∩ B)
 + ∑_{1≤r<t≤s} Pr({Ywr,1 ≤ cr} ∩ {Ywt,1 ≤ ct} ∩ ∩_{q=1}^{s} {Ywq,kq ≤ dq} ∩ B)
 − ⋯ + (−1)^s Pr(∩_{q=1}^{s} {Ywq,1 ≤ cq} ∩ ∩_{q=1}^{s} {Ywq,kq ≤ dq} ∩ B)
 = FYw1,k1,Yw2,k2,…,Yws,ks;B(d1, d2, …, ds)
 − ∑_{r=1}^{s} FYw1,k1,Yw2,k2,…,Yws,ks,Ywr,1;B(d1, d2, …, ds, cr)
 + ∑_{1≤r<t≤s} FYw1,k1,Yw2,k2,…,Yws,ks,Ywr,1,Ywt,1;B(d1, d2, …, ds, cr, ct)
 − ⋯ + (−1)^s FYw1,1,Yw1,k1,Yw2,1,Yw2,k2,…,Yws,1,Yws,ks;B(c1, d1, c2, d2, …, cs, ds),

which concludes the proof.
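In code, the alternating sum of Equation (22) is a signed sum over subsets of the intervals whose lower bounds cq are appended to the argument list. The following Python sketch implements this computation, assuming a user-supplied function cdf(indices, points) that evaluates the distribution function (20); one brute-force evaluator for the two-population case is sketched after Theorem 2 below.

    # Inclusion-exclusion of Theorem 1 / Equation (22).  `cdf(indices, points)`
    # must evaluate the CDF relative to B defined in Equation (20).
    from itertools import combinations

    def prob_E_and_B(c, d, w_first, w_last, cdf):
        s = len(c)
        total = 0.0
        for size in range(s + 1):
            for subset in combinations(range(s), size):
                indices = list(w_last) + [w_first[q] for q in subset]
                points = list(d) + [c[q] for q in subset]
                total += (-1) ** size * cdf(indices, points)
        return total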

From now on, assume that B is the event that exactly j elements of S1 fall in the interval (0, y1), for a given j ≤ n. This event is shown in Table 1. Thus, to compute the probability of interest, it is enough to evaluate the cumulative distribution functions of the order statistics relative to the event B, given by Equation (20). An efficient method for the computation of cumulative distribution functions of order statistics from two populations was proposed by Glueck et al. [5]. Here we need a slight generalization, involving the event B, which requires a different proof.

Table 1.

Numbers of order statistics from the sets S1 and S2 in the interval (0, d1) and outside the interval (0, d1), in the event B.

        < d1      ≥ d1                    Total
S1      j         n − j                   n
S2      k1 − j    (m − n) − (k1 − j)      m − n
Total   k1        m − k1                  m

THEOREM 2 Denote the index vector i = (i0, i1,…,ie+1) and the summation index set

ℐ = {i : 0 = i0 ≤ i1 ≤ ⋯ ≤ ie ≤ ie+1 = m, and ia ≥ na for all 1 ≤ a ≤ e}.  (30)

Suppose that FXi(x) = F(x) for all 1 ≤ i ≤ n, and FXi(x) = G(x) for all n + 1 ≤ i ≤ m. Then the cumulative distribution function relative to the event B (20) is given by

FYn1,…,Yne;B(y1, …, ye) = ∑_{i∈ℐ} ∑_{λ} n!(m − n)! ∏_{a=1}^{e+1} [F(ya) − F(ya−1)]^{λa} [G(ya) − G(ya−1)]^{ia − ia−1 − λa} / [λa!(ia − ia−1 − λa)!],  (31)

where y0 = 0, ye+1 = 1, and λ = (λ1, λ2,…,λe+1) ranges over all integer vectors such that λ1 = j and

λ1 + λ2 + ⋯ + λe+1 = n,  0 ≤ λa ≤ ia − ia−1.  (32)

Proof Denote by Ai,λ the event that exactly ia − ia−1 of the random variables Xi fall in the interval (ya−1, ya), and that exactly λa of those are elements of S1. When a = 1, (ya−1, ya) = (y0, y1) = (0, y1), so if B occurs, then λ1 = j. Then, by a multinomial argument,

Pr(Ai,λ) = n!(m − n)! ∏_{a=1}^{e+1} [F(ya) − F(ya−1)]^{λa} [G(ya) − G(ya−1)]^{ia − ia−1 − λa} / [λa!(ia − ia−1 − λa)!].  (33)

Since the events Ai,λ for different (i, λ) are disjoint, summing Pr(Ai,λ) over i ∈ ℐ and over the admissible λ yields Equation (31), and the result follows.

The only difference between Theorem 2 and the result by Glueck et al. [1] is the added condition λ1 = j.
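Equation (31) can be evaluated by direct enumeration. The Python sketch below is a brute-force illustration, practical only for small m; it is not the efficient block-permanent algorithm of Glueck et al. [5]. It assumes the arguments y1 ≤ ⋯ ≤ ye are nondecreasing and reads B relative to the first interval (0, y1).

    # Brute-force evaluation of Equation (31): the joint CDF of
    # Y_{n_1}, ..., Y_{n_e} relative to B (exactly j of the n variables
    # with CDF F fall in (0, y_1)).  Assumes y_1 <= ... <= y_e.
    from itertools import accumulate, product
    from math import factorial

    def cdf_relative_to_B(ns, ys, m, n, j, F, G):
        e = len(ns)
        y = [0.0] + list(ys) + [1.0]                 # y_0 = 0, y_{e+1} = 1
        dF = [F(y[a]) - F(y[a - 1]) for a in range(1, e + 2)]
        dG = [G(y[a]) - G(y[a - 1]) for a in range(1, e + 2)]
        total = 0.0
        for gaps in product(range(m + 1), repeat=e + 1):  # increments i_a - i_{a-1}
            if sum(gaps) != m:
                continue
            cums = list(accumulate(gaps))            # i_1, ..., i_{e+1}
            if any(cums[a] < ns[a] for a in range(e)):    # requires i_a >= n_a
                continue
            for lam in product(range(n + 1), repeat=e + 1):
                if lam[0] != j or sum(lam) != n:
                    continue
                if any(lam[a] > gaps[a] for a in range(e + 1)):
                    continue
                term = factorial(n) * factorial(m - n)
                for a in range(e + 1):
                    term *= dF[a] ** lam[a] * dG[a] ** (gaps[a] - lam[a])
                    term /= factorial(lam[a]) * factorial(gaps[a] - lam[a])
                total += term
        return total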

In the case of two random variables, we recover the same results as the direct method in Section 2. With m = 2, n = 1, s = 2, c1 = 0, d1 = b1, c2 = b2, d2 = 1, S1 = {X1}, S2 = {X2}, k1 = 1, k2 = 1, Yw1,1 = Yw1,k1 = Y1, and Yw2,1 = Yw2,k2 = Y2, using Theorems 1 and 2 yields

Pr(E ∩ B) = γ,  (34)

when j = 0, and

Pr(E ∩ B) = β,  (35)

when j = 1.
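The two sketches above reproduce this check numerically. Here make_cdf is a hypothetical wrapper that sorts the (index, argument) pairs before applying Equation (31); both populations are taken uniform on [0, 1] purely for illustration, so that Equations (4) and (5) reduce to b1(1 − b2) and (1 − b2)b1.

    # Recover beta and gamma for m = 2 via Theorems 1 and 2, reusing
    # prob_E_and_B and cdf_relative_to_B from the sketches above.
    def make_cdf(m, n, j, F, G):
        def cdf(indices, points):
            # Equation (20) is symmetric in its (index, point) pairs, so
            # sort by point; B is then read relative to (0, y_1).
            pairs = sorted(zip(points, indices))
            ys = [p for p, _ in pairs]
            ns = [i for _, i in pairs]
            return cdf_relative_to_B(ns, ys, m, n, j, F, G)
        return cdf

    F = G = lambda x: x                 # both populations uniform on [0, 1]
    b1, b2 = 0.3, 0.6
    c, d, w_first, w_last = [0.0, b2], [b1, 1.0], [1, 2], [1, 2]
    beta_prob = prob_E_and_B(c, d, w_first, w_last, make_cdf(2, 1, 1, F, G))
    gamma_prob = prob_E_and_B(c, d, w_first, w_last, make_cdf(2, 1, 0, F, G))
    print(beta_prob, b1 * (1 - b2))     # both 0.12, Equation (4)
    print(gamma_prob, (1 - b2) * b1)    # both 0.12, Equation (5)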

In conclusion, for two sets of real-valued, independent but not necessarily identically distributed random variables, we have now given an expression for the probability that arbitrary subsets of order statistics fall in disjoint, ordered intervals and that, of the smallest statistics, a certain number come from one set.

4. Concluding example

The methods of this paper can be used to calculate the joint probability of the number of rejections and the number of false rejections for the Benjamini–Hochberg [2] procedure. A rejection of a hypothesis for which the null holds is a false rejection. Given a false discovery rate α ∈ (0, 1), hypotheses Hi, i = 1,…,m, p-values Xi, and the corresponding order statistics for the p-values Yi = X(i) (the random variables Xi sorted in nondecreasing order X(1) ≤ X(2) ≤ ⋯ ≤ X(m)), the procedure produces a nondecreasing sequence of numbers bi = iα/m ∈ (0, 1), rejects the hypotheses H(e), e = 1,…, k1, where k1 is the largest number for which yk1 ≤ bk1, and accepts all others. For n ∈ {0, 1,…,m}, assume that the null holds for H1, H2,…, Hn and that the alternative holds for Hn+1, Hn+2,…,Hm. Let S1 = {X1, X2,…,Xn} be the set of p-values that correspond to the null hypotheses, and S2 = {Xn+1, Xn+2,…,Xm} be the set of p-values for which the alternative holds. Then j is the number of null hypotheses that are rejected, which is equal to the number of p-values corresponding to null hypotheses that fall in the interval [0, bk1].
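For reference, a minimal sketch of the step-up rule just described (numpy assumed):

    # Benjamini-Hochberg step-up rule: k1 is the largest index with
    # y_{k1} <= k1 * alpha / m; reject the k1 smallest p-values.
    import numpy as np

    def benjamini_hochberg(pvalues, alpha):
        p = np.sort(np.asarray(pvalues))
        m = len(p)
        below = np.nonzero(p <= alpha * np.arange(1, m + 1) / m)[0]
        k1 = below[-1] + 1 if below.size else 0
        return k1, p[:k1]        # number of rejections, rejected p-values

    print(benjamini_hochberg([0.01, 0.20, 0.03, 0.80], alpha=0.05))  # prints (1, array([0.01]))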

Under the assumption that the p-values for which the alternative holds have the same distribution, one can use the methods of this paper to find the joint distribution of j and k1. For each value of k1 and m, Glueck et al. [6] pointed out that the rejection regions for the Benjamini and Hochberg [2] procedure can be decomposed into disjoint sets of events. These events correspond to certain numbers of order statistics falling into sets of intervals defined by the numbers bi. Details about the decomposition of the rejection regions into these events are given in Glueck et al. [6]. The general case is too complicated to detail here. However, as an example, we calculate the probabilities that, with m = 2 hypotheses and n = 1 null hypothesis, the Benjamini and Hochberg [2] procedure rejects k1 = 1 hypothesis and that j, the number of false rejections, is either 0 or 1.

Suppose we wish to test m = 2 hypotheses, specifically two-sided hypotheses about the locations of the population means. We assume that we have two large populations with known variances (both σ2), and that the variables of interest, say ε1 and ε2, are normally distributed, so that ε1 ~ N(μ1, σ2) and ε2 ~ N(μ2, σ2). We wish to test the two hypotheses H1: μ1 = μ0 and H2: μ2 = μ0, with the same alternative hypothesis for both populations, HA: μ = μA. We draw a random sample of size Ni from each population, say εi1, εi2,…,εiNi. For convenience, we assume that the random sample is of the same size for each hypothesis test, so N1 = N2 = N.

With

ε̄i = N^{−1} ∑_{δ=1}^{N} εiδ,  (36)

the test statistics are given by

Zi = (σ/√N)^{−1}(ε̄i − μ0),  (37)

and the two-sided p-values are [7, p. 244]

Xi = 2Φ(Zi) if Zi ≤ 0, and Xi = 2[1 − Φ(Zi)] if Zi > 0,  (38)

where Φ is the cumulative distribution function of the standard normal (mean = 0 and variance = 1). Let ϕ be the probability density function of the standard normal.
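In code, with Φ taken from scipy (scipy.stats.norm.cdf), the two-sided p-value (38) is:

    # Two-sided p-value of Equation (38).
    from scipy.stats import norm

    def two_sided_pvalue(z):
        return 2 * norm.cdf(z) if z <= 0 else 2 * (1 - norm.cdf(z))

    # Equivalently 2 * norm.sf(abs(z)); e.g. two_sided_pvalue(-1.96) is about 0.05.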

Suppose that in truth we have ε1 ~ N(μ0, σ2), so that the null holds for H1, and ε2 ~ N(μA, σ2), so that the alternative holds for H2. Define S1 = {X1} and S2 = {X2}. Then the number of p-values for which the null holds is n = 1. For H1, the hypothesis for which the null holds, the p-value has a uniform distribution on the interval [0, 1], so for x1 ∈ [0, 1],

FX1(x1)=x1. (39)

For H2, the alternative holds. When we conduct the hypothesis test, we are unaware of the truth. We always calculate the p-value under the null. However, since the alternative actually holds,

Pr[Z2 ≤ z2] = Pr[(ε̄2 − μ0)/(σ/√N) ≤ z2] = Pr[(ε̄2 − μA)/(σ/√N) ≤ z2 + (μ0 − μA)/(σ/√N)] = Φ[z2 + (μ0 − μA)/(σ/√N)].  (40)

Finally,

FX2(x2) = Pr(X2 < x2)
 = Pr({X2 < x2} ∩ {Z2 ≤ 0}) + Pr({X2 < x2} ∩ {Z2 > 0})
 = Pr({2Φ(Z2) < x2}) + Pr({2[1 − Φ(Z2)] < x2})
 = Pr({Z2 < Φ^{−1}(x2/2)}) + 1 − Pr({Z2 ≤ Φ^{−1}(1 − x2/2)})
 = Φ[Φ^{−1}(x2/2) + (μ0 − μA)/(σ/√N)] + 1 − Φ[Φ^{−1}(1 − x2/2) + (μ0 − μA)/(σ/√N)],  (41)

where the last step follows by substitution from Equation (40).
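Equation (41) translates directly into code. A sketch assuming scipy, writing the shift (μ0 − μA)/(σ/√N) as delta:

    # CDF of the p-value when the alternative holds, Equation (41).
    from math import sqrt
    from scipy.stats import norm

    def pvalue_cdf_alternative(x, mu0, muA, sigma, N):
        delta = (mu0 - muA) / (sigma / sqrt(N))
        return (norm.cdf(norm.ppf(x / 2) + delta)
                + 1 - norm.cdf(norm.ppf(1 - x / 2) + delta))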

Now, as a specific example, we fix μ0 = 0, μA = 1, σ2 = 1, N = 5, and α = 0.05. We wish to calculate the probability that k1 = 1 and that j = 0 or j = 1, so that c1 = 0, d1 = α/2, c2 = α, and d2 = 1. This is the probability that, of the two hypotheses, we reject exactly one. When j = 0, the rejection is of the hypothesis for which the alternative holds, and when j = 1, the rejection is of the null hypothesis, a false rejection.

We calculated the probability using our methodology, and by a simulation with 100,000 replications. Recall that k1 is the number of hypotheses rejected, and j is the number of those that correspond to S1, the null hypotheses. The results are shown in Table 2.

Table 2.

Comparison of simulation and theory.

k1   j   Theory       Simulation   Difference
1    0   0.472982     0.47388      0.000898
1    1   0.00978051   0.0095       0.00028051

Note: Recall that k1 is the number of hypotheses that were rejected, and j is the number of null hypotheses that were rejected. We had two hypotheses and one null hypothesis.

Notice that the simulation differs from the theory by less than 0.001 in each case. The theory is exact. Software that implements this method in Mathematica is available from the authors upon request.
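The entries of Table 2 can be reproduced with a short script: the theory column from Equations (4), (5), (39), and (41), and the simulation column by Monte Carlo. The sketch below uses the parameter values fixed above; the seed, and hence the simulated column, will of course differ from the published run.

    # Reproduce Table 2.  Theory: Equations (4) and (5) with FX1(x) = x and
    # FX2 from Equation (41).  Simulation: 100,000 replications of the
    # two-test Benjamini-Hochberg procedure.
    import numpy as np
    from math import sqrt
    from scipy.stats import norm

    mu0, muA, sigma, N, alpha, m = 0.0, 1.0, 1.0, 5, 0.05, 2
    b1, b2 = alpha / m, alpha
    delta = (mu0 - muA) / (sigma / sqrt(N))

    def FX2(x):                                      # Equation (41)
        return (norm.cdf(norm.ppf(x / 2) + delta)
                + 1 - norm.cdf(norm.ppf(1 - x / 2) + delta))

    theory_j0 = (1 - b2) * FX2(b1)                   # gamma, Equation (5)
    theory_j1 = b1 * (1 - FX2(b2))                   # beta, Equation (4)

    rng = np.random.default_rng(1)
    reps = 100_000
    z1 = rng.standard_normal(reps)                   # null true for H1
    z2 = rng.standard_normal(reps) - delta           # alternative true for H2
    p1 = 2 * norm.sf(np.abs(z1))                     # Equation (38)
    p2 = 2 * norm.sf(np.abs(z2))
    y1, y2 = np.minimum(p1, p2), np.maximum(p1, p2)
    reject_one = (y1 <= b1) & (y2 > b2)              # the event k1 = 1
    print(theory_j0, np.mean(reject_one & (p2 <= b1)))   # j = 0
    print(theory_j1, np.mean(reject_one & (p1 <= b1)))   # j = 1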

Acknowledgements

Glueck was supported by NCI K07CA88811. Mandel was supported by NSF-CMS 0325314. Muller was supported by NCI P01 CA47 982-04, NCI R01 CA095749-01A1 and NIAID 9P30 AI 50410. The authors thank Professor Gary Grunwald for his helpful comments.

Footnotes

AMS (2000) Subject Classification: Primary: 62E15, 65C60

References

1. Glueck DH, Karimpour-Fard A, Mandel J, Muller KE. On the probability that order statistics fall in intervals. 2008. In review.
2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 1995;57:289–300.
3. David HA. Order Statistics. 2nd ed. New York: Wiley; 1981.
4. Bapat RB, Beg MI. Order statistics for non-identically distributed variables and permanents. Sankhya Ser. A. 1989;51:79–93.
5. Glueck DH, Karimpour-Fard A, Mandel J, Hunter L, Muller KE. Fast computation by block permanents of cumulative distribution functions of order statistics from several populations. Commun. Stat. Theory Methods. 2008. doi: 10.1080/03610920802001896. To appear.
6. Glueck DH, Muller KE, Karimpour-Fard A, Hunter L. Expected power for the false discovery rate with independence. Commun. Stat. Theory Methods. 2008;37(12). doi: 10.1080/03610920801893731.
7. Rosner B. Fundamentals of Biostatistics. 6th ed. New York: Brooks-Cole; 2006.
8. Ross S. A First Course in Probability. 2nd ed. New York: Macmillan Publishing Company; 1984.
