Abstract
We review a finite-sampling exponential bound due to Serfling and discuss related exponential bounds for the hypergeometric distribution. We then discuss how such bounds motivate some new results for two-sample empirical processes. Our development complements recent results by Wei and Dudley (2012) concerning exponential bounds for two-sided Kolmogorov-Smirnov statistics by giving corresponding results for one-sided statistics with emphasis on “adjusted” inequalities of the type proved originally by Dvoretzky et al. (1956) and by Massart (1990) for one-sample versions of these statistics.
Keywords: Bennett inequality, finite sampling, Hoeffding inequality, hypergeometric distribution, two-samples, Kolmogorov-Smirnov statistics, exponential bounds
1. Introduction: Serfling’s finite sampling exponential bound
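Suppose that {c1,…, cN} is a finite population with each ci ∈ ℝ. For n ≤ N, let Y1,…, Yn be a sample drawn from {c1,…, cN} without replacement; we can regard the finite population {c1,…, cN} as an urn containing N balls labeled with the numbers c1,…, cN. Some notation: we let
$$\mu_N = \frac{1}{N}\sum_{i=1}^{N} c_i, \qquad \sigma_N^2 = \frac{1}{N}\sum_{i=1}^{N} (c_i - \mu_N)^2, \qquad a_N = \min_{1 \le i \le N} c_i, \qquad b_N = \max_{1 \le i \le N} c_i, \qquad \bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i .$$
It is well-known (see e.g. Rice (2007), Theorem B, page 208) that Ȳn satisfies E(Ȳn) = μN and
$$\operatorname{Var}(\bar{Y}_n) = \frac{\sigma_N^2}{n}\left(1 - \frac{n-1}{N-1}\right). \tag{1}$$
Serfling (1974), Corollary 1.1, shows that for all λ > 0
$$P\left(\sqrt{n}\,(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{(1 - f_n^*)(b_N - a_N)^2}\right), \qquad f_n^* \equiv \frac{n-1}{N}. \tag{2}$$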
This inequality is of the type proved by Hoeffding (1963) for sampling with replacement and more generally for sums of independent bounded random variables. Comparing (1) and (2), it seems reasonable to ask whether the factor fn* = (n − 1)/N in (2) can be improved to fn ≡ (n − 1)/(N − 1). Indeed Serfling ends his paper (on page 47) with the remark: “(it is) also of interest to obtain (2) with the usual sampling fraction instead of fn*”. Note that when n = N, Ȳn = μN, and hence the probability in (2) is 0 for all λ > 0, and the conjectured improvement of Serfling’s bound agrees with this while Serfling’s bound itself is positive when n = N.
Despite related results due to Kemperman (1973a,b,c), it seems that a definitive answer to this question is not yet known.
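The variance identity (1), Var(Ȳn) = (σN²/n)(1 − (n − 1)/(N − 1)) with σN² computed using divisor N, can be checked by brute force on a small urn. The following is a minimal sketch under stated assumptions: the population values and all function names are illustrative and do not come from the paper. It enumerates every sample drawn without replacement, compares the exact variance with the formula, and prints the two correction factors 1 − (n − 1)/N and 1 − (n − 1)/(N − 1), so that the n = N observation above (the first factor stays positive while the second vanishes) is visible.

```python
from fractions import Fraction
from itertools import combinations


def sample_mean_moments(population, n):
    """Exact mean and variance of the mean of n draws without replacement."""
    # Each subset of n balls is equally likely; balls with equal labels stay distinct.
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    mean = sum(means) / len(means)
    var = sum((m - mean) ** 2 for m in means) / len(means)
    return mean, var


def formula_moments(population, n):
    """mu_N and the right side of (1), with sigma_N^2 using divisor N."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((c - mu) ** 2 for c in population) / N
    return mu, sigma2 / n * (1 - Fraction(n - 1, N - 1))


def correction_factors(n, N):
    """Return (1 - (n-1)/N, 1 - (n-1)/(N-1)): Serfling's factor and the usual one."""
    return 1 - Fraction(n - 1, N), 1 - Fraction(n - 1, N - 1)


population = [0, 1, 1, 3, 7, 10]  # hypothetical urn with integer labels
for n in range(1, len(population) + 1):
    print(n, sample_mean_moments(population, n)[1], formula_moments(population, n)[1],
          correction_factors(n, len(population)))
```

Exact rational arithmetic is used so that the comparison is an identity rather than a floating-point approximation; the enumeration is only feasible for small N.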
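A special case of considerable importance is the case when the numbers on the balls in the urn are all 1’s and 0’s: suppose that c1 = ··· = cD = 1, while cD+1 = ··· = cN = 0. Then Y1 + ··· + Yn is well-known to have a Hypergeometric(n, D, N) distribution given by
$$P\left(\sum_{i=1}^{n} Y_i = k\right) = \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}, \qquad \max(0,\, n - (N - D)) \le k \le \min(n, D).$$
In this special case μN = D/N, σN² = μN(1 − μN), while bN = 1 and aN = 0. Thus Serfling’s inequality (2) becomes
$$P\left(\sqrt{n}\,(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - f_n^*}\right), \qquad f_n^* = \frac{n-1}{N},$$
and the conjectured improvement is
$$P\left(\sqrt{n}\,(\bar{Y}_n - \mu_N) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - f_n}\right), \qquad f_n = \frac{n-1}{N-1}.$$
Despite related results due to Chvátal (1979) and Hush and Scovel (2005) it seems that a bound of the form in the last display remains unknown.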
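To make the hypergeometric special case concrete, the sketch below computes the exact upper tail of the number of 1’s in the sample and sets it beside Serfling’s bound exp(−2λ²/(1 − (n − 1)/N)) and the conjectured bound exp(−2λ²/(1 − (n − 1)/(N − 1))). It is an illustration only: the function names and the parameter values are assumptions, and the conjectured bound is computed for comparison, not asserted to hold.

```python
from math import comb, exp, sqrt


def hypergeom_upper_tail(k, n, D, N):
    """P(Y_1 + ... + Y_n >= k) for n draws without replacement, D ones, N - D zeros."""
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one.
    return sum(comb(D, j) * comb(N - D, n - j)
               for j in range(max(k, 0), min(n, D) + 1)) / total


def serfling_bound(lam, n, N):
    """Serfling's bound with b_N - a_N = 1 and factor 1 - (n - 1)/N."""
    return exp(-2.0 * lam ** 2 / (1.0 - (n - 1) / N))


def conjectured_bound(lam, n, N):
    """Conjectured (unproved) bound with factor (N - n)/(N - 1); requires n < N."""
    return exp(-2.0 * lam ** 2 / ((N - n) / (N - 1)))


def compare(n, D, N):
    """Rows (k, exact tail, Serfling bound, conjectured bound) for k above the mean."""
    mean = n * D / N
    rows = []
    for k in range(n + 1):
        if mean < k <= min(n, D):
            lam = (k - mean) / sqrt(n)  # event {sqrt(n) (Ybar_n - mu_N) >= lam}
            rows.append((k, hypergeom_upper_tail(k, n, D, N),
                         serfling_bound(lam, n, N), conjectured_bound(lam, n, N)))
    return rows


for row in compare(n=10, D=15, N=40):  # hypothetical parameters
    print(row)
```

Since (N − n)/(N − 1) ≤ (N − n + 1)/N, the conjectured bound is never larger than Serfling’s bound; whether it always dominates the exact tail is precisely the open question.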
We should note that an exponential bound of the Bennett type for the tails of the hypergeometric distribution does follow from results of Vatutin and Mikhaĭlov (1982) and Ehm (1991); see also Pitman (1997).
Theorem 1
(Ehm, 1991) If 1 ≤ n ≤ D ∧ (N − D), then $\sum_{i=1}^{n} Y_i \overset{d}{=} \sum_{i=1}^{n} X_i$, where Xi ~ Bernoulli(πi), with πi ∈ (0, 1), are independent.
It follows from Theorem 1 that
Furthermore, by applying Theorem 1 together with Bennett’s inequality (Bennett (1962); see also Shorack and Wellner (1986), page 851), we obtain the following exponential bound for the tail of the hypergeometric distribution:
Corollary 1
If 1 ≤ n ≤ D ∧ (N − D), then for all λ > 0
where μN ≡ D/N, σN² ≡ μN(1 − μN), 1 − fn ≡ 1 − (n − 1)/(N − 1) is the finite-sampling correction factor, and ψ(y) ≡ 2y⁻²h(1 + y) where h(y) ≡ y(log y − 1) + 1.
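The functions h and ψ entering the Bennett-type bound are easy to evaluate numerically. The sketch below (function names are illustrative, not from the paper) implements ψ(y) = 2h(1 + y)/y² with h(y) = y(log y − 1) + 1, and compares ψ with 1/(1 + y/3). That ψ(y) ≥ 1/(1 + y/3) for y > 0 is a standard fact about Bennett’s function, stated here as background rather than taken from the text; it is the usual route from a Bennett-type bound to a Bernstein-type bound.

```python
from math import log


def h(y):
    """h(y) = y (log y - 1) + 1 for y > 0."""
    return y * (log(y) - 1.0) + 1.0


def psi(y):
    """Bennett's psi(y) = 2 h(1 + y) / y^2 for y > 0."""
    return 2.0 * h(1.0 + y) / y ** 2


def bernstein_lower(y):
    """The lower bound 1 / (1 + y/3) for psi(y)."""
    return 1.0 / (1.0 + y / 3.0)


for y in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(y, psi(y), bernstein_lower(y))
```

For y very close to 0 the direct formula loses accuracy through cancellation in h(1 + y), so a series expansion would be preferable there; the values above stay away from that regime.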
Since ψ(y) ≥ 1/(1 + y/3) for y > 0, the inequality of the corollary yields a further bound which is quite close to the conjectured Hoeffding type improvement of Serfling’s bound, and which now has the desired finite-sampling correction factor 1 − fn:
Corollary 2
By considerations related to the work of Talagrand (1994) and León and Perron (2003), the authors of the present paper have succeeded in proving the following exponential bound.
Theorem 2
(Greene and Wellner (2015); Greene (2016)) Suppose that . Define μN = D/N and suppose N > 4 and 2 ≤ n < D ≤ N/2. Then for all we have
The proof of this bound, along with a complete analogue for the hypergeometric distribution of a bound of Talagrand (1994) for the binomial distribution, appears in Greene and Wellner (2015) and in the forthcoming Ph.D. thesis of the first author, Greene (2016).
The bound given in Theorem 2 involves a still better finite-sampling correction factor, namely 1 − f̄n = 1 − n/N, which has also appeared in Lo (1986) in the context of a Bayesian analysis of finite sampling. Note that as N → ∞, the above bound yields
a bound which improves slightly on the bound given by León and Perron (2003) in the case of sums of i.i.d. Bernoulli random variables.
Before leaving this section we begin to make a connection to finite-sampling empirical distributions: Now let $F_N(t) \equiv N^{-1}\sum_{i=1}^{N} 1\{c_i \le t\}$ and $\mathbb{F}_n(t) \equiv n^{-1}\sum_{i=1}^{n} 1\{Y_i \le t\}$. Then it is easily seen that Serfling’s bound yields
$$P\left(\sqrt{n}\,(\mathbb{F}_n(t) - F_N(t)) \ge \lambda\right) \le \exp\left(-\frac{2\lambda^2}{1 - (n-1)/N}\right)$$
for each fixed λ > 0 and t ∈ ℝ. Note that since 𝔽n(t) is equal in distribution to the sample mean of n draws without replacement from an urn containing N FN(t) 1’s and N (1 − FN(t)) 0’s, the bound in the last display only involves the hypergeometric special case of Serfling’s inequality. This leads to the following conjecture concerning bounds for the finite sampling empirical process {√n(𝔽n(t) − FN(t)) : t ∈ ℝ}:
Conjecture
There exist constants C, D > 0 (possibly C = 1 and D = 2?) such that
| (3) |
| (4) |
for all λ > 0. The possibility that D = 2 is suggested by the corresponding inequality established by Massart (1990) in the case of sampling with replacement.
With these strong indications of the plausibility of an improvement of Serfling’s bound and corresponding improvements in exponential bounds for the uniform-norm deviations of the finite-sampling empirical process, we can now turn to an application of the basic idea in the context of two-sample Kolmogorov-Smirnov statistics.
2. Two-sample tests and finite-sampling connections
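To connect this with the two-sample Kolmogorov-Smirnov statistics, suppose that X1, …, Xm are i.i.d. F and Y1, …, Yn are i.i.d. G, with empirical distribution functions 𝔽m and 𝔾n respectively. Let N = m + n. Then for testing Hc : F = G with F continuous versus K+ : F ≥ G (F ≺s G), K− : G ≥ F (G ≺s F), or K : F ≠ G, the classical K-S test statistics are
$$D^{+}_{m,n} = \sup_{x}\,(\mathbb{F}_m(x) - \mathbb{G}_n(x)), \qquad D^{-}_{m,n} = \sup_{x}\,(\mathbb{G}_n(x) - \mathbb{F}_m(x)), \qquad D_{m,n} = \sup_{x}\,|\mathbb{F}_m(x) - \mathbb{G}_n(x)|,$$
respectively. It is well-known that under Hc we have
$$\sqrt{\frac{mn}{N}}\,D^{\pm}_{m,n} \to_d \sup_{0 \le t \le 1} \mathbb{U}(t), \qquad \sqrt{\frac{mn}{N}}\,D_{m,n} \to_d \sup_{0 \le t \le 1} |\mathbb{U}(t)|$$
if m ∧ n → ∞ where 𝕌 is a standard Brownian bridge process on [0, 1]; see e.g. Hájek and Šidák (1967), pages 189–190, Hodges (1958), and van der Vaart and Wellner (1996), pages 360–366.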
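As a concrete illustration of the three statistics, the sketch below computes the one-sided and two-sided two-sample Kolmogorov-Smirnov statistics from data. It assumes the usual definitions (the supremum of 𝔽m − 𝔾n, of 𝔾n − 𝔽m, and of |𝔽m − 𝔾n|, with 𝔽m and 𝔾n the empirical distribution functions of the X’s and Y’s) and the usual scaling √(mn/N); the function names and the sample values are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def ks_two_sample(x, y):
    """Return (D_plus, D_minus, D) for samples x (from F) and y (from G)."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    d_plus = d_minus = 0.0  # both differences vanish as z -> -infinity
    for z in xs + ys:  # the suprema are attained at pooled sample points
        diff = bisect_right(xs, z) / m - bisect_right(ys, z) / n
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus, max(d_plus, d_minus)


def normalized(stat, m, n):
    """Scale a statistic by sqrt(mn/N) with N = m + n."""
    return sqrt(m * n / (m + n)) * stat


x = [0.2, 1.1, 1.9, 2.5]       # hypothetical X sample
y = [0.9, 2.2, 3.1, 3.3, 4.0]  # hypothetical Y sample
d_plus, d_minus, d = ks_two_sample(x, y)
print(d_plus, d_minus, d, normalized(d, len(x), len(y)))
```

Because both empirical distribution functions are right-continuous step functions that jump only at sample points, evaluating the difference at the pooled sample suffices.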
Note that with λN ≡ m/N and
where Z(1) ≤ ··· ≤ Z(N) are the order statistics of the pooled sample, we have
and hence, with λ̄N = 1 − λN,
Thus, using the independence of the ranks R and the order statistics Z (both based on the pooled sample),
and it would follow from (3) that
| (5) |
for all t > 0. Similarly it would also follow from (3) that
for all t > 0. Combining the two one-sided inequalities yields a (conjectured) two-sided inequality:
In the next section we will prove that bounds of this type with C = 1 and D = 2 hold in the special case m = n. For some results for the two-sided two-sample Kolmogorov-Smirnov statistic in the case m = n and computational results for m ≠ n, see Wei and Dudley (2012). These authors were aiming for a bound of the form C exp(−2t²) both for m = n and m ≠ n. The above heuristics seem to suggest that a bound of the form C exp(−2((N − 1)/N)t²) might be a natural goal.
3. An exponential bound for the one-sided statistic when m = n
Throughout this section we suppose that the null hypothesis Hc holds: G = F is a continuous distribution function.
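From Hodges (1958), (2.3) on page 473 (together with d = a/n from page 473, line 4), when m = n (so N = 2n), the one-sided statistic $D^{+}_{n,n} = \sup_x (\mathbb{F}_n(x) - \mathbb{G}_n(x))$ satisfies
$$P\left(D^{+}_{n,n} \ge \frac{a}{n}\right) = \frac{\binom{2n}{n+a}}{\binom{2n}{n}} = \frac{(n!)^2}{(n-a)!\,(n+a)!}, \qquad a \in \{0, 1, \ldots, n\}.$$
We first compare the exact probability from the last display with the possible upper bounds
$$\mathrm{PB2} \equiv \exp\left(-2\,\frac{N-1}{N}\,t^2\right) = \exp\left(-\frac{(2n-1)a^2}{2n^2}\right), \qquad \mathrm{PB3} \equiv \exp(-2t^2) = \exp\left(-\frac{a^2}{n}\right),$$
where t = √(mn/N) · (a/n) = a/√(2n).
For n = 3 we find that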
| a | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| E(xact) | 1 | 0.75 | 0.3 | 0.05 |
| PB2 | 1 | 0.7574 | 0.3291 | 0.0821 |
| PB2 − E | 0 | 0.0074 | 0.0291 | 0.0321 |
| PB3 | 1 | 0.7165 | 0.2636 | 0.0498 |
| PB3 − E | 0 | −0.0335 | −0.0364 | −0.0002 |
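The entries of this table can be recomputed in a few lines. The sketch below assumes that the exact probability for m = n is the ratio of binomial coefficients C(2n, n + a)/C(2n, n), that PB2 is exp(−(2n − 1)a²/(2n²)), and that PB3 is exp(−a²/n); these readings reproduce the tabulated values for n = 3 to the printed precision. The function names are illustrative.

```python
from math import comb, exp


def exact_one_sided(a, n):
    """Exact one-sided tail probability at a/n when m = n: C(2n, n+a) / C(2n, n)."""
    return comb(2 * n, n + a) / comb(2 * n, n)


def pb2(a, n):
    """exp(-2 ((N-1)/N) t^2) with N = 2n and t = a / sqrt(2n)."""
    return exp(-(2 * n - 1) * a ** 2 / (2 * n ** 2))


def pb3(a, n):
    """exp(-2 t^2) with t = a / sqrt(2n), that is exp(-a^2 / n)."""
    return exp(-a ** 2 / n)


n = 3
for a in range(n + 1):
    exact = exact_one_sided(a, n)
    print(a, round(exact, 4),
          round(pb2(a, n), 4), round(pb2(a, n) - exact, 4),
          round(pb3(a, n), 4), round(pb3(a, n) - exact, 4))
```

A nonnegative difference means the candidate bound dominates the exact probability at that value of a; a negative difference means it fails there.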
Further comparisons for m = n = 10, 12, 13, 14, 15, 25 support the validity of the bound involving the finite sampling fraction fn. These comparisons agree with the following theorem:
Theorem 3
A. When m = n (so that N = 2n) the second bound in (5) holds for all n ≥ 1 with C = 1:
| (6) |
| (7) |
Equivalently, when m = n,
| (8) |
for all t > 0.
B. On the other hand, when m = n (so that N = 2n), for all n ≥ 1 we have
Proof
A. Since the inequality holds trivially for a = 0, and can be shown easily by numerical computation for a ∈ {1, 2, 3} (see the Table above), it suffices to show that
for a ∈ {1,…, n} and n ≥ 4. Furthermore, we will show that it holds for a = n in a separate argument, and thus it suffices to show that it holds for a ∈ {1,…, n − 1} and n ≥ 4. By rewriting the numerator and denominator on the left side of the last display, the desired inequality can be rewritten as
By taking logarithms we can rewrite this as
| (9) |
Now by Stirling’s formula with bounds (see e.g. Nanjundiah (1959)) we have
| (10) |
Using these bounds in (9) we find that the left side is bounded above by
Note that I1 and I2 are as defined in Wei and Dudley (2012) page 640, while I3 and I4 differ. From Wei and Dudley (2012) page 640,
| (11) |
(which is proved by Taylor expansion of (1 + x) log(1 + x) + (1 − x) log(1 − x) about x = 0), and
| (12) |
Note that the lead term in the bound (11) for I1 and lead term of I4 cancel each other, while the first term of the bound (12) for I2 cancels the second term of I4. Adding the bounds yields
Now R12 ≤ 0 for n ≥ 4 and I3 ≤ 0 for all n ≥ 2 and a ∈ {1, …, n − 1} by the following argument:
for n ≥ 4. This is a decreasing function of a for fixed n, and hence to show that it is < 0 it suffices to check it for a = 1. But when a = 1 the right side above equals
so we conclude that I3 < 0 for a ∈ {1, …, n − 1} and n ≥ 4. It remains only to show that the desired bound holds for a = n; that is we have
But this can easily be shown via the Stirling formula bounds (10).
Thus
and the claimed inequality holds for all n ≥ 4. Since the bounds hold for n = 1, 2, 3 by direct numerical computation, the claim follows.
B. We first define
Since we can take , it suffices to show that rn(a) > 0 for . We will first show this for n ≥ 31. Then the proof will be completed by checking the inequality numerically for and n ∈ {1, …, 30}.
By using the Stirling formula bounds of (10) as in the proof of A, but now with upper bounds replaced by lower bounds, we find that
As in (11) and (12) and the displays following them, we find that
Putting these pieces together and rearranging we find that
| (13) |
| (14) |
will prove the claim. Note in (13) that the a²/n term cancelled by virtue of the lower bound estimate based on the Taylor expansion of (1 + x) log(1 + x) + (1 − x) log(1 − x). First note that
The denominator of the right-hand side is clearly positive for . By inspection, we can see the term a²n³ + 16a² − 16n² in the numerator is increasing in a. Picking a = 1, we then see n³ + 16 − 16n² > 0 for n ≥ 31, and thus a²n³ + 16a² − 16n² > 0 for all admissible a. Next, the polynomial 28n³ − 45a²n is decreasing in the admissible a. For any fixed n, the minimum value it can attain is then larger than 28n³ − 90n². For n ≥ 31, this quantity is positive. Therefore, 28n³ − 45a²n > 0 for all admissible a when n ≥ 31. Finally, note that 16n³ − 480n² = 16n²(n − 30) > 0 for n ≥ 31. Hence we have shown K2 > 0.
We next have
| (15) |
Again since , it is clear that α, β, and γ in (15) are positive for all admissible choices of a. Hence, the sign of each bracketed term will be dictated by the remaining polynomial in a. It is also clear from their form that each polynomial is decreasing in a; hence we need only evaluate at the endpoints to determine positivity. But , and with the final inequality following as n ≥ 31. Hence all terms in (15) are positive and so K1 > 0. Together with K2 > 0 as proved above, the claim is proved for n ≥ 31.
Since the bound holds for and n ∈ {1, …, 30} by direct numerical computation, the claim follows.
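The direct numerical computations invoked in the proof are straightforward to reproduce. The sketch below is not the authors’ code: it assumes the exact one-sided probability C(2n, n + a)/C(2n, n) for m = n, checks the modified bound exp(−(2n − 1)a²/(2n²)) of part A for every a ∈ {0, …, n}, and for part B looks only at a = 1, where the exact value n/(n + 1) exceeds exp(−1/n) because e^{1/n} > 1 + 1/n. The precise range of a covered by part B is not reproduced here.

```python
from math import comb, exp


def exact_tail(a, n):
    """Exact one-sided tail probability at a/n when m = n."""
    return comb(2 * n, n + a) / comb(2 * n, n)


def modified_bound_holds(n):
    """Part A shape: exact <= exp(-(2n-1) a^2 / (2 n^2)) for every a in 0..n."""
    return all(exact_tail(a, n) <= exp(-(2 * n - 1) * a * a / (2 * n * n))
               for a in range(n + 1))


def unmodified_bound_fails_at_one(n):
    """Part B at a = 1: the exact value n/(n+1) exceeds exp(-2 t^2) = exp(-1/n)."""
    return exact_tail(1, n) > exp(-1.0 / n)


for n in (1, 2, 3, 10, 30, 100):
    print(n, modified_bound_holds(n), unmodified_bound_fails_at_one(n))
```

Python integers are exact, so the binomial ratio is computed without overflow; only the final comparison is done in floating point, where the margins for these n are far above rounding error.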
4. Some comparisons and connections
4.1. Comparisons: two-sided tail bounds
Here we compare and contrast our results with those of Wei and Dudley (2012). As in Wei and Dudley (2012) (see also Wei and Dudley (2011)), we say that the DKW inequality holds for given m, n and C if
and we say that the DKWM inequality holds for given m, n if the inequality in the last display holds with C = 2. Wei and Dudley (2012) prove the following theorem:
Theorem 4
(Wei and Dudley, 2012) For m = n in the two sample case:
(a) The DKW inequality always holds with C = e ≐ 2.71828.
(b) For m = n ≥ 4, the smallest n such that Hc can be rejected at level 0.05, the DKW inequality holds with C = 2.16863.
(c) The DKWM inequality holds for all m = n ≥ 458.
(d) For each m = n < 458, the DKWM inequality fails for some t of the form .
(e) For each m = n < 458, the DKW inequality holds for C = 2(1 + δn) for some δn > 0 where, for 12 ≤ n ≤ 457,
For comparison, the following theorem follows from Theorem 3. We say that the modified DKWM inequality holds for given m, n if
Theorem 5
For m = n in the two sample case:
(a) For all n ≥ 1 the modified DKWM inequality holds.
(b) Alternatively, for the modified Kolmogorov statistic given by
the DKWM inequality holds for all n ≥ 1.
We are not claiming that our “modified” version of the DKWM inequality improves on the results of Wei and Dudley (2012): it is clearly worse for m = n ≥ 458, where the unmodified DKWM inequality already holds. On the other hand, it may provide a useful clue to the formulation of DKWM type exponential bounds for two-sample Kolmogorov statistics when m ≠ n. In this direction we have the following conjecture:
Conjecture
For any m ≠ n,
| (16) |
| (17) |
That is, we conjecture that the modified DKWM inequality holds for all m, n ≥ 1. This is supported by all the numerical experiments we have conducted so far.
4.2. Comparisons: one-sided tail bounds
Wei and Dudley (2012) do not treat bounds for the one-sided statistics. Here we summarize our results with a theorem which parallels their Theorem 4 above. In analogy with their terminology, we say that the one-sided DKW inequality holds for given m, n and C if
and we say that the one-sided DKWM inequality holds for given m, n if the inequality in the last display holds with C = 1. Moreover, we say that the modified one-sided DKWM inequality holds for given m, n if
Theorem 6
For m = n in the two sample case:
(a) The one-sided DKW inequality holds for all n ≥ 1 with C = e/2 ≐ 2.71828/2 = 1.35914. For this range of n, C = e/2 is sharp since equality occurs for n = 1 and t = 1/√2 (or a = 1).
(b) For m = n ≥ 5, the one-sided DKW inequality holds with C = 2.16863/2 = 1.084315.
(c) The one-sided DKWM inequality fails for all m = n ≥ 1.
(d) The modified one-sided DKWM inequality holds for all m = n ≥ 1.
Proof
(c) follows from Theorem 3-B. (d) follows from Theorem 3-A. It remains only to prove (a) and (b).
To prove (a), we first note that Wei and Dudley (2012) showed that for n ≥ 108 we have
Thus to prove that the claimed inequality holds for n ≥ 108, it suffices to show that it holds for where is the smallest value of t for which the bound is less than or equal to 1.
Proceeding as in the proof of Theorem 3-A, we find that we want to show that
By the same arguments used in the proof of Theorem 3-A, we find that the left side in the last display is bounded above by
Now K1 ≤ 0 for n ≥ 4 and a ∈ {1, …, n − 1} by the previous proof, and
if
This completes the proof for n ≥ 108. Numerical computation easily shows that the claim holds for all n ∈ {1, …, 107}.
The proof of (b) is similar upon replacing e/2 by 1.084315, and again computing numerically for n ∈ {1, …, 107}.
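The numerical part of this argument amounts to finding, for each n, the smallest constant C for which the exact one-sided probability is at most C exp(−2t²) at every attainable t. A hedged sketch follows, again assuming the exact probability C(2n, n + a)/C(2n, n) for m = n and t = a/√(2n), so that exp(−2t²) = exp(−a²/n); the function name is hypothetical.

```python
from math import comb, e, exp


def worst_ratio(n):
    """Max over a in 1..n of the exact one-sided tail at a/n divided by exp(-a^2/n)."""
    denom = comb(2 * n, n)
    return max(comb(2 * n, n + a) / denom / exp(-a * a / n)
               for a in range(1, n + 1))


for n in (1, 2, 3, 4, 5, 10, 50, 107):
    print(n, worst_ratio(n))
print("e/2 =", e / 2)
```

Under these assumptions the ratio at n = 1 is (1/2)/e⁻¹ = e/2, matching the sharpness claim in (a); the ratio at n = 4 exceeds 1.084315, consistent with (b) being stated only for n ≥ 5; and every ratio exceeds 1, consistent with (c).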
Corollary 3
For n ≥ 5 and C = 1.084315,
Figures 1 and 2 illustrate Theorem 6.
Figure 1.
Difference between approximations and exact one-sided probabilities for n = 128 and a ∈ {1, 2, …, 128}. Negative values indicate the exact probability exceeds the approximation. Serfling DKWM is the bound obtained via the heuristic of section 2, using the sampling fraction 1 − fn* = (N − n + 1)/N. Modified DKWM uses the sampling fraction 1 − fn = (N − n)/(N − 1). DKWM uses the fraction from Wei and Dudley.
Figure 2.
Difference between approximations and exact one-sided probabilities for n = 23 and a ∈ {1, 2, …, 23}. Negative values indicate the exact probability exceeds the approximation. DKWM6a corresponds to the DKWM bound with the constant e/2, discussed in Theorem 6(a). DKWM6b corresponds to the DKWM bound with the constant 2.16863/2, discussed in Theorem 6(b).
Acknowledgments
The second author owes thanks to Werner Ehm for several helpful conversations and to Martin Wells for pointing out the Pitman reference. We also owe thanks to the referee for a number of helpful comments and suggestions.
Contributor Information
Evan Greene, Department of Statistics, University of Washington, Seattle, WA 98195-4322.
Jon A. Wellner, Department of Statistics, University of Washington, Seattle, WA 98195-4322
References
- Bennett G. Probability inequalities for the sum of independent random variables. J Amer Statist Assoc. 1962;57:33–45.
- Chvátal V. The tail of the hypergeometric distribution. Discrete Math. 1979;25(3):285–287.
- Dvoretzky A, Kiefer J, Wolfowitz J. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann Math Statist. 1956;27:642–669.
- Ehm W. Binomial approximation to the Poisson binomial distribution. Statist Probab Lett. 1991;11(1):7–16.
- Greene E. Finite sampling exponential bounds with applications to empirical processes. PhD thesis. University of Washington; 2016.
- Greene E, Wellner JA. Exponential bounds for the hypergeometric distribution. Tech Rep. 2015. doi: 10.3150/15-BEJ800. arXiv:1507.08298.
- Hájek J, Šidák Z. Theory of rank tests. Academic Press; New York: 1967.
- Hodges JL Jr. The significance probability of the Smirnov two-sample test. Ark Mat. 1958;3:469–486.
- Hoeffding W. Probability inequalities for sums of bounded random variables. J Amer Statist Assoc. 1963;58:13–30.
- Hush D, Scovel C. Concentration of the hypergeometric distribution. Statist Probab Lett. 2005;75(2):127–132.
- Kemperman JHB. Moment problems for sampling without replacement. I. Nederl Akad Wetensch Proc Ser A 76 = Indag Math. 1973a;35:149–164.
- Kemperman JHB. Moment problems for sampling without replacement. II. Nederl Akad Wetensch Proc Ser A 76 = Indag Math. 1973b;35:165–180.
- Kemperman JHB. Moment problems for sampling without replacement. III. Nederl Akad Wetensch Proc Ser A 76 = Indag Math. 1973c;35:181–188.
- León CA, Perron F. Extremal properties of sums of Bernoulli random variables. Statist Probab Lett. 2003;62(4):345–354.
- Lo AY. Bayesian statistical inference for sampling a finite population. Ann Statist. 1986;14(3):1226–1233.
- Massart P. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann Probab. 1990;18(3):1269–1283.
- Nanjundiah TS. Note on Stirling’s formula. Amer Math Monthly. 1959;66:701–703.
- Pitman J. Probabilistic bounds on the coefficients of polynomials with only real zeros. J Combin Theory Ser A. 1997;77(2):279–303.
- Rice JA. Mathematical Statistics and Data Analysis. 3rd ed. Duxbury Press; Belmont, CA: 2007.
- Serfling RJ. Probability inequalities for the sum in sampling without replacement. Ann Statist. 1974;2:39–48.
- Shorack GR, Wellner JA. Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc; New York: 1986.
- Talagrand M. Sharper bounds for Gaussian and empirical processes. Ann Probab. 1994;22(1):28–76.
- van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag; New York: 1996.
- Vatutin VA, Mikhaĭlov VG. Limit theorems for the number of empty cells in an equiprobable scheme for the distribution of particles by groups. Teor Veroyatnost i Primenen. 1982;27(4):684–692.
- Wei F, Dudley RM. Dvoretzky-Kiefer-Wolfowitz inequalities for the two-sample case. Tech rep. MIT, Department of Mathematics; 2011.
- Wei F, Dudley RM. Two-sample Dvoretzky-Kiefer-Wolfowitz inequalities. Statist Probab Lett. 2012;82(3):636–644.


