Abstract
Consider the distributional limit of the Pearson chi-square statistic when the number of classes $m_n$ increases with the sample size $n$ and $n/\sqrt{m_n}\to\lambda$. Under mild moment conditions, the limit is Gaussian for λ = ∞, Poisson for finite λ > 0, and degenerate for λ = 0.
Keywords: Pearson chi-square statistic, central limit theorem, Poisson limit theorem, weak convergence
1. Preliminaries
The Pearson chi-square statistic is probably one of the best-known and most important objects of statistical science and has played a major role in statistical applications ever since its first appearance in Karl Pearson’s work on “randomness testing” (Pearson, 1900). The standard goodness-of-fit test based on the Pearson chi-square statistic tacitly assumes that the support of the discrete distribution of interest is fixed (whether finite or not) and unaffected by the sampling process. However, this assumption may be unrealistic for modern “big-data” problems, which involve complex, adaptive data acquisition processes (see, e.g., Grotzinger et al. 2014 for an example in astrobiology). In many such cases the associated statistical testing problems may be more accurately described in terms of triangular arrays of discrete distributions whose finite supports depend on the collected samples and increase with the sample size (Pietrzak et al., 2016). Motivated by “big-data” applications, in this note we establish some asymptotic results for the Pearson chi-square statistic for triangular arrays of discrete random variables whose number of classes $m_n$ grows with the sample size $n$. Specifically, let $X_{n,k}$, $k=1,\dots,n$, be iid random variables having the same distribution as $X_n$, where
$$\mathbb{P}(X_n=i)=p_n(i)>0,\quad i=1,\dots,m_n,\qquad \sum_{i=1}^{m_n}p_n(i)=1.$$
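Recall that the standard Pearson chi-square statistic is defined as
$$\chi^2_n=\sum_{i=1}^{m_n}\frac{\big(n\hat p_n(i)-np_n(i)\big)^2}{np_n(i)},\tag{1}$$
where the empirical frequencies $\hat p_n(i)$ are
$$\hat p_n(i)=\frac1n\sum_{k=1}^nI(X_{n,k}=i),\qquad i=1,\dots,m_n.$$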
As stated above, in what follows we will be interested in the double asymptotic analysis of the weak limit of $\chi^2_n$, that is, the case when $m_n\to\infty$ as $n\to\infty$.
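Observe that $\chi^2_n$ given in (1) can be decomposed into a sum of two uncorrelated components as follows:
$$\chi^2_n-(m_n-1)=S_n+U_n,\tag{2}$$
where
$$S_n=\frac1n\sum_{k=1}^n\Big(\frac{1}{p_n(X_{n,k})}-m_n\Big)\tag{3}$$
and
$$U_n=\frac2n\sum_{1\le j<k\le n}\Big(\frac{I(X_{n,j}=X_{n,k})}{p_n(X_{n,j})}-1\Big)=\frac2n\sum_{j<k}\Big(\frac{I(X_{n,j}=X_{n,k})}{p_n(X_{n,j})}-1\Big).\tag{4}$$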
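The second equality above introduces the notational convention we use throughout. Note that for fixed $n$ the statistic $S_n$ is simply a sum of iid random variables and $U_n$ is an unnormalized U-statistic (see, e.g., Korolyuk and Borovskich, 2013). It is routine to check that
$$\mathbb{E}S_n=\mathbb{E}U_n=0,\qquad \mathbb{V}\mathrm{ar}\,S_n=\frac1n\Big(\sum_{i=1}^{m_n}\frac{1}{p_n(i)}-m_n^2\Big),\qquad \mathbb{V}\mathrm{ar}\,U_n=2\Big(1-\frac1n\Big)(m_n-1),$$
and consequently $\mathbb{E}\chi^2_n=m_n-1$.
Moreover, since we also have $\mathbb{C}\mathrm{ov}(U_n,S_n)=0$, it follows that
$$\mathbb{V}\mathrm{ar}\,\chi^2_n=\frac1n\Big(\sum_{i=1}^{m_n}\frac{1}{p_n(i)}-m_n^2\Big)+2\Big(1-\frac1n\Big)(m_n-1).$$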
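As a sanity check of (1) to (4) and of the moment formulas, the following Python sketch (standard library only; the helper names `chi_square`, `s_stat`, `u_stat`, `exact_moments` and `predicted_variance` are ours, not the paper’s) computes the three statistics in exact rational arithmetic and, for a tiny hypothetical example, enumerates all $m_n^n$ samples to obtain $\mathbb{E}\chi^2_n$ and $\mathbb{V}\mathrm{ar}\,\chi^2_n$ exactly.

```python
from fractions import Fraction
from itertools import product


def chi_square(sample, p):
    """Pearson statistic (1); classes are labelled 1..m and p[i-1] = p_n(i)."""
    n, m = len(sample), len(p)
    counts = [0] * m
    for x in sample:
        counts[x - 1] += 1
    return sum((counts[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(m))


def s_stat(sample, p):
    """S_n from (3): (1/n) * sum_k (1/p_n(X_k) - m_n)."""
    n, m = len(sample), len(p)
    return sum((1 / p[x - 1] - m for x in sample), Fraction(0)) / n


def u_stat(sample, p):
    """U_n from (4): (2/n) * sum_{j<k} (I(X_j = X_k)/p_n(X_j) - 1)."""
    n = len(sample)
    total = sum((Fraction(int(sample[j] == sample[k])) / p[sample[j] - 1] - 1
                 for k in range(n) for j in range(k)), Fraction(0))
    return 2 * total / n


def exact_moments(n, p):
    """Exact mean and variance of chi^2_n by enumerating all m^n samples."""
    m = len(p)
    mean = second = Fraction(0)
    for sample in product(range(1, m + 1), repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p[x - 1]
        c = chi_square(sample, p)
        mean += prob * c
        second += prob * c * c
    return mean, second - mean ** 2


def predicted_variance(n, p):
    """Var S_n + Var U_n, using Cov(S_n, U_n) = 0."""
    m = len(p)
    var_s = (sum(1 / q for q in p) - m ** 2) / n
    var_u = 2 * (1 - Fraction(1, n)) * (m - 1)
    return var_s + var_u


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sample = (1, 3, 1, 2, 1)
print(chi_square(sample, p), s_stat(sample, p), u_stat(sample, p))  # 2/5 0 -8/5
mean, var = exact_moments(3, p)
print(mean == len(p) - 1, var == predicted_variance(3, p))  # True True
```

Exact fractions are used so that the identity $\chi^2_n-(m_n-1)=S_n+U_n$ can be checked with `==` rather than a floating-point tolerance.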
When $m_n=m$ is constant, the classical result (see, e.g., Shao, 2003, chapter 6) implies that $\chi^2_n$ asymptotically follows the $\chi^2$-distribution with $m-1$ degrees of freedom. Consequently, when $m$ is large, the standardized statistic $(\chi^2_n-(m-1))/\sqrt{2(m-1)}$ may be approximated by the standard normal distribution. However, when $m_n\to\infty$ as $n\to\infty$, matters are more subtle and this normal approximation may or may not be valid, depending on the asymptotic relation between $m_n$ and $n$, as described below. Since $S_n$ is a sum of iid random variables, the case when $S_n$ contributes to the limit of the normalized $\chi^2_n$ can largely be handled with the standard theory for arrays of iid variables. Consequently, we focus here on the seemingly more interesting case when the asymptotic influence of $U_n$ dominates that of $S_n$. Specifically, throughout the paper we assume that, as $n,m_n\to\infty$,
$$\frac{1}{n\,m_n}\Big(\sum_{i=1}^{m_n}\frac{1}{p_n(i)}-m_n^2\Big)\to0.\tag{C}$$
Note that the left-hand side of (C) equals $\mathbb{V}\mathrm{ar}\,S_n/m_n$; hence (C) implies $S_n/\sqrt{m_n}\xrightarrow{P}0$ and, in particular, it is trivially satisfied when $X_n$ is a uniform random variable on the integer lattice $1,\dots,m_n$, that is, when $p_n(i)=1/m_n$ for $i=1,\dots,m_n$. Under condition (C) we get a rather complete picture of the limiting behavior of $\chi^2_n$. Our main results are presented in Section 2, where we discuss the Poissonian and Gaussian asymptotics. Some examples, relations to asymptotics known in the literature and further discussion are provided in Section 3. The basic tools used in our derivations are listed in the appendix. In what follows, limits are taken as $n\to\infty$ with $m_n\to\infty$, and $\xrightarrow{d}$ stands for convergence in distribution.
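The quantity in (C) is $\mathbb{V}\mathrm{ar}\,S_n/m_n$ and is easy to evaluate for a given probability vector. The sketch below (the function names and the two-level distribution are hypothetical illustrations, not taken from the paper) returns it exactly. For a distribution that puts mass $1/(2m_n)$ on half of the classes and $3/(2m_n)$ on the other half, it equals $m_n/(3n)$, so for that family (C) holds if and only if $m_n/n\to0$.

```python
from fractions import Fraction


def condition_c_ratio(p, n):
    """(1/(n m)) * (sum_i 1/p_n(i) - m^2) = Var(S_n)/m_n; (C) asks that this tends to 0."""
    m = len(p)
    return (sum(1 / Fraction(q) for q in p) - m * m) / (n * m)


def uniform(m):
    return [Fraction(1, m)] * m


def two_level(m):
    """Hypothetical example: mass 1/(2m) on half of the classes, 3/(2m) on the other half."""
    if m % 2:
        raise ValueError("m must be even")
    return [Fraction(1, 2 * m)] * (m // 2) + [Fraction(3, 2 * m)] * (m // 2)


print(condition_c_ratio(uniform(50), 100))       # 0
print(condition_c_ratio(two_level(100), 1000))   # 1/30
print(condition_c_ratio(two_level(100), 10))     # 10/3
```

By the Cauchy–Schwarz inequality $\sum_i 1/p_n(i)\ge m_n^2$, so the ratio is never negative, and it vanishes only for the uniform distribution.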
2. Poissonian and Gaussian asymptotics
We start with the case when a naive normal approximation for the standardized statistic fails. As it turns out, when $m_n$ is asymptotically of order $n^2$, we have the following Poisson limit theorem for $\chi^2_n$.
Theorem 2.1
Assume that the condition (C) holds, as well as
$$\frac{n}{\sqrt{m_n}}\to\lambda\in(0,\infty).\tag{5}$$
Then
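$$\frac{\chi^2_n-m_n}{\sqrt{2m_n}}\xrightarrow{d}\frac{\sqrt2}{\lambda}\Big(Z-\frac{\lambda^2}{2}\Big),\qquad\text{where }Z\sim\mathrm{Pois}(\lambda^2/2).\tag{6}$$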
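In the uniform case $p_n(i)=1/m_n$ one has $S_n\equiv0$, and $\sum_kA_{n,k}$ in (8) is simply the number $C_n$ of pairs $j<k$ with $X_{n,j}=X_{n,k}$, so that $\chi^2_n=m_n+2m_nC_n/n-n$. The seeded Monte Carlo sketch below (hypothetical helper names; the sizes $m_n=10^4$, $\lambda=2$ are arbitrary choices) uses this identity to compare the empirical law of $C_n$ with $\mathrm{Pois}(\lambda^2/2)$. When $n=\lambda\sqrt{m_n}$ holds exactly, as here, the standardized statistic coincides with $(\sqrt2/\lambda)(C_n-\lambda^2/2)$.

```python
import math
import random
from collections import Counter


def coincident_pairs(sample):
    """C_n = #{j < k : X_j = X_k}; equals sum_k A_{n,k} from (8) when p_n is uniform."""
    return sum(c * (c - 1) // 2 for c in Counter(sample).values())


def simulate_uniform(m, lam, reps, seed=0):
    """Draw reps samples of size n = round(lam*sqrt(m)) from the uniform law on 1..m."""
    rng = random.Random(seed)
    n = round(lam * math.sqrt(m))
    pairs, stats = [], []
    for _ in range(reps):
        c = coincident_pairs([rng.randint(1, m) for _ in range(n)])
        chi2 = m + 2 * m * c / n - n          # Pearson statistic when p_n is uniform
        pairs.append(c)
        stats.append((chi2 - m) / math.sqrt(2 * m))
    return n, pairs, stats


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)


m, lam, reps = 10_000, 2.0, 2000
n, pairs, stats = simulate_uniform(m, lam, reps)
mu = lam ** 2 / 2
for k in range(5):
    # empirical frequency of C_n = k next to the Pois(lambda^2/2) probability
    print(k, pairs.count(k) / reps, round(poisson_pmf(k, mu), 4))
```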
Proof
Due to (C) it suffices to consider the asymptotics of $U_n$ alone. We write
$$\frac{U_n}{\sqrt{2m_n}}=\frac{\sqrt{2m_n}}{n}\sum_{k=1}^nA_{n,k}-\frac{n-1}{\sqrt{2m_n}},\tag{7}$$
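where $A_{n,1}=0$ and for $k=2,\dots,n$
$$A_{n,k}=\frac{1}{m_n}\sum_{j=1}^{k-1}\frac{I(X_{n,j}=X_{n,k})}{p_n(X_{n,j})}=\frac{1}{m_np_n(X_{n,k})}\sum_{j=1}^{k-1}I(X_{n,j}=X_{n,k}).\tag{8}$$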
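Since $\sqrt{2m_n}/n\to\sqrt2/\lambda$ and $(n-1)/\sqrt{2m_n}\to\lambda/\sqrt2$ by (5), the above representation implies that to prove (6) we need only to show that $\sum_{k=1}^nA_{n,k}\xrightarrow{d}Z\sim\mathrm{Pois}(\lambda^2/2)$. To this end we will verify the conditions of Theorem A.1 in the appendix, due to Beśka, Kłopotowski and Słomiński (Beśka et al., 1982). Denote $\mathcal F_{n,0}=\{\emptyset,\Omega\}$ and $\mathcal F_{n,k}=\sigma(X_{n,1},\dots,X_{n,k})$, $k=1,\dots,n$. Then using the first form of $A_{n,k}$ from (8) we see that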
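$$\max_{1\le k\le n}\mathbb{E}(A_{n,k}\mid\mathcal F_{n,k-1})=\max_{1\le k\le n}\frac{k-1}{m_n}=\frac{n-1}{m_n}\to0$$
due to (5), and thus (A.1) holds. Similarly,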
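$$\sum_{k=1}^n\mathbb{E}(A_{n,k}\mid\mathcal F_{n,k-1})=\frac{1}{m_n}\sum_{k=1}^n(k-1)=\frac{n(n-1)}{2m_n}\to\frac{\lambda^2}{2},\tag{9}$$
and thus (A.2) also follows with $\eta=\lambda^2/2$. Since $A_{n,k}\ge0$, the required convergence in (A.3) (for any $\varepsilon>0$) will follow from convergence of the unconditional moments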
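$$\sum_{k=1}^n\mathbb{E}\big(A_{n,k}I(|A_{n,k}-1|>\varepsilon)\big)\le\frac{1}{\varepsilon^2}\sum_{k=1}^n\mathbb{E}\,A_{n,k}(A_{n,k}-1)^2=\frac{1}{\varepsilon^2}\sum_{k=1}^n\big(\mathbb{E}A_{n,k}^3-2\,\mathbb{E}A_{n,k}^2+\mathbb{E}A_{n,k}\big).\tag{10}$$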
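Using the second form of $A_{n,k}$ from (8) we see that the conditional distribution of $m_np_n(X_{n,k})A_{n,k}$ given $X_{n,k}$ is the binomial distribution $\mathrm{Binom}(k-1,p_n(X_{n,k}))$. Since for $M\sim\mathrm{Binom}(r,p)$ we have $\mathbb{E}M=rp$, $\mathbb{E}M^2=rp+r(r-1)p^2$ and $\mathbb{E}M^3=rp+3r(r-1)p^2+r(r-1)(r-2)p^3$, and since $\mathbb{E}\,p_n^{-1}(X_{n,k})=m_n$ and $\mathbb{E}\,p_n^{-2}(X_{n,k})=\sum_{i=1}^{m_n}1/p_n(i)$, we thus obtain
$$\sum_{k=1}^n\mathbb{E}A_{n,k}^3=\frac{n(n-1)}{2m_n^3}\sum_{i=1}^{m_n}\frac{1}{p_n(i)}+\frac{n(n-1)(n-2)}{m_n^2}+\frac{n(n-1)(n-2)(n-3)}{4m_n^3}.$$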
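Similarly,
$$\sum_{k=1}^n\mathbb{E}A_{n,k}^2=\frac{n(n-1)}{2m_n}+\frac{n(n-1)(n-2)}{3m_n^2}\to\frac{\lambda^2}{2}\qquad\text{and}\qquad\sum_{k=1}^n\mathbb{E}A_{n,k}=\frac{n(n-1)}{2m_n}\to\frac{\lambda^2}{2}.$$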
Note that (C) and (5) imply $m_n^{-2}\sum_{i=1}^{m_n}1/p_n(i)\to1$ and therefore
$$\sum_{k=1}^n\mathbb{E}A_{n,k}^3\to\frac{\lambda^2}{2}.$$
Combining the limits of the last three expressions we conclude that the right-hand side of (10) tends to zero and hence (A.3) of Theorem A.1 is also satisfied. The result follows.
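The closed forms for $\sum_k\mathbb{E}A_{n,k}^r$, $r=1,2,3$, rest only on the binomial representation of $m_np_n(X_{n,k})A_{n,k}$ given $X_{n,k}$. The small exact check below is our own sketch (hypothetical helper names and a hypothetical non-uniform $p_n$ on three classes): it enumerates all samples for a few small $n$ and compares the resulting sums with those formulas in rational arithmetic, so equality is exact.

```python
from fractions import Fraction
from itertools import product


def a_terms(sample, p):
    """A_{n,k} from (8), second form: #{j < k : X_j = X_k} / (m * p_n(X_k)); A_{n,1} = 0."""
    m = len(p)
    return [Fraction(sum(1 for y in sample[:k] if y == x)) / (m * p[x - 1])
            for k, x in enumerate(sample)]


def exact_moment_sums(n, p):
    """[sum_k E A_{n,k}, sum_k E A_{n,k}^2, sum_k E A_{n,k}^3] by full enumeration."""
    m = len(p)
    sums = [Fraction(0)] * 3
    for sample in product(range(1, m + 1), repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p[x - 1]
        for a in a_terms(sample, p):
            for r in range(3):
                sums[r] += prob * a ** (r + 1)
    return sums


def closed_form_sums(n, p):
    """Closed forms derived from the binomial moments in the proof of Theorem 2.1."""
    m = len(p)
    inv = sum(1 / q for q in p)
    s1 = Fraction(n * (n - 1), 2 * m)
    s2 = s1 + Fraction(n * (n - 1) * (n - 2), 3 * m ** 2)
    s3 = (Fraction(n * (n - 1), 2 * m ** 3) * inv
          + Fraction(n * (n - 1) * (n - 2), m ** 2)
          + Fraction(n * (n - 1) * (n - 2) * (n - 3), 4 * m ** 3))
    return [s1, s2, s3]


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for n in (2, 3, 4, 5):
    print(n, exact_moment_sums(n, p) == closed_form_sums(n, p))
```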
Let us now consider the case $n/\sqrt{m_n}\to\infty$. As it turns out, under this condition the statistic $\chi^2_n$ is asymptotically Gaussian.
Theorem 2.2
Assume that condition (C) is satisfied and that there exists δ > 0 such that
(11) |
as well as
(12) |
Then
$$\frac{\chi^2_n-m_n}{\sqrt{2m_n}}\xrightarrow{d}N\sim\mathrm{Norm}(0,1).\tag{13}$$
Remark 2.3
Note that under (C) the conditions (11) (with δ = 1) and (12) are implied by the condition n/mn → λ ∈ (0, ∞).
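For a feel of Theorem 2.2 in the uniform case (where $S_n\equiv0$ and (C) holds trivially), the seeded simulation below is a sketch with arbitrarily chosen sizes $m_n=100$, $n=500$, so that $n/\sqrt{m_n}=50$. It records $(\chi^2_n-m_n)/\sqrt{2m_n}$ and reports its sample mean and variance; by the moment formulas of Section 1 the exact finite-sample values are $-1/\sqrt{2m_n}\approx-0.07$ and $(1-1/n)(m_n-1)/m_n\approx0.988$, close to the limiting 0 and 1.

```python
import math
import random
import statistics


def standardized_chi2_uniform(m, n, rng):
    """(chi^2_n - m_n)/sqrt(2 m_n) for one sample of size n from the uniform law on m classes."""
    counts = [0] * m
    for _ in range(n):
        counts[rng.randrange(m)] += 1
    expected = n / m
    chi2 = sum((c - expected) ** 2 for c in counts) / expected
    return (chi2 - m) / math.sqrt(2 * m)


rng = random.Random(1)
m, n, reps = 100, 500, 1000
values = [standardized_chi2_uniform(m, n, rng) for _ in range(reps)]
print(round(statistics.mean(values), 3), round(statistics.pvariance(values), 3))
```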
Proof
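As in Theorem 2.1, under our assumption (C) it suffices to show convergence in distribution to $N\sim\mathrm{Norm}(0,1)$ of the normalized variable
$$\frac{U_n}{\sqrt{2m_n}}=\sum_{k=1}^nY_{n,k},$$
where $Y_{n,1}=0$ and, for $k=2,\dots,n$,
$$Y_{n,k}=\frac{\sqrt2}{n\sqrt{m_n}}\sum_{j=1}^{k-1}\frac{I(X_{n,k}=X_{n,j})-p_n(X_{n,j})}{p_n(X_{n,j})}=\frac{\sqrt2}{n\sqrt{m_n}}\,B_{n,k},\tag{14}$$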
and the last equality defines $B_{n,k}$. Since $\mathbb{E}(I(X_{n,k}=X_{n,j})\mid\mathcal F_{n,k-1})=p_n(X_{n,j})$ for any $j=1,\dots,k-1$, it follows that $\mathbb{E}(Y_{n,k}\mid\mathcal F_{n,k-1})=0$. Consequently, $(Y_{n,k},\mathcal F_{n,k})_{k=1,\dots,n}$ are martingale differences. Therefore, to prove (13) we may use the Lyapunov version of the CLT for martingale differences (see Theorem A.2 in the appendix).
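The martingale-difference property rests only on the fact that, given $\mathcal F_{n,k-1}$, each indicator $I(X_{n,k}=X_{n,j})$ has mean $p_n(X_{n,j})$. The exact-arithmetic sketch below (hypothetical helper names and sample; $B_{n,k}=\sum_{j<k}(I(X_{n,k}=X_{n,j})-p_n(X_{n,j}))/p_n(X_{n,j})$) averages $B_{n,k}$ over the law of $X_{n,k}$ for each fixed history, and also checks that $\sum_kB_{n,k}=nU_n/2$, the identity behind $U_n/\sqrt{2m_n}=\sum_kY_{n,k}$.

```python
from fractions import Fraction


def b_term(history, x, p):
    """B_{n,k} with X_{n,k} = x and (X_{n,1}, ..., X_{n,k-1}) = history."""
    return sum(((Fraction(int(x == y)) - p[y - 1]) / p[y - 1] for y in history),
               Fraction(0))


def conditional_mean_b(history, p):
    """E(B_{n,k} | F_{n,k-1}): average b_term over X_{n,k} ~ p."""
    return sum(p[x - 1] * b_term(history, x, p) for x in range(1, len(p) + 1))


def u_stat(sample, p):
    """U_n from (4)."""
    n = len(sample)
    total = sum((Fraction(int(sample[j] == sample[k])) / p[sample[j] - 1] - 1
                 for k in range(n) for j in range(k)), Fraction(0))
    return 2 * total / n


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sample = (2, 1, 2, 3, 2, 1)
n = len(sample)
# zero conditional mean for every prefix: martingale differences
print(all(conditional_mean_b(sample[:k], p) == 0 for k in range(n)))
# sum_k B_{n,k} = n U_n / 2
print(sum(b_term(sample[:k], sample[k], p) for k in range(n)) == Fraction(n, 2) * u_stat(sample, p))
```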
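Due to (14) we have
$$\mathbb{E}(Y_{n,k}^2\mid\mathcal F_{n,k-1})=\frac{2}{n^2m_n}\,\mathbb{E}(B_{n,k}^2\mid\mathcal F_{n,k-1}).$$
Since $\mathbb{V}\mathrm{ar}(I(X_{n,k}=X_{n,j})\mid\mathcal F_{n,k-1})=p_n(X_{n,j})(1-p_n(X_{n,j}))$ and, for $j\ne l$,
$$\mathbb{C}\mathrm{ov}\big(I(X_{n,k}=X_{n,j}),I(X_{n,k}=X_{n,l})\mid\mathcal F_{n,k-1}\big)=p_n(X_{n,j})I(X_{n,j}=X_{n,l})-p_n(X_{n,j})p_n(X_{n,l}),$$
we obtain
$$\sum_{k=1}^n\mathbb{E}(Y_{n,k}^2\mid\mathcal F_{n,k-1})=\frac{2}{n^2m_n}\Big[\sum_{j=1}^{n-1}(n-j)\Big(\frac{1}{p_n(X_{n,j})}-1\Big)+2\sum_{1\le j<l\le n}(n-l)\Big(\frac{I(X_{n,j}=X_{n,l})}{p_n(X_{n,j})}-1\Big)\Big].$$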
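Consequently, since $(n-1)(m_n-1)/(nm_n)\to1$, (A.4) is equivalent to
$$\frac{2}{n^2m_n}\sum_{j=1}^{n-1}(n-j)\Big(\frac{1}{p_n(X_{n,j})}-m_n\Big)+\frac{4\sum_{1\le j<l\le n}(n-l)\Big(\frac{I(X_{n,j}=X_{n,l})}{p_n(X_{n,j})}-1\Big)}{n^2m_n}\xrightarrow{P}0.\tag{15}$$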
To show the above, we separately consider moments of the summands on the left-hand side of (15). For the first one, note that
where the last equality denotes the distributional equality of random variables. Therefore, using inequality (B.2) given in the appendix, we get (possibly with different universal constants C from line to line)
In view of this and the elementary inequality $|a+b|^p\le C(|a|^p+|b|^p)$, valid for any $p>0$ and any real $a,b$, we have for some constants $C_1,C_2$
For the numerator of the second part on the left hand side of (15) we may write
Moreover,
since the expectations of the other terms resulting from squaring the large-bracketed first expression above are equal to zero. Consequently
and thus for the squared expectation of the second term in (15) we get
Note that here we used the fact that mn → ∞. To finish the proof we only need to show (A.5). Again we will rely on the representation of Yn,k given in (14). Note that
Since I(Xn, j = Xn,k) − pn(Xn,k), j = 1, . . . , k − 1, are conditionally iid given Xn,k and
then by conditioning with respect to Xn,k and applying Rosenthal’s inequality (see (B.1) in the appendix) to the conditional moment of the sum we obtain
(16) |
By virtue of the Schwarz inequality we obtain that
in view of (11). Therefore, it suffices to show that the first term of the last expression in (16) converges to zero. This follows from (11) and (12), since
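The conditional second moment used for (A.4) can also be checked by brute force. Combining $\mathbb{V}\mathrm{ar}(I(X_{n,k}=X_{n,j})\mid\mathcal F_{n,k-1})=p_n(X_{n,j})(1-p_n(X_{n,j}))$ with the conditional covariances of two such indicators gives
$\mathbb{E}(B_{n,k}^2\mid\mathcal F_{n,k-1})=\sum_{j<k}\big(1/p_n(X_{n,j})-1\big)+\sum_{j,l<k,\,j\ne l}\big(I(X_{n,j}=X_{n,l})/p_n(X_{n,j})-1\big)$, with $B_{n,k}$ as in (14). The sketch below (hypothetical helper names and probability vector, exact arithmetic) compares this with a direct average over the law of $X_{n,k}$ for every history of length at most four.

```python
from fractions import Fraction
from itertools import product


def b_term(history, x, p):
    """B_{n,k} with X_{n,k} = x and (X_{n,1}, ..., X_{n,k-1}) = history."""
    return sum(((Fraction(int(x == y)) - p[y - 1]) / p[y - 1] for y in history),
               Fraction(0))


def direct_second_moment(history, p):
    """E(B_{n,k}^2 | F_{n,k-1}) by averaging over X_{n,k} ~ p."""
    return sum(p[x - 1] * b_term(history, x, p) ** 2 for x in range(1, len(p) + 1))


def formula_second_moment(history, p):
    """Diagonal (variance) terms plus off-diagonal (covariance) terms."""
    diag = sum((1 / p[y - 1] - 1 for y in history), Fraction(0))
    off = sum((Fraction(int(history[j] == history[l])) / p[history[j] - 1] - 1
               for j in range(len(history)) for l in range(len(history)) if j != l),
              Fraction(0))
    return diag + off


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
ok = all(direct_second_moment(h, p) == formula_second_moment(h, p)
         for length in range(5) for h in product(range(1, 4), repeat=length))
print(ok)  # True
```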
3. Discussion
We will now illustrate the results of the previous section with some examples as well as put them in a broader context of earlier work by others. For the sake of completeness, we first note
Remark 3.1. The case λ = 0
Consider $n/\sqrt{m_n}\to0$. Then the last term on the right-hand side of (7) converges to zero and we are left with a sum of non-negative random variables which satisfies
$$\frac{\sqrt{2m_n}}{n}\sum_{k=1}^nA_{n,k}\xrightarrow{P}0.$$
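To see the above, it suffices to consider the convergence of the first moments. To this end note that, taking expectations in (9),
$$\mathbb{E}\,\frac{\sqrt{2m_n}}{n}\sum_{k=1}^nA_{n,k}=\frac{\sqrt{2m_n}}{n}\cdot\frac{n(n-1)}{2m_n}=\frac{n-1}{\sqrt{2m_n}}\to0.$$
Consequently, under (C), $(\chi^2_n-m_n)/\sqrt{2m_n}\xrightarrow{P}0$, that is, the limit is degenerate.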
A simple illustration of Theorem 2.2 is as follows.
Example 3.1
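Let α ∈ [0, 1) and set $p_n(i)=(C_\alpha i^\alpha)^{-1}$ for $i=1,\dots,m_n$. Here $C_\alpha=\sum_{i=1}^{m_n}i^{-\alpha}\sim m_n^{1-\alpha}/(1-\alpha)$ in view of the general formula
$$\sum_{i=1}^{m}i^{\beta}\sim\frac{m^{\beta+1}}{\beta+1}\quad\text{as }m\to\infty,\qquad\beta>-1.\tag{17}$$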
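Since by (17) $\sum_{i=1}^{m_n}1/p_n(i)=C_\alpha\sum_{i=1}^{m_n}i^\alpha\sim m_n^2/(1-\alpha^2)$, the left-hand side of (C) behaves like $\frac{\alpha^2}{1-\alpha^2}\cdot\frac{m_n}{n}$. Hence for 0 < α < 1 the condition (C) is equivalent to
$$\frac{m_n}{n}\to0\tag{18}$$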
and implies (12). Applying (17) again we see that for any δ > 0
and therefore (11) is also satisfied. Hence, the conclusion of Theorem 2.2 holds true under (18) for 0 < α < 1.
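A quick numerical look at Example 3.1 follows; it is a sketch in which the sizes $m_n=10^5$ and $n=10m_n$ are arbitrary choices and the function names are hypothetical. It evaluates the left-hand side of (C) for $p_n(i)=(C_\alpha i^\alpha)^{-1}$ and prints it next to the first-order prediction $\frac{\alpha^2}{1-\alpha^2}\cdot\frac{m_n}{n}$ obtained from (17). The agreement is close for moderate α and slower for α near 1, where lower-order terms in (17) matter more.

```python
def power_law_inverse_probs(m, alpha):
    """1/p_n(i) = C_alpha * i**alpha for p_n(i) = (C_alpha * i**alpha)**-1, i = 1..m."""
    c_alpha = sum(i ** -alpha for i in range(1, m + 1))
    return [c_alpha * i ** alpha for i in range(1, m + 1)]


def condition_c_ratio(inv_probs, n):
    """(1/(n m)) * (sum_i 1/p_n(i) - m^2), the quantity that (C) requires to vanish."""
    m = len(inv_probs)
    return (sum(inv_probs) - m * m) / (n * m)


m = 100_000
n = 10 * m
for alpha in (0.0, 0.25, 0.5, 0.75):
    observed = condition_c_ratio(power_law_inverse_probs(m, alpha), n)
    predicted = alpha ** 2 / (1 - alpha ** 2) * m / n
    print(alpha, round(observed, 5), round(predicted, 5))
```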
Note that in the above example the assumption (5) of Theorem 2.1 cannot be satisfied for 0 < α < 1 (see (18)) but can hold for α = 0, that is, when the distribution is uniform. We remark that in our present setting such a distribution is of interest, for instance, when testing for a signal-noise threshold in data with a large number of support points (Pietrzak et al., 2016). Combining the results of Theorems 2.1 and 2.2 and Remark 3.1 one obtains the following.
Corollary 3.2 (Asymptotics of $\chi^2_n$ for the uniform distribution)
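Assume that $p_n(i)=1/m_n$ for $i=1,2,\dots,m_n$ and $n=1,2,\dots$, as well as $n/\sqrt{m_n}\to\lambda\in[0,\infty]$.
Then
$$\frac{\chi^2_n-m_n}{\sqrt{2m_n}}\xrightarrow{d}\begin{cases}0, & \lambda=0,\\[2pt] \dfrac{\sqrt2}{\lambda}\Big(Z-\dfrac{\lambda^2}{2}\Big),\ \ Z\sim\mathrm{Pois}(\lambda^2/2), & 0<\lambda<\infty,\\[2pt] N\sim\mathrm{Norm}(0,1), & \lambda=\infty.\end{cases}$$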
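All three limits in Corollary 3.2 are standardized. For $0<\lambda<\infty$ the variable $W=(\sqrt2/\lambda)(Z-\lambda^2/2)$ with $Z\sim\mathrm{Pois}(\lambda^2/2)$ has mean 0, variance 1 and third central moment $\sqrt2/\lambda$; the latter vanishes as $\lambda\to\infty$, consistent with the Gaussian limit. The sketch below (hypothetical function name; the Poisson series is truncated at `kmax` terms, which is an assumption adequate for the λ values shown) confirms these moments numerically.

```python
import math


def poisson_limit_moments(lam, kmax=400):
    """Mean, variance and third central moment of W = (sqrt(2)/lam)(Z - lam^2/2), Z ~ Pois(lam^2/2)."""
    mu = lam ** 2 / 2
    scale = math.sqrt(2) / lam
    m1 = m2 = m3 = 0.0
    pmf = math.exp(-mu)                 # P(Z = 0)
    for k in range(kmax + 1):
        w = scale * (k - mu)
        m1 += pmf * w
        m2 += pmf * w ** 2
        m3 += pmf * w ** 3
        pmf *= mu / (k + 1)             # P(Z = k + 1) from P(Z = k)
    return m1, m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3


for lam in (0.5, 1.0, 2.0, 5.0, 10.0):
    mean, var, third = poisson_limit_moments(lam)
    print(lam, round(mean, 12), round(var, 12), round(third, 6), round(math.sqrt(2) / lam, 6))
```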
We note that the asymptotic distribution of $\chi^2_n$ when both $n$ and $m_n$ tend to infinity has been considered by several authors, typically in the context of asymptotics of families of goodness-of-fit statistics related to different divergence distances. Some of these results also address the asymptotic behavior of such statistics not only under the null hypothesis (as we do here) but also under simple alternatives and hence are, in that sense, more general. However, when applied to the chi-square statistic under the null hypothesis, they appear to be special cases of our theorems in Section 2. We briefly review below some of the most relevant results.
Tumanyan (1954, 1956) proved asymptotic normality of $\chi^2_n$ under the assumption $\min_{1\le i\le m_n}np_n(i)\to\infty$, which in the case of the uniform distribution is equivalent to $n/m_n\to\infty$, a condition obviously stronger than the one we use (see Corollary 3.2).
Steck (1957) generalized these results on normal asymptotics assuming, among other conditions, that $\inf_n n/m_n>0$, which again is stronger than $n/\sqrt{m_n}\to\infty$. He also obtained the Poissonian and degenerate limits in the case of the uniform distribution, in agreement with the first two cases in our Corollary 3.2. The main result of Holst (1972) for the chi-square statistic gives normal asymptotics under the regime $n/m_n\to\lambda\in(0,\infty)$ and $\max_{1\le j\le n}p_n(j)<\beta/n$, which is also stronger than our assumptions. In the uniform case under this regime the result was proved earlier by Harris and Park (1971). The main result of Morris (1975) for the chi-square statistic gives asymptotic normality under $n\min_{1\le j\le n}p_n(j)>\varepsilon>0$ for all $n\ge1$, $\max_{1\le j\le n}p_n(j)\to0$ and the “uniform asymptotically negligible” condition of the form , where , i = 1, . . . , mn, and . In the case of the uniform distribution it gives asymptotic normality of $\chi^2_n$ under the condition $n/m_n>\varepsilon>0$, a result apparently weaker than the third part of Corollary 3.2.
Following the paper of Cressie and Read (1984) introducing the family of power divergence statistics (of which the chi-square statistic is a member), much effort was directed at proving asymptotic normality for wider families of divergence distances as well as for more than one multinomial independent sample, see e.g. Menéndez et al. (1998); Pérez and Pardo (2002) (in both papers the authors considered the regime n/mn → λ ∈ (0, ∞)) and Inglot et al. (1991), Morales et al. (2003) (in both papers the authors considered the regime and for some β ≥ 1) or Pietrzak et al. (2016) (with the regime n/mn → ∞). Note that for the asymptotic normality results all these regimes are again more stringent than what we consider here.
Finally, for completeness, we briefly address one of the scenarios when condition (C) does not hold.
Remark 3.3
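Note that if $\frac{1}{nm_n}\Big(\sum_{i=1}^{m_n}\frac{1}{p_n(i)}-m_n^2\Big)\to\infty$, then the asymptotic behavior of the standardized $\chi^2_n$ is the same as that of $\sum_{k=1}^nY_{n,k}$, where
$$Y_{n,k}=\frac{1/p_n(X_{n,k})-m_n}{\sqrt{n\Big(\sum_{i=1}^{m_n}1/p_n(i)-m_n^2\Big)}},\qquad k=1,\dots,n.$$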
Since for any fixed $n\ge1$ the random variables $Y_{n,k}$, $k=1,\dots,n$, are iid with zero mean and $\mathbb{V}\mathrm{ar}\,Y_{n,k}=n^{-1}$, it follows that $\{Y_{n,k},k=1,\dots,n\}_{n\ge1}$ is an infinitesimal array. Therefore the classical CLT for row-wise iid triangular arrays (cf., e.g., Shao, 2003, chapter 1) applies. Note also that the remaining case, when $\frac{1}{nm_n}\big(\sum_{i=1}^{m_n}1/p_n(i)-m_n^2\big)$ has a finite positive limit, appears more complicated and requires a different approach.
Acknowledgments
The research was conducted when the second author was visiting The Mathematical Biosciences Institute at OSU. Both authors thank the Institute for its logistical support and funding through US NSF grant DMS-1440386. The research was also partially funded by US NIH grant R01CA-152158 and US NSF grant DMS-1318886. The authors wish to gratefully acknowledge helpful comments made by the referee and the associate editor on the early version of the manuscript.
Appendix A. Limit Theorems
Below, for convenience of the readers, we recall some results which are used in the proofs. The first one is found in Beśka et al. (1982) and the second one is a version of the martingale CLT (see, e.g., Hall and Heyde, 1980).
Theorem A.1 (Poissonian conditional limit theorem)
Let {Zn,k, k = 1, . . . , n; n ≥ 1} be a double sequence of non-negative random variables adapted to a row-wise increasing double sequence of σ-fields {𝒢n,k−1, k = 1, . . . , n; n ≥ 1}. If for n → ∞
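$$\max_{1\le k\le n}\mathbb{E}(Z_{n,k}\mid\mathcal G_{n,k-1})\xrightarrow{P}0,\tag{A.1}$$
$$\sum_{k=1}^n\mathbb{E}(Z_{n,k}\mid\mathcal G_{n,k-1})\xrightarrow{P}\eta,\tag{A.2}$$
and for any ε > 0
$$\sum_{k=1}^n\mathbb{E}\big(Z_{n,k}I(|Z_{n,k}-1|>\varepsilon)\mid\mathcal G_{n,k-1}\big)\xrightarrow{P}0,\tag{A.3}$$
then $\sum_{k=1}^nZ_{n,k}\xrightarrow{d}Z$, where Z ~ Pois(η) is a Poisson random variable.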
Theorem A.2 (Lyapunov-type martingale CLT)
Let {(Zn,k, ℱn,k) k = 1, . . . , n; n ≥ 1} be a double sequence of martingale differences. If
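$$\sum_{k=1}^n\mathbb{E}(Z_{n,k}^2\mid\mathcal F_{n,k-1})\xrightarrow{P}1\tag{A.4}$$
and, for some δ > 0,
$$\sum_{k=1}^n\mathbb{E}|Z_{n,k}|^{2+\delta}\to0,\tag{A.5}$$
then $\sum_{k=1}^nZ_{n,k}\xrightarrow{d}N$, where N ~ Norm(0, 1) is a standard normal random variable.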
Appendix B. Moment Inequalities
The following moment inequalities are used in Section 2.
Rosenthal inequality
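Rosenthal (1970). If $X_1,\dots,X_n$ are independent and centered random variables such that $\mathbb{E}|X_i|^r<\infty$, $i=1,\dots,n$, and $r>2$, then
$$\mathbb{E}\Big|\sum_{i=1}^nX_i\Big|^r\le C_r\Big[\sum_{i=1}^n\mathbb{E}|X_i|^r+\Big(\sum_{i=1}^n\mathbb{E}X_i^2\Big)^{r/2}\Big],\tag{B.1}$$
where the constant $C_r$ depends only on $r$.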
MZ-BE inequality
Marcinkiewicz and Zygmund (1937) for r ≥ 2, von Bahr and Esseen (1965) for 1 ≤ r ≤ 2. If $X_1,\dots,X_n$ are independent and centered random variables such that $\mathbb{E}|X_i|^r<\infty$, $i=1,\dots,n$, then for $r>1$
(B.2) |
where .
References
- Beśka M, Kłopotowski A, Słomiński L. Limit theorems for random sums of dependent d-dimensional random vectors. Probability Theory and Related Fields. 1982;61(1):43–57.
- Cressie N, Read TR. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society. Series B (Methodological). 1984:440–464.
- Grotzinger JP, Sumner D, Kah L, Stack K, Gupta S, Edgar L, Rubin D, Lewis K, Schieber J, Mangold N, et al. A habitable fluvio-lacustrine environment at Yellowknife Bay, Gale Crater, Mars. Science. 2014;343(6169):1242777. doi: 10.1126/science.1242777.
- Hall P, Heyde CC. Martingale limit theory and its application. New York: Academic Press; 1980.
- Harris B, Park C. The distribution of linear combinations of the sample occupancy numbers. Indagationes Mathematicae (Proceedings). Vol. 74. Elsevier; 1971. pp. 121–134.
- Holst L. Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika. 1972;59(1):137–145.
- Inglot T, Jurlewicz T, Ledwina T. Asymptotics for multinomial goodness of fit tests for a simple hypothesis. Theory of Probability & Its Applications. 1991;35(4):771–777.
- Korolyuk VS, Borovskich YV. Theory of U-statistics. Vol. 273. Springer Science & Business Media; 2013.
- Marcinkiewicz J, Zygmund A. Quelques théorèmes sur les fonctions indépendantes. Fund Math. 1937;29:60–90.
- Menéndez M, Morales D, Pardo L, Vajda I. Asymptotic distributions of φ-divergences of hypothetical and observed frequencies on refined partitions. Statistica Neerlandica. 1998;52(1):71–89.
- Morales D, Pardo L, Vajda I. Asymptotic laws for disparity statistics in product multinomial models. Journal of Multivariate Analysis. 2003;85(2):335–360.
- Morris C. Central limit theorems for multinomial sums. The Annals of Statistics. 1975:165–188.
- Pearson K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1900;50(302):157–175.
- Pérez T, Pardo J. Asymptotic normality for the Kϕ-divergence goodness-of-fit tests. J Comput Appl Math. 2002;145:301–317.
- Pietrzak M, Rempała GA, Seweryn M, Wesołowski J. Limit theorems for empirical Rényi entropy and divergence with applications to molecular diversity analysis. TEST. 2016:1–20.
- Rosenthal HP. On the subspaces of L^p (p > 2) spanned by sequences of independent random variables. Israel Journal of Mathematics. 1970;8(3):273–303.
- Shao J. Mathematical Statistics. Springer Texts in Statistics. Springer; 2003.
- Steck GP. Limit theorems for conditional distributions. Univ California Publ Statist. 1957;2(12):237–284.
- Tumanyan SK. On the asymptotic distribution of the chi-square criterion. Dokl Akad Nauk SSSR. 1954;94:1011–1012.
- Tumanyan SK. Asymptotic distribution of the chi-square criterion when the number of observations and number of groups increase simultaneously. Teor Veroyat Yeyo Primen. 1956;1(1):131–145.
- von Bahr B, Esseen CG. Inequalities for the r-th absolute moment of a sum of random variables, 1 ≦ r ≦ 2. The Annals of Mathematical Statistics. 1965;36(1):299–303.