Abstract
In this paper we study U-statistics with side information incorporated using the method of empirical likelihood. Some basic properties of the proposed statistics are investigated. We find that by implementing the side information properly, the proposed U-statistics can have smaller asymptotic variance than the existing U-statistics in the literature. The proposed U-statistics can achieve asymptotic efficiency in a formal sense, and their weak limits admit a convolution result. We also find that the corresponding U-likelihood ratio procedure, as well as the U-empirical likelihood based confidence interval construction, does not benefit from incorporating side information, a result consistent with that for the standard empirical likelihood ratio procedure. The impact of incorporating incorrect side information into the proposed U-statistics is also explored. Simulation studies are conducted to assess the finite sample performance of the proposed method. The numerical results show that with side information implemented, the reduction in asymptotic variance can be substantial in some cases, and that the coverage probability of the confidence interval based on the U-empirical likelihood ratio outperforms that of the normal approximation based method, in particular when the underlying distribution is skewed.
Keywords: Efficiency, Information bound, Side information, U-statistic
1. Introduction
Since the pioneering work of Hoeffding [15], U-statistics have been an active research area in statistics due to their wide range of applications. Hoeffding [16] established some fundamental properties of U-statistics, which have a close relationship with the V-statistics proposed by von Mises [37]. Berk [5] discovered the reverse martingale structure of U-statistics. Sen (e.g. [33]) made a number of contributions to this topic. Parallel to the results for V-statistics, Gregory [13] obtained the asymptotic distribution of degenerate U-statistics of rank two. The asymptotic distribution of U-statistics with arbitrary rank was developed by Janson [17] and Rubin and Vitale [32], among others. Borovskich [7] extended the results to Hilbert spaces. A detailed review and the major historical developments in this field can be found in the book by Koroljuk and Borovskich [21], hereafter denoted as KB.
The empirical likelihood (EL) is one of the recent major developments in statistics. The original idea can be traced back to Thomas and Grunkemeier [35]. The work of Owen [25–27] formally established the advantages and application scope of this method, and paved the way for the increasing popularity of EL, owing to its wide range of applications, its theoretical advantages, its simplicity of use and its flexibility in incorporating auxiliary (or side) information in various forms. EL has been applied to a variety of problems, for example, nonparametric confidence regions [9], generalized linear models [20], survival analysis [1], density and quantile estimation [8,39], goodness-of-fit measures [3], nonparametric regression [10,29], marginal and conditional likelihood [30], ROC curves [31], econometrics [19], etc. It is well known that incorporating side information via empirical likelihood can reduce the asymptotic variance of estimators [28]. Motivated by this fact, we explore incorporating side information into the U-statistic using the EL method, and expect that the new procedure can improve the performance of the U-statistic under appropriate conditions.
It is also known that constructing confidence regions using the EL ratio has various advantages over normal approximation based methods or the bootstrap. For example, Wood et al. [38] and Jing et al. [18] applied the EL method to U-statistics to construct confidence intervals without side information incorporated. We investigate the construction of confidence intervals for U-statistics using the empirical likelihood with side information incorporated, and the resulting confidence intervals are compared with those based on normal approximation. Our method of formulating the weights of U-statistics is parallel to that in the EL, and is different from those in [38,18]. We find that by incorporating the side information properly, the proposed U-statistics have smaller asymptotic variance than the existing U-statistics without side information. The proposed U-statistics can achieve asymptotic efficiency in a formal sense, and their weak limits admit a convolution result. We also find that the U-statistic EL based likelihood ratio procedure does not benefit from incorporating the side information asymptotically, a result consistent with that for the standard empirical likelihood ratio procedure. In finite samples, however, the resulting coverage probability still outperforms that of the normal approximation based method. The impact of incorporating incorrect side information is also explored.
In Section 2 we introduce the framework of the proposed U-statistics with side information incorporated, and we investigate the basic asymptotic properties of the proposed U-statistics in Section 3. The U-empirical likelihood ratio with side information is formulated in Section 4. Examples and simulation results are given in Section 5 to illustrate the proposed method. All the relevant proofs are collected in the Appendix.
2. Incorporating side information in U-statistics
Let X1, …, Xn be independent and identically distributed (i.i.d.) random variables with unknown distribution function F(x) = P(Xi ≤ x). In this paper we assume the Xi's are scalar random variables for simplicity, although there is no essential difference in extending to the case of random vectors. Denote X = (X1, …, Xm)′ (m ≥ 2). Let i = (i1, …, im)′, Xi = (Xi1, …, Xim)′, and let Dn,m = {i : 1 ≤ i1 < ⋯ < im ≤ n} denote the collection of indices for the U-statistic of degree m. Let $\binom{n}{m}$ be the number of combinations of m elements out of n, x = (x1, …, xm)′, and Fn,m(x) be the empirical distribution function of Fm based on the sample 𝒳n ≔ {Xi : i ∈ Dn,m}, with mass $\binom{n}{m}^{-1}$ at each point in 𝒳n. Given an m-variate symmetric kernel h, the U-statistic is defined as

$$U_n = \binom{n}{m}^{-1} \sum_{\mathbf{i} \in D_{n,m}} h(X_{\mathbf{i}}).$$
The goal is to estimate θ = EFmh(X), where EFm denotes the expectation with respect to Fm. It is known that the U-statistic Un is the minimum variance unbiased estimator of θ [34, p. 176].
Since the work of Owen [25], the empirical likelihood (EL) has gained increasing popularity due to its wide range of applications, simplicity of use and flexibility in incorporating auxiliary (or side) information. Here we combine the EL method, to flexibly incorporate side information, with U-statistics, to achieve a smaller variance for the estimator.
We consider the set-up for EL as in [28]. Suppose the side information can be incorporated into the EL through a d-dimensional known function g(x) = (g1(x), …, gd(x))′ via the relationship

$$E[g(X_1)] = 0,$$
where E[·] denotes the expectation with respect to F. The EL is defined as

$$L(F) = \prod_{i=1}^{n} w_i,$$
where the wi's are the nonparametric maximum likelihood estimates of the empirical masses assigned to the observations Xi. With the side information constraints, the EL is

$$\max\Big\{ \prod_{i=1}^{n} w_i : w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i\, g(X_i) = 0 \Big\}.$$
Let t = (t1, …, td)′ be the Lagrange multipliers corresponding to the constraint on g(·), and as in [26], we get

$$w_i = \frac{1}{n}\,\frac{1}{1 + t' g(X_i)}, \qquad i = 1, \ldots, n,$$
where tj = tj(X1, …, Xn) (j = 1, …, d) are determined by

$$\frac{1}{n}\sum_{i=1}^{n} \frac{g(X_i)}{1 + t' g(X_i)} = 0.$$
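For concreteness, the following is a minimal numerical sketch (not the authors' code) of how these weights can be computed: a plain Newton iteration on the multiplier t. The function and variable names are our own, and a careful implementation would safeguard the step so that 1 + t′g(Xi) stays positive.

```python
import numpy as np

def el_weights(g_vals, tol=1e-10, max_iter=100):
    """Empirical likelihood weights w_i = n^{-1} (1 + t'g(X_i))^{-1},
    with the multiplier t solving (1/n) sum_i g(X_i)/(1 + t'g(X_i)) = 0.
    g_vals: (n, d) array of side-information values g(X_i), with E g(X) = 0.
    Plain Newton iteration on t; no step safeguard (a sketch only)."""
    n, d = g_vals.shape
    t = np.zeros(d)
    for _ in range(max_iter):
        denom = 1.0 + g_vals @ t                       # 1 + t'g(X_i), shape (n,)
        score = (g_vals / denom[:, None]).sum(axis=0)  # equation to zero out
        jac = -(g_vals / denom[:, None] ** 2).T @ g_vals
        step = np.linalg.solve(jac, -score)
        t += step
        if np.linalg.norm(step) < tol:
            break
    w = 1.0 / (n * (1.0 + g_vals @ t))
    return w, t

# Side information "E X = 0": g(x) = x.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
w, t = el_weights(x[:, None])
print(w.sum(), w @ x)   # ~1 and ~0: both constraints hold at the solution
```

Note that at the root of the score equation the weights automatically sum to one, so only the g-constraint needs to be solved for.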
To combine the EL method and U-statistics, Wood et al. [38] considered a weighted U-statistic

$$U_n^{w} = \sum_{\mathbf{i} \in D_{n,m}} w(i_1, \ldots, i_m)\, h(X_{i_1}, \ldots, X_{i_m}),$$
with weights w(i1, …, im) = wi1 ⋯ wim estimated using the EL procedure. Jing et al. [18] proposed a jackknife EL for the U-statistic without side information. They first merge the observed h(Xi)'s into a jackknife pseudo-sample, then treat this pseudo-sample as a sample of n i.i.d. observations and apply the standard EL method for the mean to obtain the EL estimate for the U-statistic.
In this paper, our goal is to estimate θ = EFmh(X) under information constraints that incorporate side information in the form

$$E_{F^m}[g(X)] = 0. \tag{1}$$
Without loss of generality, g(·) is assumed symmetric with respect to its arguments (otherwise we can set g(x1, …, xm) = (1/m!) ∑(p) g(xi1, …, xim) to make it symmetric, where ∑(p) denotes summation over the indices (i1, …, im) of all permutations of (1, …, m)). This function g includes the constraint EF(g(X1)) = 0 as a special case, by taking g to be a suitable componentwise product. Some examples of g(·) will be given in Section 5 for illustration.
To formulate the proposed U-statistic we consider a different but direct way to define the weights w(i1, …, im). Let wi = Fm({Xi}) and w = (wi : i ∈ Dn,m). Since the wi's are unknown (as is Fm), we maximize the product of the wi's subject to appropriate constraints (the wi's may not be independent of each other). Re-write the EL subject to the side information constraints as

$$\max\Big\{ \prod_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} : w_{\mathbf{i}} \ge 0,\ \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} = 1,\ \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}\, g(X_{\mathbf{i}}) = 0 \Big\}.$$
We get, as in [26], that

$$w_{\mathbf{i}} = \binom{n}{m}^{-1} \frac{1}{1 + t' g(X_{\mathbf{i}})}, \qquad \mathbf{i} \in D_{n,m}, \tag{2}$$
and t = tn = (tn1, …, tnd)′ with tnj = tnj(X1, …, Xn) (j = 1, …, d) determined by

$$\sum_{\mathbf{i} \in D_{n,m}} \frac{g(X_{\mathbf{i}})}{1 + t' g(X_{\mathbf{i}})} = 0. \tag{3}$$
For details regarding the existence of t as the solution of (3) see, for example, the papers by Owen and others. The proposed weights for U-statistics are parallel to those in the EL, and are simpler than some existing methods in that there is no need to form a product of m elements from w1, …, wn as in [38], nor to merge the data as in [18].
Similar to Hoeffding [15], for any kernel h(·) with EFm(h(X)) < ∞, let hc(x1, …, xc) = E(h(X1, …, Xm)|X1 = x1, …, Xc = xc) and let h̃c be its centered version (c = 1, …, m), with h̃1(x1) = h1(x1) − θ, and in general

$$\tilde h_c(x_1, \ldots, x_c) = \int \cdots \int h(y_1, \ldots, y_m) \prod_{s=1}^{c} (\delta_{x_s} - F)(dy_s) \prod_{s=c+1}^{m} F(dy_s),$$
where δxs(ys) is the Dirac function, taking value 1 if ys = xs and 0 otherwise. The integral representation above can be found in KB. The h̃c's are called the canonical forms of h. If h̃1 = ⋯ = h̃k−1 = 0 and h̃k ≠ 0 (or equivalently Var(h̃1) = ⋯ = Var(h̃k−1) = 0 and Var(h̃k) ≠ 0), the U-statistic Un with kernel h is said to be of rank k (1 ≤ k ≤ m). When k > 1, Un is called degenerate; when k = m it is called completely degenerate. Un has the following Hoeffding [16] representation

$$U_n = \theta + \sum_{c=1}^{m} \binom{m}{c} U_{n,c}, \qquad U_{n,c} = \binom{n}{c}^{-1} \sum_{1 \le i_1 < \cdots < i_c \le n} \tilde h_c(X_{i_1}, \ldots, X_{i_c}).$$
Let ζc = Var(hc(X1, …, Xc)) (c = 1, …, m); Un has the following variance formula [16]

$$\operatorname{Var}(U_n) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c} \binom{n-m}{m-c}\, \zeta_c.$$
Define gc = (gc,1, …, gc,d)′ with

$$g_{c,j}(x_1, \ldots, x_c) = E[g_j(X) \mid X_1 = x_1, \ldots, X_c = x_c], \qquad j = 1, \ldots, d,$$
and the canonical forms g̃c = (g̃c,1, …, g̃c,d)′ of g analogously, with each gj in place of h in the representation above.
Similarly, let q̃c (c = 1, …, m) be the canonical forms of g(·)h(·) = (g1(·)h(·), …, gd(·)h(·))′. The canonical forms h̃c and q̃c (c = 1, …, m) exist theoretically, but are unknown in practice since F is unknown. Let ro = min{rank(g1), …, rank(gd)}, r = rank(h), r1 = min{rank(g1h), …, rank(gdh)}, and let F̃n,m be the empirical distribution with mass wi at the observation Xi. Using the weights wi given in (2) and (3), we define the U-statistic with side information given by the constraints g as
$$\tilde U_n = \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}\, h(X_{\mathbf{i}}). \tag{4}$$
In comparison, the commonly used U-statistic Un puts weight $\binom{n}{m}^{-1}$ on each observation h(Xi), while with the EL formulation these weights are replaced by the wi. In the following we investigate the basic asymptotic properties of Ũn.
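As an illustration of the construction (2)–(4), here is a minimal sketch for degree m = 2 and a scalar constraint g. It is our own illustration (the names and the unsafeguarded Newton iteration for (3) are assumptions, not the authors' implementation), using the variance kernel and the known-median constraint of Example 1 in Section 5.

```python
import itertools
import numpy as np

def u_stat_side_info(x, h, g):
    """U-statistic of degree m = 2 with side information.
    h, g: symmetric kernels with E_{F^2} g(X1, X2) = 0, as in constraint (1).
    Weights from (2)-(3): w_i = C(n,2)^{-1} / (1 + t*g(X_i)), t solving (3).
    A sketch for scalar g and small n: C(n,2) kernel evaluations."""
    pairs = list(itertools.combinations(x, 2))
    hv = np.array([h(a, b) for a, b in pairs])
    gv = np.array([g(a, b) for a, b in pairs])
    t = 0.0
    for _ in range(100):                      # Newton iteration for (3)
        denom = 1.0 + t * gv
        f = np.sum(gv / denom)                # left-hand side of (3)
        fp = -np.sum(gv ** 2 / denom ** 2)    # its derivative in t
        step = -f / fp
        t += step
        if abs(step) < 1e-12:
            break
    w = 1.0 / (len(pairs) * (1.0 + t * gv))   # weights (2); they sum to 1
    return hv.mean(), np.sum(w * hv)          # (U_n, U-tilde_n)

# Variance kernel with known median 0 (Example 1 of Section 5):
rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=100) - np.log(2.0)   # median 0, Var = 1
h = lambda a, b: 0.5 * (a - b) ** 2
g = lambda a, b: 0.5 * ((a <= 0) + (b <= 0)) - 0.5
print(u_stat_side_info(x, h, g))
```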
3. The asymptotic properties of Ũn
In this section we study some basic asymptotic behavior of the proposed U-statistic, including its convergence, asymptotic distribution, uniform convergence, and asymptotic efficiency. The following conditions will be used in this section:
(C1). Ω ≔ E[g(X)g′(X)] is positive definite.
(C2). E‖g(X)‖α < ∞ for some α > 0 to be specified.
(C3). EFm|h(X)| < ∞.
(C4). EFmh2(X) < ∞.
(C5). EFm[‖g(X)‖2|h(X)|] < ∞.
where ‖·‖ denotes the Euclidean norm. We note that (C2) with α ≥ 4 plus (C4) implies (C5).
3.1. Convergence rate of Ũn
We first give a lemma characterizing the asymptotic form of the weights wi, which will be used repeatedly in the asymptotic study.
Lemma. Assume (C1) and (C2) with α > 2m/ro. Then we have
where 1d = (1, …, 1)′ is the d-dimensional vector of 1's, and the O(·) terms are uniform over all the Xi's and i's, with
The Op(·) terms above are uniform over all the xi's and i's.
Theorem 1. (i) Assume the conditions in the lemma plus (C3) and (C5). If r = 1, then nq(Ũn − θ) → 0 (a.s.) for all 0 < q < 1/2.
(ii) Assume the conditions in the lemma plus (C4) and (C5). If r > 1, then
(iii) Assume (C4) and the conditions of Lemma (i). If r = 1, then, with σ2 given in Theorem 2(i),
3.2. Asymptotic distribution of Ũn
Let J1(h) be the Gaussian process indexed by h ∈ L2(R, ℬ, F), with mean E(J1(h)) = 0 and covariance Cov(J1(h), J1(g)) = ∫h(x)g(x)F(dx) for all h, g ∈ L2(R, ℬ, F). Let W(·) be the Gaussian random measure on L2(R, ℬ, P) defined by W(A) = J1(IA), A ∈ ℬ. J1(h) = ∫h(x)W(dx) is called the Wiener–Itô integral of order 1. Generally, for h ∈ L2(Rr, ℬr, Fr), the Wiener–Itô integral of order r is defined as

$$J_r(h) = \int \cdots \int h(x_1, \ldots, x_r)\, W(dx_1) \cdots W(dx_r),$$
and for symmetric kernels its covariance is given by

$$\operatorname{Cov}(J_r(h), J_r(q)) = r! \int \cdots \int h(x_1, \ldots, x_r)\, q(x_1, \ldots, x_r)\, F(dx_1) \cdots F(dx_r).$$
For a vector function h = (h1, …, hd)′ with hj ∈ L2(Rr, ℬr, F) (j = 1, …, d), define Jr(h) componentwise as a d-dimensional random process. Denote by →d convergence in distribution.
Theorem 2. (i) Assume (C4) and the conditions of the lemma. If r = 1, then

$$\sqrt{n}\,(\tilde U_n - \theta) \xrightarrow{d} N(0, \sigma^2),$$

where σ2 = m2(σ̃12 − 2A′Ω−1A1 + A′Ω−1Ω1Ω−1A), with σ̃12 = Var(h̃1(X1)), Ω1 = Var(g̃1(X1)), A = EFm[g(X)h(X)] and A1 = EF[g̃1(X1)h̃1(X1)].
(ii) Assume (C4), the conditions of Lemma (ii) and r > 1. Then
when A ≠ 0,
when A = 0,
From Theorem 2 we see that the most interesting case is r = ro = r1 = 1, in which √n(Ũn − θ) is asymptotically non-degenerate normal, with asymptotic variance smaller than that of √n(Un − θ). σ2 is the same as that of Un either when r1 > 1, A = 0, or when ro > 1, A1 = 0 and Ω1 = 0. Thus, for the side information to be of practical use, we need r = ro = r1 = 1.
It is interesting to note that if we have full information about the parameter to be estimated, then we can "estimate" the parameter with perfection, i.e., its asymptotic variance is reduced to zero. As an artificial example, let a and b be nonzero known constants, μ = E(X1), h(x1, …, xm) = a(x1 + ⋯ + xm) and g(x1, …, xm) = b(x1 + ⋯ + xm − mμ). Since g is a known function, μ must be known. We are to estimate θ = Eh(X1, …, Xm) = amμ using the U-statistic (4), with the wi's given by (2) and (3). In this case θ is already known, as is μ, and we can "estimate" θ with zero asymptotic variance, as in the following
Corollary 1. Assume τ2 = Var(X1) < ∞, and let h and g be as given above. Then Theorem 2(i) holds with σ2 = 0.
3.3. The optimality property of Ũn
To study the asymptotic efficiency of the estimators of θ, let 𝕀(θ|g) be the information bound [6] for estimating θ given the side information in g, given in Theorem 3(i) below. When the asymptotic variance of an estimator achieves this bound, or equals this bound up to a known multiplicative positive constant, the estimator is called asymptotically efficient. The information bound is the limit version of the Cramér–Rao lower bound for variances of unbiased estimators. For Euclidean parameters without g, the information (lower) bound is the inverse of the Fisher information.
Suppose f(·|θ) is the density function of X given θ, and θn = θ + n−1/2b for some b ∈ C, the complex plane. An estimator Tn = Tn(X1, …, Xn) is said to be regular if, under f(·|θn), √n(Tn − θn) →d W for some random variable W, and the limit does not depend on the sequence {θn}. Let Z ⊕ V denote the sum of two independent random variables Z and V, let I(θ) be the Fisher information at θ, and let Z ~ N(0, I−1(θ)). The convolution theorem [14] states that for any regular estimator Tn with weak limit W, there is a V such that

$$W \stackrel{d}{=} Z \oplus V.$$
This result further characterizes the weak limit of an asymptotically efficient estimator without side information: it is a normal random variable with mean zero and variance I−1(θ). Below we obtain the information bound and a convolution result for the proposed estimators in the presence of side information.
Theorem 3. Assume r = ro = 1, (C4) and the conditions in the lemma. Then
(i) 𝕀(θ|g) = σ̃12 − A1′Ω1−1A1.
Thus, if we set g(x) = (g(x1) + ⋯ + g(xm))/m, then rank(g) = 1, A = mA1, Ω = mΩ1, σ2 = m2𝕀(θ|g) and Ũn is efficient.
(ii) Assume further that the density f(·|θ) of X has second order continuous partial derivatives with respect to θ. Then for any regular estimator Tn with weak limit W of √n(Tn − θ), W can be decomposed as, for some V,
It is easy to see that any U-statistic with side information of the form Ũn is regular, and thus is optimal in the sense of convolution under the conditions of Theorem 3. Without side information, the asymptotic variance of √n(Un − θ) is m2σ̃12; when side information is present, the asymptotic variance of √n(Ũn − θ) is m2(σ̃12 − A1′Ω1−1A1), with a reduction of m2A1′Ω1−1A1. From the proof of Theorem 3(i) we see that 𝕀(θ|g) is the squared length of the projection of h̃1(X) onto [g̃1(X)⊥], the linear span of the orthogonal complement of g̃1(X). Increasing the components in g (and thus in g̃1) shrinks the space [g̃1(X)⊥] and shortens the length of the projection, i.e., it increases the efficiency of Ũn: increasing the number of information constraints reduces the asymptotic variance of the U-statistic.
Remark. By Theorem 2(i) or Theorem 3(i), given a nominal level α, a level (1 − α) confidence interval for θ can be obtained as [Ũn ± n−1/2σΦ−1(1 − α/2)], without using the likelihood ratio, where Φ−1(·) is the standard normal quantile function. Here σ is smaller in the presence of side information than without it, hence the inference becomes more accurate.
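A small sketch of the interval in this remark follows; u_tilde and sigma_hat are assumed to be the computed Ũn and a consistent plug-in estimate of the σ in Theorem 2(i).

```python
import numpy as np
from scipy.stats import norm

def normal_ci(u_tilde, sigma_hat, n, alpha=0.05):
    """Level (1 - alpha) interval [U_tilde_n -+ n^{-1/2} sigma Phi^{-1}(1 - alpha/2)]
    from the Remark; sigma_hat is assumed to be a consistent estimate of the
    sigma in Theorem 2(i), which is smaller when side information is used."""
    z = norm.ppf(1.0 - alpha / 2.0)
    half = z * sigma_hat / np.sqrt(n)
    return u_tilde - half, u_tilde + half

print(normal_ci(u_tilde=1.02, sigma_hat=2.5, n=100))  # illustrative numbers only
```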
3.4. The uniform SLLN and CLT of Ũn-processes
Let P̃n,m, Pn,m, Pm and P be the (random) probability measures induced by F̃n,m, Fn,m, Fm and F respectively. For a function h, denote P̃n,mh = ∑i∈Dn,m wih(Xi), Pmh = EFmh(X), 𝔾̃n,mh = √n(P̃n,m − Pm)h and 𝔾n,mh = √n(Pn,m − Pm)h. For fixed h and g, we have shown that, under appropriate conditions,
P̃n,mh → Pmh (a.s.) and 𝔾̃n,mh →d N(0, σ2(h)), with σ2(h) = m2(Var(h̃1) − 2A′Ω−1A1 + A′Ω−1Ω1Ω−1A). In contrast, 𝔾n,mh →d N(0, σ02(h)) with σ02(h) = m2Var(h̃1(X1)). Thus incorporating the side information g reduces the asymptotic variance by the amount m2(2A′Ω−1A1 − A′Ω−1Ω1Ω−1A).
It is of interest to have a uniform version of the above SLLN and CLT over a class of functions ℋ. The uniformity means supremum over ℋ, which may or may not be measurable; thus the almost sure and weak convergence results here will be in the sense of the outer measure P* of P (cf. [36]; hereafter VW). When the corresponding quantity is measurable, the convergence is automatically in the sense of the measure P itself. Nolan and Pollard [23,24] studied the uniform SLLN and CLT for U-processes of order two. Giné and Zinn [12], Arcones and Giné [2] and Giné [11], among others, studied other types of uniform problems in general situations. Here we explore the uniform laws for U-statistics under different conditions.
Let ℋ be a class of functions satisfying (C4), and for any probability measure Q, denote ‖Qh‖ℋ = sup{|Qh| : h ∈ ℋ}. Let L∞(ℋ) be the space of functionals z : ℋ ↦ R with norm ‖z‖ℋ = suph∈ℋ |z(h)|, and the metric on L∞(ℋ) is given by d(z1, z2) = ‖z1 − z2‖ℋ for z1, z2 ∈ L∞(ℋ). For an integer m-vector k = (k1, …, km), a subset 𝒳 of Rm and a function h : 𝒳 ↦ R, denote |k| = k1 + ⋯ + km and Dk = ∂|k|/(∂x1k1 ⋯ ∂xmkm); ‖x‖ is the Euclidean norm for x ∈ 𝒳,
and CM(𝒳) is the set of functions h : 𝒳 ↦ R with ‖h‖s ≤ M. Let {Ij : j ≥ 1} be a partition of Rm into bounded convex sets with non-empty interior, and let ℋ1 be the class of functions h such that the restrictions h|Ij belong to CMj(Ij) for every j, with M = maxj Mj < ∞. Let ℋ2 be the class of convex functions h : C ↦ R, for some convex compact C ⊂ Rm, such that |h(x) − h(y)| ≤ L‖x − y‖ for some 0 < L < ∞, all x, y ∈ C and h ∈ ℋ2, and such that ‖∑i∈Dn,m eih(xi)‖ℋ2 is measurable for each n and each ei ∈ {−1, 1} (ℋ2 is then called P-measurable in VW). An envelope function G of ℋ is a function such that |h(x)| ≤ G(x) for all x and h ∈ ℋ. Let ℋ = ℋ1 ∪ ℋ2 with (C4) satisfied on ℋ, and let λ(·) be the Lebesgue measure on Rm. Let ⇝ denote weak convergence in L∞(ℋ). We have
Theorem 4. (i) Under the conditions of Theorem 1(i), for ℋ defined above, assume that ∀h ∈ ℋ, gh ∈ ℋ in the componentwise sense, that ℋ1 has a square integrable envelope function H with maxj λ(Ij) < ∞, and that ℋ2 is bounded. Then we have
(ii) Under the conditions of Theorem 3(ii), assume ℋ has a square integrable envelope function H, maxj λ(Ij) < ∞, m < 4 and s > m/2 for ℋ1, and with υ = m/s. Then
where 𝔾 is a Gaussian process indexed by ℋ, with EP (𝔾h) = 0 and for all h, q ∈ ℋ.
Using the results in [2,11], we can obtain many more results; below we only mention one.
Corollary 2. For a class of functions ℋ, assume that ∀h ∈ ℋ, gh ∈ ℋ; that ℋ is a measurable VC-subgraph class of functions with envelope H and PmH < ∞, or ∀ε > 0, (for definition, see p. 1512, [2]). Then
4. The empirical likelihood ratio for U-statistics with side information
Next we define the empirical likelihood ratio for θ, and construct the confidence interval for θ in the presence of side information. Let G(x|θ) = (g′(x), h(x) − θ)′, so that EFmG(X|θ) = 0. Without side information, the weights that maximize ∏i∈Dn,m wi subject to ∑i∈Dn,m wi = 1 are wi = $\binom{n}{m}^{-1}$ for all i ∈ Dn,m; the weights that maximize ∏i∈Dn,m wi subject to ∑i∈Dn,m wi = 1 and ∑i∈Dn,m wiG(Xi|θ) = 0 are given by (2) with g(·) replaced by G(·|θ), and t is determined by (3) with g(·) replaced by G(·|θ). Therefore we define the empirical log likelihood ratio of θ in the presence of side information by

$$l(\theta) = -2 \sum_{\mathbf{i} \in D_{n,m}} \log\Big\{ \binom{n}{m} w_{\mathbf{i}}(\theta) \Big\},$$

where

$$w_{\mathbf{i}}(\theta) = \binom{n}{m}^{-1} \frac{1}{1 + t' G(X_{\mathbf{i}}|\theta)},$$

and note that, equivalently,

$$l(\theta) = 2 \sum_{\mathbf{i} \in D_{n,m}} \log\{1 + t' G(X_{\mathbf{i}}|\theta)\}.$$
Let Λ = Var(G(X|θ)), η2 = Var(h(X)), and Λ1 = Cov(G̃1), where G̃1 is the first canonical form (vector) of G.
Note that when there is no side information, G(·|θ) reduces to h(·) − θ, and t is a scalar determined by ∑i∈Dn,m(h(Xi) − θ)/[1 + t(h(Xi) − θ)] = 0. The corresponding log-likelihood ratio is

$$2 \sum_{\mathbf{i} \in D_{n,m}} \log\{1 + t\,(h(X_{\mathbf{i}}) - \theta)\}.$$
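For illustration, a minimal sketch of computing l(θ) for degree m = 2 follows. It uses the representation l(θ) = 2∑ log{1 + t′G(Xi|θ)} displayed above; the function names are our own, and no convexity safeguards are included.

```python
import itertools
import numpy as np

def u_el_logratio(x, h, g, theta):
    """U-empirical log likelihood ratio l(theta) = 2 * sum log(1 + t'G(X_i|theta)),
    with G(x|theta) = (g(x), h(x) - theta)' and t solving (3) with g replaced by G.
    Sketch for degree m = 2 and scalar g; plain Newton iteration, no safeguards."""
    pairs = list(itertools.combinations(x, 2))
    G = np.array([[g(a, b), h(a, b) - theta] for a, b in pairs])  # shape (N, 2)
    t = np.zeros(2)
    for _ in range(100):
        denom = 1.0 + G @ t
        score = (G / denom[:, None]).sum(axis=0)       # left side of (3) with G
        jac = -(G / denom[:, None] ** 2).T @ G
        step = np.linalg.solve(jac, -score)
        t += step
        if np.linalg.norm(step) < 1e-12:
            break
    return 2.0 * np.sum(np.log1p(G @ t))
```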
Theorem 5. (i) Under the conditions of Theorem 2(i) or Theorem 3(i), assume ro = 1 and that Λ is positive definite. Then
(ii) Assume (C4), then
When m = 1, the above result for U-statistics automatically reduces to that for the common EL ratio, and the right hand side in Theorem 5(i) reduces accordingly (see the corresponding result in Theorem 2 of Qin and Lawless [28]). Therefore, with side information incorporated in the likelihood ratio, the length of the confidence region for θ cannot be reduced; this is an interesting contrast to estimation with side information, in which the asymptotic variance is reduced. However, using the EL ratio, the shape of the confidence region is more natural than with many other commonly used methods, such as the normal approximation, which are forced to be symmetric. The latter method may have poorer coverage probability because of its interval length and imposed symmetric shape.
Although side information is widely applied in practice to improve the performance of estimators via the EL, the following Corollary 3 describes the effects when incorrect side information is used; thus side information should be used with care, and be justified properly before its use.
Corollary 3. If EFmg(X) = δ ≠ 0, then
- Under conditions of Theorem 1 (i),
- Under conditions of Theorem 2 (i),
- If EFmG(X|θ) = δ ≠ 0, then under the conditions of Theorem 5 (i),
when Λ = Λ1, l(θ) is asymptotically distributed as χ2d+1(nδ′Λ−1δ), the chi-squared distribution with d + 1 degrees of freedom and noncentrality parameter nδ′Λ−1δ.
5. Examples and simulation studies
5.1. Examples
In this section we give some examples for illustration.
Example 1. For a given distribution F, let θ(F) = ∫(x − μ)2dF(x) be the variance, where μ is the mean. Let μk (k ≥ 2) be the k-th central moment of F. For the kernel h(x1, x2) = (x1 − x2)2/2, we have h̃1(x1) = [(x1 − μ)2 − θ]/2, η2 = E(h2) − θ2 = (μ4 + θ2)/2 and σ̃12 = Var(h̃1(X1)) = (μ4 − θ2)/4. Without side information, the asymptotic variance of √n(Un − θ) based on the kernel h(x1, x2) is 4σ̃12 = μ4 − θ2, which is the same as that of the sample variance estimator.
If we know that F has median 0, i.e. F(0) = 1/2, we take g(x1, x2) = [I(x1 ≤ 0) + I(x2 ≤ 0)]/2 − 1/2. Then g̃1(x1) = [I(x1 ≤ 0) − 1/2]/2, Ω1 = Var(g̃1(X1)) = 1/16, and A1 = E[g̃1(X1)h̃1(X1)] = E[(I(X1 ≤ 0) − 1/2)((X1 − μ)2 − θ)]/4. So by Theorem 3(i), the asymptotic variance of √n(Ũn − θ) is now μ4 − θ2 − 4A12/Ω1, a deduction of 4A12/Ω1 = 64A12 from μ4 − θ2.
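As a quick numerical check of this example (our own illustration, not part of the original computations), one can estimate μ4, A1 and Ω1 by Monte Carlo and compare the variance with and without the median constraint:

```python
import numpy as np

# Monte Carlo check of Example 1 for X ~ exp(1) - ln 2 (median 0, theta = Var X = 1),
# using the variance expressions above.
rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=1_000_000) - np.log(2.0)
mu, theta = x.mean(), x.var()
mu4 = np.mean((x - mu) ** 4)
h1t = 0.5 * ((x - mu) ** 2 - theta)           # h-tilde_1(X)
g1t = 0.5 * ((x <= 0) - 0.5)                  # g-tilde_1(X)
A1, Omega1 = np.mean(g1t * h1t), np.var(g1t)  # Omega1 should be ~1/16
print(mu4 - theta ** 2)                       # variance without side information
print(4 * A1 ** 2 / Omega1)                   # deduction with the median constraint
```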
Example 2. For the Wilcoxon one-sample statistic, θ(F) = PF(X1 + X2 ≤ 0), the kernel of the corresponding U-statistic is h(x1, x2) = I(x1 + x2 ≤ 0), with h̃1(x1) = F(−x1) − θ and σ̃12 = Var(F(−X1)). Without side information, the asymptotic variance of √n(Un − θ) based on the kernel h(x1, x2) is 4σ̃12 = 4Var(F(−X1)).
Suppose we know the distribution is symmetric about a > 0: F(a + x) = 1 − F(a − x) for all x. Take g(x1, x2) = [I(x1 ≤ 0) + I(x1 ≤ 2a) + I(x2 ≤ 0) + I(x2 ≤ 2a)]/2 − 1; then g̃1(x1) = [I(x1 ≤ 0) + I(x1 ≤ 2a)]/2 − 1/2, Ω1 = F(−a)/2 and A1 = E[g̃1(X1)h̃1(X1)]. The deduction of asymptotic variance is 4A12/Ω1.
Example 3. For the Gini mean difference, θ(F) = EF|X1 − X2|, the corresponding kernel of the U-statistic is h(x1, x2) = |x1 − x2|. We have h̃1(x1) = E|x1 − X2| − θ and σ̃12 = Var(h̃1(X1)). Without side information, the asymptotic variance of √n(Un − θ) based on the kernel h(x1, x2) is 4σ̃12.
If we know the distribution mean μ and take g(x1, x2) = (x1 + x2)/2 − μ, then g̃1(x1) = (x1 − μ)/2, Ω1 = Var(g̃1(X1)) = (1/4)∫(x − μ)2dF(x), and A1 = E[g̃1(X1)h̃1(X1)] = E[(X1 − μ)h̃1(X1)]/2. The deduction of asymptotic variance is 4A12/Ω1.
5.2. Simulation studies
Simulation studies are conducted in this section to assess the finite-sample performance of the proposed methods. These studies are based on Examples 1 and 2 of Section 5.1. We compare variance estimates of U-statistics with and without side information, and calculate the variance reduction under different sample sizes. We also compare various U-EL based and normal approximation based confidence intervals for θ in terms of coverage probability. Although side information has no effect asymptotically on the likelihood ratio, as indicated by Theorem 5, the finite sample performance of confidence intervals constructed using the U-EL ratio is still of interest, and is compared with that of intervals obtained through the normal approximation based method.
Based on the U-EL theory developed in Section 4, we can construct three U-EL based intervals for θ as follows:
The first one, called the EL1 interval, is defined as

$$\{\theta : l(\theta) \le q_{1-\alpha}\},$$
where q1−α is the (1 − α)-th quantile of the limiting distribution of l(θ) given in Theorem 5(i). q1−α can be estimated by using the sample estimates of Λ1 and Λ together with the Monte Carlo method.
One can also approximate the quantile of the distribution of l(θ) by the bootstrap method. Let l(1)*(θ), …, l(B)*(θ) (B ≥ 200 is recommended) be B bootstrap replicates of l(θ). Then the second EL-based interval for θ, called EL2, is given by

$$\{\theta : l(\theta) \le l^*_{([B(1-\alpha)])}(\theta)\},$$
where l([B(1−α)])*(θ) is the [B(1 − α)]-th ordered value of the l(b)*(θ)'s, and [x] represents the integer part of x.
The third one, called the EL3 interval, is constructed analogously, with the critical value taken from a scaled chi-squared approximation: the distribution of l(θ) can be approximated by a scaled chi-squared distribution, i.e., l(θ) is approximately distributed as c times a chi-squared random variable, where c is an unknown constant estimated from the data.
The asymptotic normal distribution obtained in Theorem 2 can be used to construct two additional confidence intervals for θ, called the AN1 and AN2 intervals:

$$[\tilde U_n \pm n^{-1/2}\, \hat\sigma\, \Phi^{-1}(1 - \alpha/2)] \quad \text{and} \quad [\tilde U_n \pm n^{-1/2}\, \hat\sigma^*\, \Phi^{-1}(1 - \alpha/2)],$$
where σ̂ is the estimate of σ obtained by plugging the sample estimates of all population quantities into the formula of Theorem 2, and σ̂* is the bootstrap estimate of σ based on B bootstrap samples. For computational reasons, we take B = 200 in the simulation studies.
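A sketch of how the AN1 and AN2 intervals can be assembled is given below; estimator and sigma_plugin are assumed user-supplied functions returning Ũn and the plug-in σ̂, respectively.

```python
import numpy as np
from scipy.stats import norm

def an_intervals(x, estimator, sigma_plugin, B=200, alpha=0.05, rng=None):
    """AN1 (plug-in sigma_hat) and AN2 (bootstrap sigma_hat*) intervals.
    estimator(x) should return U_tilde_n and sigma_plugin(x) an estimate of
    the sigma in Theorem 2; both are assumptions of this sketch."""
    rng = rng or np.random.default_rng()
    n = len(x)
    u = estimator(x)
    z = norm.ppf(1.0 - alpha / 2.0)
    half1 = z * sigma_plugin(x) / np.sqrt(n)               # AN1 half-width
    boots = np.array([estimator(rng.choice(x, size=n, replace=True))
                      for _ in range(B)])
    half2 = z * boots.std(ddof=1)   # bootstrap sd of U_tilde_n ~ sigma*/sqrt(n)
    return (u - half1, u + half1), (u - half2, u + half2)
```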
Examples 1 and 2 of Section 5.1 are considered in the simulation study. In the first example, the underlying distribution is chosen to be a skewed distribution with median 0. Here we take X ~ exp(1) − ln(2), the standard exponential distribution with a shifted center. Then EX = 1 − ln 2, Median(X) = 0, and θ = Var(X) = 1. In the second example, we consider a symmetric distribution with mean a. We choose X ~ 𝒩(1, 4); then X1 + X2 ~ 𝒩(2, 8) and θ = P(X1 + X2 ≤ 0) = Φ(−√2/2) ≈ 0.2398, where Φ(x) is the cdf of the standard normal distribution. The simulation results are presented in Tables 1–4.
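The two simulation targets can be verified numerically; the following check (ours, not part of the original study) confirms the median and variance of the shifted exponential and the value θ = Φ(−√2/2) for the normal case.

```python
import numpy as np
from scipy.stats import norm

# Sanity checks on the two simulation targets:
rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=500_000) - np.log(2.0)   # Example 1 distribution
print(np.median(x), x.var())                           # ~0 and ~1
# Example 2: X ~ N(1, 4), so X1 + X2 ~ N(2, 8) and
# theta = P(X1 + X2 <= 0) = Phi(-2/sqrt(8)) = Phi(-sqrt(2)/2) ~ 0.2398.
print(norm.cdf(-np.sqrt(2.0) / 2.0))
```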
Table 1.
The asymptotic variance estimation of U-statistics. X ~ exp(1) − ln(2).
| Method | n = 50 | n = 100 | n = 150 | n = 200 |
|---|---|---|---|---|
| Without side information | 8.5239 | 7.8569 | 7.3839 | 7.1557 |
| With side information | 8.4572 | 7.5524 | 7.2673 | 7.0791 |
| Variance reduction | 0.0667 | 0.3045 | 0.1165 | 0.0766 |
Table 4.
Coverage probabilities of various 95% confidence intervals for θ with side information. X ~ 𝒩(1, 4).
| Sample size | EL1 | EL2 | EL3 | AN1 | AN2 |
|---|---|---|---|---|---|
| n = 50 | 0.876 | 0.983 | 0.918 | 0.956 | 0.945 |
| n = 100 | 0.882 | 0.981 | 0.931 | 0.961 | 0.947 |
| n = 150 | 0.926 | 0.978 | 0.968 | 0.978 | 0.968 |
| n = 200 | 0.942 | 0.956 | 0.984 | 0.970 | 0.954 |
Tables 1 and 2 show the estimated asymptotic variances of the U-statistics with and without side information, under sample sizes n = 50, 100, 150 and 200. The reduction in variance is also calculated. The results are based on 1000 repetitions.
Table 2.
The asymptotic variance estimation of U-statistics. X ~ 𝒩(1, 4).
| Method | n = 50 | n = 100 | n = 150 | n = 200 |
|---|---|---|---|---|
| Without side information | 0.2413 | 0.2208 | 0.2199 | 0.2203 |
| With side information | 0.0548 | 0.0526 | 0.0527 | 0.0572 |
| Variance reduction | 0.1865 | 0.1682 | 0.1673 | 0.1631 |
Tables 3 and 4 show the coverage probabilities of the EL-based intervals (EL1, EL2 and EL3) and the normal approximation-based intervals (AN1 and AN2) with side information.
Table 3.
Coverage probabilities of various 95% confidence intervals for θ with side information. X ~ exp(1) − ln(2).
| Sample size | EL1 | EL2 | EL3 | AN1 | AN2 |
|---|---|---|---|---|---|
| n = 50 | 0.942 | 0.949 | 0.994 | 0.783 | 0.784 |
| n = 100 | 0.929 | 0.950 | 0.991 | 0.858 | 0.878 |
| n = 150 | 0.934 | 0.954 | 0.990 | 0.872 | 0.880 |
| n = 200 | 0.950 | 0.949 | 0.989 | 0.898 | 0.904 |
The simulation results show that the proposed U-statistic Ũn performs well in finite samples. From Tables 1 and 2 we can clearly see a reduction in the variance of the estimate of θ. The variance reduction can be substantial, as in Example 2, which shows that the proposed method can offer more accurate estimation.
From the coverage probabilities in Tables 3 and 4, we see that the U-EL based confidence intervals work significantly better than the normal approximation based confidence intervals when the underlying distribution is skewed (Example 1). When the underlying distribution is symmetric (Example 2), the performances of these methods are comparable. Furthermore, in most cases, the bootstrap-based methods work better than the plug-in methods.
6. Concluding remarks
We studied a method to incorporate side information into the U-statistic via the empirical likelihood approach, and investigated some asymptotic properties of the proposed method. We showed that, for parameter estimation, the proposed U-statistic with side information has advantages, such as smaller asymptotic variance, over that without side information incorporated. We also explored the construction of confidence intervals using the U-statistic based empirical likelihood ratio. Although the U-EL ratio does not benefit from side information asymptotically, our simulation studies show that the corresponding confidence intervals still outperform those based on the normal approximation in finite samples. We also note that if incorrect side information is incorporated, the resulting estimates can be seriously biased. Thus in practice the incorporation of side information should be justified properly.
Acknowledgments
This work is supported in part by the National Center for Research Resources at NIH grant 2G12RR003048. Dr. Gengsheng Qin’s work is supported in part by US NSA grant H98230-12-1-0228. Dr. Wenqing He’s work is partially supported by the Natural Sciences and Engineering Research Council of Canada.
Appendix
Proof of the Lemma. (i) As in [26], write t = tn = ρne, with ρn ≥ 0 and e = e(X1, …, Xn) a d-vector with ‖e‖ = 1. We first find the asymptotic order of tn. Denote Zn = max{|e′g(Xi)| : i ∈ Dn,m} and Z̃n = max{|e′g(X̃i)| : i ∈ D̃n,m}, where D̃n,m is such that any i, j ∈ D̃n,m have no common entry, X̃i = (X̃i1, …, X̃im), and the X̃i1, …, X̃im are i.i.d. copies of X1. Since Z̃n is a maximum over an i.i.d. sample, while Zn is a maximum over dependent samples from the same distribution, for large n we have Zn ≤ Z̃n (a.s.). Since E|e′g(X)|α < ∞, Z̃n = o(nm/α) (a.s.) as in [26], and so Zn = o(nm/α) (a.s.). We have
Below we will show, for some 0 < c < C < ∞, for all large n, uniformly for all the Xi’s and i’s,
| (A.1) |
In fact, R1,n is a (matrix valued) U-statistic with a.s. limit 0 < E[g(X)g′(X)] = Ω < ∞, where the “0 <” is in the matrix positive definite sense and the “<∞” is in the componentwise sense. Let 0 < λ1 ≤ ⋯ ≤ λd < ∞ be all the eigenvalues of Ω, we have Ω = Q′diag(λ1, …, λd)Q with Q being orthonormal. Denote η = Qe, then η′η = 1. Then for large n, R1,n > Ω/2 (a.s.) thus e′R1,ne > e′Ωe/2 = η′diag(λ1, …, λd)η/2 ≥ λ1/2 ≔ c > 0 (a.s.). Similarly, for large n, R1,n < 2Ω (a.s.) and e′R1,ne < 2λd ≔ C (a.s.).
Since ‖e‖ = 1, we only need to prove the second assertion in (A.1) for R2,n. Note that R2,n is a (vector valued) U-statistic with kernel g(x) satisfying E(g(X)) = 0. Recall the canonical forms g̃c of g, and let R2,nc be the corresponding Hoeffding components of R2,n (c = 1, …, m). By the given conditions we have E(‖g̃c(X)‖2) ≤ E‖g(X)‖2 < ∞ (c = ro, …, m), E‖g(X)‖4/3 < ∞, and in the componentwise sense (component j of R2,nc is zero for c = ro, …, rj − 1 if rj ≔ rank(gj) > ro). If ro = 1, then by Theorem 9.1.1 in KB we get
If ro > 1, by Lemma 9.2.1 in KB,
Now, since R1,n = O(1) (a.s.) with 0 < O(1) < ∞ and , we have, for ro = 1, ρn/(1 + ρnZn) = O(|e′R2,n|) = O(n−1/2(log log n)1/2) (a.s.), or ρn(1 − o(n−(1/2−m/α)(log log n)1/2)) = O(n−1/2(log log n)1/2) (a.s.). For ro > 1, ρn(1 − o(nm/α−ro/2 log n)) = o(n−ro/2 log n) (a.s.). Thus we have
Since ρnZn = o(1) (a.s.), for large n we have maxi∈Dn,m |t′g(Xi)| < 1 (a.s.), so we have
or
thus
We have already shown (a.s.). Also, g(·)g′(·) is non-degenerate by (C1), and since m ≥ 2, E‖g(X)‖4 < ∞ by (C2); thus by the law of the iterated logarithm (LIL) for U-statistics, (a.s.), hence
Similarly,
so,
From this we get, (a.s.),
(ii) As in the proof of (i), we only need to show the results regardless of e. Treat gg′ and R1,n as vectors of length d2. Under the given conditions, noting that gg′ is non-degenerate, by the central limit theorem (CLT) for U-statistics,
where Ξ is determined by gg′. Similarly, for R2,n, since Eg(X) = 0: if ro = 1, then by standard U-statistics theory √nR2,n is asymptotically normal, i.e. R2,n = Op(n−1/2). If ro > 1, noting ro ≤ m, the given conditions imply E|gc(x1, …, xc)|2c/(2c−ro) < ∞ for c = ro, …, m. So, by Theorem 4.4.1 in KB, nro/2R2,n →d R2 for some non-degenerate random variable R2, and thus R2,n = Op(n−ro/2). So, as in the proof of (i), we get ρn(1 − oP(nm/α−ro/2)) = Op(n−ro/2). Since oP(nm/α−ro/2) = oP(1), we have ‖tn‖ = ρn = Op(n−ro/2).
Going through the proofs in (i), with O(n−1/2(log log n)1/2) replaced by Op(n−1/2), we get , and, with ρn = n−ro/2,
Proof of Theorem 1. (i) By Lemma (i), we have
By the given conditions and the SLLN of U-statistics, ,
where A = EFm[g(X)h(X)], and
Since by the given conditions, EFm|h̃c (X)|γ < ∞ and EFm |g̃c (X)| < ∞ for γ = cp/(p(c − 1) + 1), c = 1, …, m and 1 < p < 2, so by Corollary 3.4.1 in KB, n−1/p+1(Un − θ) → 0 (a.s.), or nq(Un − θ) → 0 (a.s.) and nqU0,n → 0 (a.s.) for all q < 1/2, and consequently, for all q < 1/2,
(ii) Using the notation in (i), we have
Recall for any U-statistic Un with rank r and canonical forms h̃c (c = r, …, m) the following decomposition holds
Since, for c = r, …, m, Un,c = o(n−c/2 log n) (a.s.) by Lemma 9.2.1 in KB, we have Un − θ = o(n−r/2 log n) (a.s.). Thus, when r1 = ro = 1, U1,n = O(1) (a.s.), U0,n = o(n−q) (a.s.) for any q < 1/2, and U2,n = O(1) (a.s.) as its kernel is always non-degenerate, so by Lemma (i) we have
When r1 > r0 = 1, by Lemma 9.2.1 in KB, U1,n = o(n−r1/2 log n) (a.s.), note O(n−1 log log n) = o(n−1 log n), and so
When 1 = r1 < r0,
and when r1, r0 > 1,
(iii) Using Lemma (i) and notations in the proof of (i), we have, a.s.,
Recall that U0,n = o(n−q), U1,n −A = o(n−q) (a.s.) for all 0 < q < 1/2, and U2,n → C2 (a.s.) for some C2 < ∞. We have, a.s., for all 0 < q < 1/2,
By Theorem 9.1.1 of KB, the LIL holds for the first term above, and the above equation gives the desired result.
Proof of Theorem 2. (i) Use the facts that U1,n → A (a.s.) and U2,n → C2 (a.s.) for some C2 < ∞, as proved in Theorem 1(i); then by Lemma (ii),
The second term above is, for all 0 < q < 1/2, n1/2O(n−2q) = oP(1); the third term above is OP(n−ro/2) since U1,n → A (a.s.) with A < ∞; and the last term above is OP(n−(ro−1/2)) since U2,n → C2 (a.s.) for some C2 < ∞. Thus we only need to deal with the first term above.
Let
then H1(x1) ≔ E[H(X)|X1 = x1] = 0, i.e. H(x) is a degenerate kernel. Similarly, G(x) is degenerate, so is K(x) = H(x) − A′Ω−1G(x), with EFmK(X) = 0 and rk ≔ rank(K) ≥ 2. Now we have
Let K̃c be the canonical forms of K, and by the given conditions, (c = rk, …, m). So by Hoeffding’s formula,
and so . We get
Note that Var[m(h̃1(X1) − A′Ω−1g̃1(X1))] = σ2, which also holds when ro > 1 (in this case g̃1(·) ≡ 0). Now the result follows from the standard CLT and Slutsky's theorem.
(ii) We have, since U2,n = OP (1),
By Theorem 4.4.2 in KB, Un − θ = OP(n−r/2). Similarly, in summary we have
Also, U1,n → A (a.s.), and when r0 = 1, .
First we consider the case A ≠ 0. In this case, Un − θ = OP (n−r/2), and OP (n−(ro+1)/2)U1,n + OP (n−ro) = oP (n−ro/2).
Thus when ro < r, we have
When ro = r,
When ro > r,
Now we consider the case A = 0, then , and
When ro ≤ min{r1, r/2}, Ũn − θ = OP(n−ro), and its distribution requires a more accurate expansion to evaluate. When r < min{2ro, r1 + ro},
When r1 + ro < r or r1 < ro,
When r1 + ro = r,
Proof of Corollary 1. In this case we have h1(X1) = a[X1 + (m − 1)μ], h̃1(X1) = a(X1 − μ), σ̃12 = a2τ2. Also, g1(X1) = b(X1 − μ) = g̃1(X1), A1 = abτ2, A = abmτ2, Ω = b2mτ2, Ω1 = b2τ2, and ro = 1. So by Theorem 2(i) we have σ2 = m2(a2τ2 − 2a2τ2 + a2τ2) = 0.
Proof of Theorem 3. (i) Note θ = EFmh(X). The information bound is for parameters of the form EF(s(X1)) for some s(·). Recall h1(x1) = E[h(X1, …, Xm)|X1 = x1] and EF(h1(X1)) = θ; thus we take s(·) = h1(·). Similarly, the constraint for computing the information bound should be a univariate function; we take it to be g1(x1).
Let f(x) be the density/mass function of F(x) with respect to some dominating measure μ(x), denote by γ(f) = ∫h1(x)f(x)dμ(x) = θ the parameter as a functional of f, and let γ̇(f)(x) be the adjoint (evaluated at 1) of its pathwise derivative with respect to log f (for the definition see, for example, [6]). Let γ1(f) = Ef[g1(X)] correspond to the side information constraint, with γ̇1(f)(x) the adjoint (evaluated at 1) of its pathwise derivative. Let L2,d,r(f) = {s(x) : s : Rd → Rr, Ef[s(X)s′(X)] < ∞}; for s1 ∈ L2,d,k(f) and s2 ∈ L2,d,r(f), define the inner product (matrix) 〈s1, s2〉 = Ef[s1(X)s2′(X)], the norm (matrix) ‖s1‖2 = 〈s1, s1〉, and ‖s1‖−2 ≔ (‖s1‖2)−1 when ‖s1‖2 is non-degenerate.
By Proposition A.5.2 in [6], we have γ̇(f) = h1(X) − θ = h̃1(X) and γ̇1(f) = g1(X) = g̃1(X). Let ∏(υ|υ1) be the projection of υ onto [υ1], the linear span of υ1 with respect to f and μ, and let [υ1⊥] be the orthogonal complement of [υ1] with respect to f and μ. Without side information, the efficient influence function ℐ(X, γ(f)) for estimating γ(f) is ℐ(X, γ(f)) = γ̇(f), and the information bound is ‖ℐ(X, γ(f))‖2. In the presence of the side information γ1(f), by Example 3.2.3 in [6], the efficient influence function ℐ(X, γ(f)|γ1(f)) for estimating γ(f) is
and the information (lower) bound for estimating θ, with side information g, is 𝕀(θ|g) = ‖∏(γ̇(f)|[γ̇1(f)⊥])‖2 = σ̃12 − A1′Ω1−1A1.
When g(x) = (g(x1) + ⋯ + g(xm))/m, we have g̃1(x1) = g1(x1) = E[g(x1, X2, …, Xm)] = g(x1)/m, A = E[g(X)h(X)] = E[g(X1)h1(X1)] = mE[g1(X1)h1(X1)] = mE[g̃1(X1)h̃1(X1)] = mA1 and Ω = mΩ1, thus

$$\sigma^2 = m^2(\tilde\sigma_1^2 - 2A'\Omega^{-1}A_1 + A'\Omega^{-1}\Omega_1\Omega^{-1}A) = m^2(\tilde\sigma_1^2 - A_1'\Omega_1^{-1}A_1) = m^2\,\mathbb{I}(\theta|g).$$
Since m2 is a known positive constant, we can just divide Ũn by m so that its asymptotic variance is 𝕀(θ|g), and thus it is efficient.
Since σ2 = m2‖∏(γ̇(f)|γ̇1(f)⊥)‖2 ≥ 0, with "=" iff γ̇(f) = h̃1(X) ∈ [γ̇1(f)], the linear span of γ̇1(f) = g̃1(X), or θ is completely determined by g̃1(X), which is impossible. Also σ2 ≤ m2σ̃12, with "=" iff γ̇(f) ∈ [γ̇1(f)⊥], or 0 = 〈γ̇(f), γ̇1(f)〉 = A1.
(ii) Let f(x|θ, g) be the density function given the parameter θ and the information constraint g, and let S(x|θ, g) = ∂ log f(x|θ, g)/∂θ be the corresponding score function. The corresponding Fisher information is I(θ|g) = ‖S(X|θ, g)‖2. Although S(x|θ, g), and hence I(θ|g), is not directly available, the corresponding efficient influence function ℐ(X, γ(f)|γ1(f)) is given in (i), and we have the following relationship between the information bound 𝕀(θ|g) and the Fisher information I(θ|g):

$$\mathbb{I}(\theta|g) = I^{-1}(\theta|g).$$
Let ln(θ) = ∑i=1n log f(Xi|θ, g) be the log-likelihood; we have the following local asymptotic normality [22] of the likelihood ratio
where .
Let ϕY (t) = E[exp{itY}] be the characteristic function of a random variable Y. We are to show limn ϕWn (t) = ϕU (t)ϕZ (t). In fact, by assumption of regularity,
where the last step above follows by the same argument as in [4]. Since b ∈ C is arbitrary, taking b = −itI−1(θ|g) we get
thus
Now take U = W − 𝕀(θ|g)V, the proof is complete.
Proof of Theorem 4. (i) Denote the related U-statistics as functions of h, and note that the O(·) terms in the lemma are independent of h. Note that U1,n is a functional of gh, U2,n is a functional of (1dg + ‖g‖2)h, θ and Un are functionals of h, and U0,n is a functional of g. As in the proof of Theorem 1(ii), we have
Since U0,n(g) → 0 (a.s.) and is independent of h, we only need to show, a.s.,
In fact, since gh ∈ ℋ for all h ∈ ℋ, and U1,n(h) = Un(h), we have suph∈ℋ |U1,n(gh)| ≤ suph∈ℋ |U1,n(gh) − Pm(gh)| + suph∈ℋ |Pm(gh)| ≤ suph∈ℋ |U1,n(h) − Pm(h)| + suph∈ℋ |Pm(h)| = suph∈ℋ |Un(h) − θ(h)| + suph∈ℋ |Pm(h)|. Since ℋ has an integrable envelope H, suph∈ℋ |Pm(h)| ≤ Pm(suph∈ℋ |h|) ≤ Pm(H) < ∞. Thus, suph∈ℋ |U1,n(gh)| < ∞ (a.s.), if suph∈ℋ |Un(h) − θ(h)| → 0 (a.s.).
Similarly, since ‖g‖2h ∈ ℋ for all h ∈ ℋ, and U2,n(h) = Un(h), we have suph∈ℋ |U2,n((1dg + ‖g‖2)h)| < ∞ (a.s.), if suph∈ℋ |Un(h) − θ(h)| → 0 (a.s.).
Now we only need to prove suph∈ℋ |Un(h) − θ(h)| → 0 (a.s.) (the class ℋ is then called P-Glivenko–Cantelli). Since the P-Glivenko–Cantelli property is preserved under finite unions of classes, we only need to prove it for ℋ1 and ℋ2 separately.
We first prove that ℋ1 is P-Glivenko–Cantelli. For ε > 0, let N[ ](ε, ℋ1, L1(Pm)) be the bracketing number of the class ℋ1 under the L1(Pm) norm ‖h‖Pm = EPm|h|, h ∈ ℋ1. We first prove that if N[ ](ε, ℋ1, L1(Pm)) < ∞ for all ε > 0, then the conclusion is true. In fact, given ε > 0, since N[ ](ε, ℋ1, L1(Pm)) < ∞, there are finitely many ε-brackets [li, ui] whose union covers ℋ1 and such that Pm(ui − li) < ε for all i. Then for any h ∈ ℋ1, there is an upper bracket ui such that
Consequently,
Since, by the SLLN for U-statistics, (Pn,m − Pm)ui → 0 (a.s.), we have lim supn suph∈ℋ1 (Un(h) − Pm(h)) ≤ ε (a.s.). Similarly, lim infn infh∈ℋ1 (Un(h) − Pm(h)) ≥ −ε (a.s.). Since ε > 0 is arbitrary, we get suph∈ℋ1 |Un(h) − Pm(h)| → 0 (a.s.).
Now we show N[ ](ε, ℋ1, L1(Pm)) < ∞ for all ε > 0. Let . By Corollary 2.7.4 in VW, for some constant K depending only on υ and m,
for every ε > 0, υ ≥ 1 and probability measure P. Taking υ = 1, the right hand side above is finite under the given conditions.
Now we show suph∈ℋ2 |Un(h) − θ(h)| → 0 (a.s.). For ε > 0, let N(ε, ℋ2, L1(Q)) be the covering number of ℋ2 (without bracketing) under the L1(Q) norm for a probability measure Q, and let N(ε, ℋ2, ‖·‖∞) be that under the norm ‖·‖∞. Recall that H is also an envelope function of ℋ2. Since ‖·‖L1(Q) ≤ ‖·‖∞, we have N(ε‖H‖Q, ℋ2, L1(Q)) ≤ N(ε‖H‖Q, ℋ2, ‖·‖∞), with ‖H‖Q = (∫H2dQ)1/2. Let M be the bound on ℋ2, and ℋ̃2 = {(h − infx∈C h(x))/M : h ∈ ℋ2}; then ℋ̃2 is a class of convex functions h : C ↦ [0, 1] with Lipschitz constant L/M, and N(ε, ℋ2, ‖·‖∞) = N(εM, ℋ̃2, ‖·‖∞). By Corollary 2.7.10 in VW, for any ε > 0,
for any probability measure Q, where K depends only on m and C. Also, since H is an envelope function over ℋ2 ≠ {0}, we have infQ∈𝒬 ‖H‖Q ≥ δ for some δ > 0, where 𝒬 is the set of all probability measures Q on C with ‖H‖Q < ∞. Thus we have, for any ε > 0,
also, ℋ2 is P-measurable by its definition, thus ℋ2 is P-Glivenko–Cantelli (cf. the statement in lines −5 to −3, p. 84 of VW).
(ii) The class ℋ with the stated property is called P-Donsker. First, it is apparent that for any k and h1, …, hk ∈ ℋ, the finite dimensional distributions of 𝔾̃n,m converge to those of the Gaussian process 𝔾 on ℋ as stated. So by Theorem 1.5.4 in VW, we only need to show that {𝔾̃n,m} is asymptotically tight on ℋ. Using Lemma (ii) and a similar argument as in the proof of (i), we only need to show this for {𝔾n,m}, and by Theorem 1.5.7 in VW, we only need to show that {𝔾n,m} is asymptotically equicontinuous and totally bounded on ℋ. Below we will show that if
| (A.2) |
then {𝔾n,m} is asymptotically equicontinuous and totally bounded on ℋ, where 𝒬 is the collection of all measures Q with ‖H‖Q < ∞.
With (A.2), Theorem 2.5.2 in VW asserts the corresponding conclusion for empirical measures. Now we extend the result to U-statistics. For this, we point out that the symmetrization Lemma 2.3.1 in VW still holds for U-statistics, and Hoeffding's inequality also holds for U-statistics (Arcones and Giné [2, Proposition 2.3, p. 1501]); thus the proofs there remain valid in our situation.
To check (A.2) on ℋ, we only need to check it for ℋ1 and ℋ2 separately. Using Corollary 2.7.4 in VW, we have
for all ε > 0 and υ ≥ m/s. Choosing υ = m/s in the above inequality, then by the given conditions, and since υ < 2, we have
hence, by the statement on p. 85 in VW, ℋ1 satisfies (A.2). The original statement in VW is for the entropy integral over (0, ∞). Since ℋ1 has a square integrable envelope function H, ∀h1, h2 ∈ ℋ, ‖h1 − h2‖L2(P) ≤ ‖h1‖L2(P) + ‖h2‖L2(P) ≤ 2‖H‖L2(P) < ∞, i.e., ℋ1 itself is a ball of radius no greater than 2‖H‖L2(P), so N[ ](ε, ℋ1, L2(P)) = 1 for ε ≥ 2‖H‖L2(P); thus its entropy is zero for ε ≥ 2‖H‖L2(P), and the integral over (0, ∞) is finite iff the integral over (0, 2‖H‖L2(P)] is finite.
For ℋ2, similarly as in the proof of (i), for some η > 0,
Since m < 4,
thus by (2.1.7) in VW, ℋ2 is P-Donsker.
Proof of Corollary 2. From the proof of Theorem 4(i), we only need to show suph∈ℋ |Pn,mh − Pmh| → 0 (a.s.), which is true by Corollary 3.3 or 3.5, respectively, of [2].
Proof of Theorem 5. (i) As in the proofs of the previous theorems, with g replaced by G, since ro ≔ min{rank(g1), …, rank(gd), rank(h)} = 1, by Lemma (ii) we have
and by standard U-statistics theory,
Also, , and maxi since m/α < ro/2 = 1/2, so
This completes the proof since
(ii) This is a special case of (i).
References
- 1. Adimari G. Empirical likelihood type confidence intervals under random censorship. Annals of the Institute of Statistical Mathematics. 1997;49:447–466.
- 2. Arcones MA, Giné E. Limit theorems for U-processes. Annals of Probability. 1993;21(3):1494–1542.
- 3. Baggerly KA. Empirical likelihood as a goodness-of-fit measure. Biometrika. 1998;85:535–547.
- 4. Begun JM, Hall WJ, Huang W, Wellner JA. Information and asymptotic efficiency in parametric–nonparametric models. Annals of Statistics. 1983;11:432–452.
- 5. Berk RH. Limiting behavior of posterior distributions when the model is incorrect. Annals of Mathematical Statistics. 1966;37:51–58.
- 6. Bickel PJ, Klaassen CA, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, MD: Johns Hopkins University Press; 1993.
- 7. Borovskich YuV. Theory of U-Statistics in Hilbert Space. Kiev: Institute of Mathematics, Ukrainian Academy of Sciences; 1986.
- 8. Chen SX. Empirical likelihood for nonparametric density estimation. Australian Journal of Statistics. 1997;39:47–56.
- 9. Chen SX, Hall P. Smoothed empirical likelihood confidence intervals for quantiles. Annals of Statistics. 1993;21:1166–1181.
- 10. Chen SX, Qin YS. Empirical likelihood confidence intervals for local linear smoothers. Biometrika. 2000;87:946–953.
- 11. Giné E. Decoupling and limit theorems for U-statistics and U-processes. In: Lectures on Probability Theory and Statistics, Saint-Flour 1996. Lecture Notes in Mathematics, vol. 1665. Berlin: Springer; 1997. pp. 1–35.
- 12. Giné E, Zinn J. Marcinkiewicz type laws of large numbers and convergence of moments for U-statistics. In: Probability in Banach Spaces, vol. 8. Boston: Birkhäuser; 1992. pp. 273–291.
- 13. Gregory G. Large sample theory for U-statistics and tests of fit. Annals of Statistics. 1977;5:110–123.
- 14. Hájek J. A characterization of limiting distributions of regular estimates. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 1970;14:323–330.
- 15. Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325.
- 16. Hoeffding W. The strong law of large numbers for U-statistics. Institute of Statistics Mimeo Series. 1961;(302):1–10.
- 17. Janson S. The asymptotic distribution of degenerate U-statistics. Preprint No. 5, Department of Mathematics, University of Uppsala; 1979. pp. 1–17.
- 18. Jing BY, Yuan J, Zhou W. Jackknife empirical likelihood. Journal of the American Statistical Association. 2009;104:1224–1232.
- 19. Kitamura Y. Empirical likelihood methods in econometrics: theory and practice. Discussion Paper No. 1569, Cowles Foundation for Research in Economics; 2006.
- 20. Kolaczyk ED. Empirical likelihood for generalized linear models. Statistica Sinica. 1994;4:199–218.
- 21. Koroljuk VS, Borovskich YuV. Theory of U-Statistics. Dordrecht, The Netherlands: Kluwer Academic Publishers; 1994.
- 22. LeCam L. Locally asymptotically normal families of distributions. University of California Publications in Statistics. 1960;3:37–98.
- 23. Nolan D, Pollard D. U-processes: rates of convergence. Annals of Statistics. 1987;15:780–799.
- 24. Nolan D, Pollard D. Functional limit theorems for U-processes. Annals of Probability. 1988;16:1291–1298.
- 25. Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249.
- 26. Owen AB. Empirical likelihood confidence regions. Annals of Statistics. 1990;18:90–120.
- 27. Owen AB. Empirical likelihood for linear models. Annals of Statistics. 1991;19:1725–1747.
- 28. Qin J, Lawless J. Empirical likelihood and general estimating equations. Annals of Statistics. 1994;22:300–325.
- 29. Qin GS, Tsao M. Empirical likelihood based inference for the derivative of the nonparametric regression function. Bernoulli. 2005;11:715–735.
- 30. Qin J, Zhang B. Marginal likelihood, conditional likelihood and empirical likelihood: connections and applications. Biometrika. 2005;92:251–270.
- 31. Qin GS, Zhou XH. Empirical likelihood inference for the area under the ROC curve. Biometrics. 2006;62:613–622.
- 32. Rubin H, Vitale RA. Asymptotic distribution of symmetric statistics. Annals of Statistics. 1980;8:165–170.
- 33. Sen PK. Almost sure behavior of U-statistics and von Mises' differentiable statistical functions. Annals of Statistics. 1974;2:387–395.
- 34. Serfling R. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.
- 35. Thomas D, Grunkemeier G. Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association. 1975;70:865–871.
- 36. van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag; 1996.
- 37. von Mises R. On the asymptotic distribution of differentiable statistical functions. Annals of Mathematical Statistics. 1947;18:309–348.
- 38. Wood ATA, Do KA, Broom BM. Sequential linearization of empirical likelihood constraints with application to U-statistics. Journal of Computational and Graphical Statistics. 1996;5:365–385.
- 39. Zhang B. A note on kernel density estimation with auxiliary information. Communications in Statistics — Theory and Methods. 1998;27:1–11.
