Author manuscript; available in PMC: 2013 May 21.
Published in final edited form as: J Multivar Anal. 2012 Oct;111:20–38. doi: 10.1016/j.jmva.2012.04.008

U-statistic with side information

Ao Yuan a,*, Wenqing He b, Binhuan Wang c, Gengsheng Qin c
PMCID: PMC3660044  NIHMSID: NIHMS462605  PMID: 23704796

Abstract

In this paper we study U-statistics with side information incorporated via the method of empirical likelihood. Some basic properties of the proposed statistics are investigated. We find that when the side information is implemented properly, the proposed U-statistics can have smaller asymptotic variance than the existing U-statistics in the literature. The proposed U-statistics can achieve asymptotic efficiency in a formal sense, and their weak limits admit a convolution result. We also find that the corresponding U-likelihood ratio procedure, as well as the U-empirical likelihood based confidence interval construction, does not benefit from incorporating side information, a result consistent with that under the standard empirical likelihood ratio procedure. The impact of incorporating incorrect side information in the proposed U-statistics is also explored. Simulation studies are conducted to assess the finite sample performance of the proposed method. The numerical results show that with side information implemented, the reduction in asymptotic variance can be substantial in some cases, and the coverage probability of the confidence interval based on the U-empirical likelihood ratio outperforms that of the normal approximation based method, in particular when the underlying distribution is skewed.

Keywords: Efficiency, Information bound, Side information, U-statistic

1. Introduction

Since the pioneering work of Hoeffding [15], U-statistics have been an active research field in statistics due to their wide range of applications. Hoeffding [16] established some fundamental properties of U-statistics, which are closely related to the V-statistics proposed by von Mises [37]. Berk [5] discovered the reverse martingale structure of U-statistics. Sen (e.g. [33]) made a number of contributions to this topic. Parallel to the result for V-statistics, Gregory [13] obtained the asymptotic distribution of degenerate U-statistics of rank two. The asymptotic distribution of U-statistics of arbitrary rank was developed by Janson [17] and Rubin and Vitale [32], among others. Borovskich [7] extended the results to Hilbert spaces. A detailed review and the major historical developments in this field can be found in the book by Koroljuk and Borovskich [21], hereafter denoted as KB.

The empirical likelihood (EL) is one of the recent major developments in statistics. The original idea can be traced back to Thomas and Grunkemeier [35]. The work of Owen [25–27] formally established the advantages and scope of this method, and paved the way for the increasing popularity of EL, owing to its wide range of applications, theoretical advantages, simplicity of use and flexibility to incorporate auxiliary (or side) information in various forms. EL has been applied to various problems, for example, nonparametric confidence regions [9], the generalized linear model [20], survival analysis [1], density and quantile estimation [8,39], goodness-of-fit measures [3], nonparametric regression [10,29], marginal and conditional likelihood [30], the ROC curve [31], econometrics [19], etc. It is well known that incorporating side information via empirical likelihood can reduce the asymptotic variance of estimators [28]. Motivated by this fact, we explore incorporating side information into the U-statistic using the EL method, and expect that the new procedure can improve the performance of the U-statistic under appropriate conditions.

It is also known that constructing confidence regions using the EL ratio has various advantages over normal approximation based methods or the bootstrap. For example, Wood et al. [38] and Jing et al. [18] applied the EL method to U-statistics to construct confidence intervals without side information incorporated. We investigate constructing confidence intervals for U-statistics using the empirical likelihood with side information incorporated, and the resulting confidence intervals are compared with those based on normal approximation. Our way of formulating the weights of U-statistics is parallel to that in the EL, and is different from those in [38,18]. We find that by incorporating the side information properly, the proposed U-statistics have smaller asymptotic variance than the existing U-statistics without side information. The proposed U-statistics can achieve asymptotic efficiency in a formal sense, and their weak limits admit a convolution result. We also find that the U-statistic EL based likelihood ratio procedure does not benefit from incorporating the side information asymptotically, a result consistent with that under the standard empirical likelihood ratio procedure. In finite samples, however, the resulting coverage probability still outperforms that of the normal approximation based method. The impact of incorporating incorrect side information is also explored.

In Section 2 we introduce the framework of the proposed U-statistics with side information incorporated, and in Section 3 we investigate their basic asymptotic properties. The U-empirical likelihood ratio with side information is formulated in Section 4. Examples and simulation results are given in Section 5 to illustrate the proposed method. All the relevant proofs are given in the Appendix.

2. Incorporating side information in U-Statistics

Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.) random variables with unknown distribution function $F(x) = P(X_i \le x)$. In this paper we assume the $X_i$ are scalar random variables for simplicity, although there is no essential difference in extending the results to random vectors. Denote $\mathbf{X} = (X_1, \dots, X_m)'$ $(m \ge 2)$, $\mathbf{i} = (i_1, \dots, i_m)'$ and $X_{\mathbf{i}} = (X_{i_1}, \dots, X_{i_m})'$. Let $D_{n,m} = \{\mathbf{i} : 1 \le i_1 < \cdots < i_m \le n\}$ denote the collection of indices for the U-statistic of degree $m$, let $C_n^m = \binom{n}{m}$, $\mathbf{x} = (x_1, \dots, x_m)'$, $F_m(\mathbf{x}) = \prod_{j=1}^m F(x_j)$, and let $F_{n,m}(\mathbf{x})$ be the empirical distribution function of $F_m$ based on the sample $\mathcal{X}_n := \{X_{\mathbf{i}} : \mathbf{i} \in D_{n,m}\}$, with mass $1/C_n^m$ at each point of $\mathcal{X}_n$. Given an $m$-variate symmetric kernel $h$, the U-statistic is defined as

$$U_n = (C_n^m)^{-1} \sum_{\mathbf{i} \in D_{n,m}} h(X_{\mathbf{i}}) = E_{F_{n,m}} h(\mathbf{X}).$$

The goal is to estimate θ = EFmh(X), where EFm denotes the expectation with respect to Fm. It is known that the U-statistic Un is the minimal variance unbiased estimator of θ [34, p. 176].
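As a concrete illustration (our own sketch, not code from the paper), $U_n$ can be computed by averaging the kernel over all $C_n^m$ index combinations. With the variance kernel $h(x_1, x_2) = (x_1 - x_2)^2/2$, $U_n$ reproduces the unbiased sample variance:

```python
import itertools
import statistics

def u_statistic(xs, h, m):
    """Average a symmetric kernel h of degree m over all C(n, m)
    increasing index tuples: Hoeffding's U-statistic."""
    combos = list(itertools.combinations(xs, m))
    return sum(h(*c) for c in combos) / len(combos)

# Variance kernel h(x1, x2) = (x1 - x2)^2 / 2: here U_n equals the
# unbiased sample variance, the minimum-variance unbiased estimator.
xs = [1.0, 2.0, 4.0, 8.0]
un = u_statistic(xs, lambda x1, x2: (x1 - x2) ** 2 / 2, m=2)
assert abs(un - statistics.variance(xs)) < 1e-12
```

This brute-force enumeration costs $O(C_n^m)$ kernel evaluations, which is the same order of work the weighted statistics below require.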

Since the work of Owen [25], the empirical likelihood (EL) has gained increasing popularity due to its wide range of applications, simplicity of use and flexibility to incorporate auxiliary (or side) information. Here we combine the EL method, which flexibly incorporates side information, with U-statistics, to achieve a smaller variance for the estimator.

We consider the set-up for EL as in [28]. Suppose the side information can be incorporated into the EL through a d-dimensional known function g(x) = (g1(x), …, gd(x))′ via the relationship

$$E[g(X_1)] = 0,$$

where E[·] denotes the expectation with respect to F. The EL is defined as

$$L(F) = \prod_{i=1}^n w_i,$$

where the $w_i$'s are the empirical masses assigned to the observations $X_i$, estimated by nonparametric maximum likelihood. With the side information constraints, the EL is

$$\max_{w}\ \prod_{i=1}^n w_i \quad \text{subject to} \quad \sum_{i=1}^n w_i = 1 \ \text{ and } \ \sum_{i=1}^n w_i\, g(X_i) = 0.$$

Let $t = (t_1, \dots, t_d)'$ be the Lagrange multipliers corresponding to the constraint on $g(\cdot)$; as in [26], we get

$$w_i = \frac{1}{n}\,\frac{1}{1 + t' g(X_i)},$$

where $t_j = t_j(X_1, \dots, X_n)$ $(j = 1, \dots, d)$ are determined by

$$\sum_{i=1}^n \frac{g(X_i)}{1 + t' g(X_i)} = 0.$$
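In practice the multiplier $t$ has no closed form and is found numerically. The sketch below (our own illustrative implementation; the function name, the Newton iteration and the step-halving safeguard are our choices, not from the paper) solves the estimating equation above for $t$ and returns the resulting weights:

```python
import numpy as np

def el_weights(g_vals, tol=1e-10):
    """Empirical-likelihood weights w_i = (1/n)/(1 + t'g(X_i)), with the
    Lagrange multiplier t solved from sum_i g(X_i)/(1 + t'g(X_i)) = 0
    by a damped Newton iteration.  g_vals has shape (n, d)."""
    n, d = g_vals.shape
    t = np.zeros(d)
    for _ in range(100):
        denom = 1.0 + g_vals @ t
        score = (g_vals / denom[:, None]).sum(axis=0)
        hess = -(g_vals[:, :, None] * g_vals[:, None, :]
                 / (denom ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess, -score)
        # step-halving keeps every 1 + t'g(X_i) strictly positive
        while np.any(1.0 + g_vals @ (t + step) <= 1e-8):
            step /= 2.0
        t = t + step
        if np.linalg.norm(step) < tol:
            break
    return (1.0 / n) / (1.0 + g_vals @ t)

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=50)
w = el_weights((x - 1.0)[:, None])  # side information: E[X] = 1
assert abs(w.sum() - 1.0) < 1e-8    # holds automatically at the solution
assert abs(w @ (x - 1.0)) < 1e-8    # constraint satisfied
```

Note that $\sum_i w_i = 1$ need not be imposed separately: at the solution of the estimating equation, $\sum_i 1/(1 + t'g(X_i)) = n$ identically.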

To combine the EL method and U-statistics, Wood et al. [38] considered a weighted U-statistic

$$(C_n^m)^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} n^m w_{i_1} \cdots w_{i_m}\, h(X_{i_1}, \dots, X_{i_m})$$

with weights $w(i_1, \dots, i_m) = w_{i_1} \cdots w_{i_m}$ estimated using the EL procedure. Jing et al. [18] proposed a jackknife EL for the U-statistic without side information. They first merge the $C_n^m$ observed $h(X_{\mathbf{i}})$'s into a jackknife pseudo-sample, then treat this pseudo-sample as a sample of $n$ i.i.d. observations and apply the standard EL method for the mean to obtain the EL estimate for the U-statistic.
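For intuition, the jackknife pseudo-sample used by Jing et al. [18] can be generated with the standard pseudo-value formula $V_k = nU_n - (n-1)U_{n-1}^{(-k)}$; the helper below is our own illustration, not code from the paper (a known property is that the pseudo-value mean reproduces $U_n$ exactly):

```python
import itertools

def jackknife_pseudo_values(xs, h, m):
    """Jackknife pseudo-sample for a degree-m U-statistic:
    V_k = n*U_n - (n-1)*U_{n-1}^{(-k)}, where U_{n-1}^{(-k)} omits X_k.
    The mean-EL method can then be applied to these n values."""
    n = len(xs)
    def u(sample):
        combos = list(itertools.combinations(sample, m))
        return sum(h(*c) for c in combos) / len(combos)
    un = u(xs)
    return [n * un - (n - 1) * u(xs[:k] + xs[k + 1:]) for k in range(n)]

xs = [1.0, 2.0, 4.0, 8.0]
pv = jackknife_pseudo_values(xs, lambda a, b: (a - b) ** 2 / 2, m=2)
# The mean of the pseudo-values reproduces U_n exactly.
assert abs(sum(pv) / len(pv) - 9.583333333333334) < 1e-9
```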

In this paper, our goal is to estimate θ = EFmh(X) under the information constraints to incorporate side information in the form

$$E_{F_m} g(\mathbf{X}) = 0. \qquad (1)$$

Without loss of generality $g(\cdot)$ is assumed symmetric in its arguments (otherwise we can set $g(x_1, \dots, x_m) = (1/m!) \sum_{(p)} g(x_{i_1}, \dots, x_{i_m})$ to symmetrize it, where $\sum_{(p)}$ denotes summation over all permutations $(i_1, \dots, i_m)$ of $(1, \dots, m)$). This formulation includes the constraint $E_F(g(X_1)) = 0$ as a special case via the componentwise product $g(\mathbf{X}) = \prod_{j=1}^m g(X_j)$. Some examples of $g(\cdot)$ are given in Section 5 for illustration.

To formulate the proposed U-statistic we define the weights $w(i_1, \dots, i_m)$ in a different but direct way. Let $w_{\mathbf{i}} = F_m(\{X_{\mathbf{i}}\})$ and $w = (w_{\mathbf{i}} : \mathbf{i} \in D_{n,m})$. Since the $w_{\mathbf{i}}$'s are unknown (as is $F_m$), we maximize their product subject to appropriate constraints (the $w_{\mathbf{i}}$'s may not be independent of each other). Re-write the EL subject to the side information constraints as

$$\max_{w}\ \prod_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} \quad \text{subject to} \quad \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} = 1 \ \text{ and } \ \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}\, g(X_{\mathbf{i}}) = 0.$$

We get, as in [26], that

$$w_{\mathbf{i}} = (C_n^m)^{-1} \frac{1}{1 + t' g(X_{\mathbf{i}})}, \qquad (2)$$

and $t = t_n = (t_{n1}, \dots, t_{nd})'$, with $t_{nj} = t_{nj}(X_1, \dots, X_n)$ $(j = 1, \dots, d)$ determined by

$$\sum_{\mathbf{i} \in D_{n,m}} \frac{g(X_{\mathbf{i}})}{1 + t' g(X_{\mathbf{i}})} = 0. \qquad (3)$$

For details regarding the existence of $t$ as the solution of (3) see, for example, the papers by Owen and others. The proposed weights for U-statistics are parallel to those in the EL, and are simpler than some existing methods in that there is no need to form a product of $m$ elements from $w_1, \dots, w_n$ as in [38], nor to merge the data as in [18].
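A numerical sketch of the weights (2) and (3) and the resulting weighted statistic, for a scalar constraint ($d = 1$); `u_stat_with_side_info` is our own illustrative helper, not code from the paper:

```python
import itertools
import numpy as np

def u_stat_with_side_info(xs, h, g, m):
    """Illustration of the weighted statistic: EL weights (2)-(3) placed
    directly on the C(n, m) kernel evaluations h(X_i), i in D_{n,m},
    under the constraint sum_i w_i g(X_i) = 0 (scalar g for simplicity)."""
    tuples = list(itertools.combinations(xs, m))
    gv = np.array([g(*c) for c in tuples])
    hv = np.array([h(*c) for c in tuples])
    t = 0.0
    for _ in range(100):                       # Newton iteration on (3)
        denom = 1.0 + t * gv
        score = np.sum(gv / denom)
        step = score / np.sum(gv ** 2 / denom ** 2)
        while np.any(1.0 + (t + step) * gv <= 1e-8):
            step /= 2.0                        # keep all weights positive
        t += step
        if abs(step) < 1e-12:
            break
    w = (1.0 / len(tuples)) / (1.0 + t * gv)   # weights (2)
    return np.sum(w * hv)

rng = np.random.default_rng(1)
xs = list(rng.normal(1.0, 2.0, size=30))
h = lambda a, b: (a - b) ** 2 / 2              # variance kernel
g = lambda a, b: (a + b) / 2 - 1.0             # side info: known mean E[X] = 1
theta_tilde = u_stat_with_side_info(xs, h, g, 2)
assert theta_tilde > 0.0
```

With $t = 0$ the weights reduce to $(C_n^m)^{-1}$ and the statistic reduces to the ordinary $U_n$.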

Similar to Hoeffding [15], for any kernel $h(\cdot)$ with $E_{F_m}|h(\mathbf{X})| < \infty$, let $h_c(x_1, \dots, x_c) = E(h(X_1, \dots, X_m) \mid X_1 = x_1, \dots, X_c = x_c)$ and let $h_c^o = h_c - \theta$ be its centered version $(c = 1, \dots, m)$. Define $\ell_1(x_1) = h_1^o(x_1)$, $\ell_2(x_1, x_2) = h_2^o(x_1, x_2) - \ell_1(x_1) - \ell_1(x_2)$, $\ell_3(x_1, x_2, x_3) = h_3^o(x_1, x_2, x_3) - \sum_{i=1}^3 \ell_1(x_i) - \sum_{1 \le i < j \le 3} \ell_2(x_i, x_j)$, and in general,

$$\ell_c(x_1, \dots, x_c) = h_c^o(x_1, \dots, x_c) - \sum_{i=1}^c \ell_1(x_i) - \sum_{1 \le i < j \le c} \ell_2(x_i, x_j) - \cdots - \sum_{1 \le i_1 < \cdots < i_{c-1} \le c} \ell_{c-1}(x_{i_1}, \dots, x_{i_{c-1}}) = \int h_c(y_1, \dots, y_c) \prod_{s=1}^c d\big(\delta_{x_s}(y_s) - F(y_s)\big), \quad (c = 1, \dots, m),$$

where $\delta_{x_s}(y_s)$ is the Dirac function, taking value 1 if $y_s = x_s$ and 0 otherwise. The integral representation above can be found in KB. The $\ell_c$'s are called the canonical forms of $h$. If $\ell_1 = \cdots = \ell_{k-1} = 0$ and $\ell_k \ne 0$ (or equivalently $\mathrm{Var}(\ell_1) = \cdots = \mathrm{Var}(\ell_{k-1}) = 0$ and $\mathrm{Var}(\ell_k) \ne 0$), the U-statistic $U_n$ with kernel $h$ is said to be of rank $k$ $(1 \le k \le m)$. When $k > 1$, $U_n$ is called degenerate; when $k = m$ it is called completely degenerate. $U_n$ has the following Hoeffding [16] representation:

$$U_n - \theta = \sum_{c=k}^m C_m^c\, U_n^{(c)}, \qquad U_n^{(c)} = (C_n^c)^{-1} \sum_{1 \le i_1 < \cdots < i_c \le n} \ell_c(X_{i_1}, \dots, X_{i_c}).$$

Let $\eta_c^2 = E[\ell_c^2]$ $(c = 1, \dots, m)$; $U_n$ has the following variance formula [16]:

$$\mathrm{Var}(U_n) = (C_n^m)^{-1} \sum_{c=1}^m C_m^c\, C_{n-m}^{m-c}\, \eta_c^2.$$

Define $g_c = (g_{c,1}, \dots, g_{c,d})'$ with

$$g_{c,j}(x_1, \dots, x_c) = E_{F_m}\big(g_j(X_1, \dots, X_m) \mid X_1 = x_1, \dots, X_c = x_c\big), \quad (j = 1, \dots, d;\ c = 1, \dots, m),$$

and the canonical forms $\tilde g_c = (\tilde g_{c,1}, \dots, \tilde g_{c,d})'$ of $g$ as

$$\tilde g_c(x_1, \dots, x_c) = \int g_c(y_1, \dots, y_c) \prod_{s=1}^c d\big(\delta_{x_s}(y_s) - F(y_s)\big), \quad (c = 1, \dots, m).$$

Similarly, let $q_c$ $(c = 1, \dots, m)$ be the canonical forms of $g(\cdot)h(\cdot) = (g_1(\cdot)h(\cdot), \dots, g_d(\cdot)h(\cdot))'$. The canonical forms $\ell_c$ and $\tilde g_c$ $(c = 1, \dots, m)$ exist theoretically, but are unknown in practice since $F$ is unknown. Let $r_o = \min\{\mathrm{rank}(g_1), \dots, \mathrm{rank}(g_d)\}$, $r = \mathrm{rank}(h)$, $r_1 = \min\{\mathrm{rank}(g_1 h), \dots, \mathrm{rank}(g_d h)\}$, and let $\tilde F_{n,m}$ be the empirical distribution with mass $w_{\mathbf{i}}$ at the observation $X_{\mathbf{i}}$. Using the weights $w_{\mathbf{i}}$ given in (2) and (3), we define the U-statistic with side information given by the constraints $g$ as

$$\tilde U_n = \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}\, h(X_{\mathbf{i}}) = E_{\tilde F_{n,m}} h(\mathbf{X}). \qquad (4)$$

In comparison, the commonly used U-statistic $U_n$ places weight $(C_n^m)^{-1}$ on each observation $h(X_{\mathbf{i}})$, while under the EL formulation these weights are replaced by the $w_{\mathbf{i}}$'s. In the following we investigate the basic asymptotic properties of $\tilde U_n$.

3. The asymptotic properties of Ũn

In this section we study some basic asymptotic behavior of the proposed U-statistic, including its convergence, asymptotic distribution, uniform convergence, and asymptotic efficiency. The following conditions will be used in this section:

  • (C1). $\Omega := E[g(\mathbf{X})g'(\mathbf{X})]$ is positive definite.

  • (C2). $E\|g(\mathbf{X})\|^\alpha < \infty$ for some $\alpha > 0$ to be specified.

  • (C3). $E_{F_m}|h(\mathbf{X})| < \infty$.

  • (C4). $E_{F_m}h^2(\mathbf{X}) < \infty$.

  • (C5). $E_{F_m}\big[\|g(\mathbf{X})\|^2\,|h(\mathbf{X})|\big] < \infty$.

where ‖·‖ denotes the Euclidean norm. We note that (C2) with α ≥ 4 plus (C4) implies (C5).

3.1. Convergence rate of Ũn

We first give a lemma to characterize the asymptotic form of the weight wi’s, which will be used repeatedly in the asymptotic study.

Lemma. Assume (C1) and (C2) with $\alpha > 2m/r_o$. Then:

  1. $$w_{\mathbf{i}} \overset{a.s.}{=} \frac{1}{C_n^m}\Big(1 - g'(X_{\mathbf{i}})\,\Omega^{-1}\,\frac{1}{C_n^m}\sum_{\mathbf{j} \in D_{n,m}} g(X_{\mathbf{j}}) + \mathbf{1}_d' g(X_{\mathbf{i}})\, O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big) + \big[\mathbf{1}_d' g(X_{\mathbf{i}}) + \|g(X_{\mathbf{i}})\|^2\big]\, O(\rho_n^2)\Big),$$
    where $\mathbf{1}_d = (1, \dots, 1)'$ is the $d$-dimensional vector of 1's, the $O(\cdot)$ terms are uniform over all $X_{\mathbf{i}}$'s and $\mathbf{i}$'s, and
    $$\rho_n = \begin{cases} O\big(n^{-1/2}(\log\log n)^{1/2}\big), & r_o = 1; \\ o\big(n^{-r_o/2}\log n\big), & 1 < r_o \le m. \end{cases}$$
  2. $$w_{\mathbf{i}} = \frac{1}{C_n^m}\Big(1 - g'(X_{\mathbf{i}})\,\Omega^{-1}\,\frac{1}{C_n^m}\sum_{\mathbf{j} \in D_{n,m}} g(X_{\mathbf{j}}) + \mathbf{1}_d' g(X_{\mathbf{i}})\, O_p\big(n^{-(r_o+1)/2}\big) + \big[\mathbf{1}_d' g(X_{\mathbf{i}}) + \|g(X_{\mathbf{i}})\|^2\big]\, O_p\big(n^{-r_o}\big)\Big).$$

The $O_p(\cdot)$ terms above are uniform over all the $X_{\mathbf{i}}$'s and $\mathbf{i}$'s.

Theorem 1. (i) Assume the conditions of the lemma plus (C3) and (C5). If $r = 1$, then

$$n^q(\tilde U_n - \theta) \to 0, \ a.s., \ \text{for all } q < 1/2.$$

(ii) Assume the conditions of the lemma plus (C4) and (C5). If $r > 1$, then

$$a_n(\tilde U_n - \theta) \to 0 \ (a.s.), \quad \text{where } a_n = \begin{cases} n^q \ \text{for all } q < 1/2, & \text{if } r_1 = r_o = 1; \\ n^{\min\{r/2,\, 1\}}/\log n, & \text{if } r_1 > r_o = 1; \\ n^{\min\{r_o,\, r\}/2}/\log n, & \text{if } 1 = r_1 < r_o; \\ \min\big\{n^{r/2}/\log n,\ n^{\min\{(r_o+r_1)/2,\, r_o\}}/(\log n)^2\big\}, & \text{if } r_o, r_1 > 1. \end{cases}$$

(iii) Assume (C4) and the conditions of Lemma (i). If $r = 1$, then with $\sigma^2$ given in Theorem 2(i),

$$\limsup_{n} \Big(\frac{2\sigma^2 \log\log n}{n}\Big)^{-1/2} |\tilde U_n - \theta| = 1 \ (a.s.).$$

3.2. Asymptotic distribution of Ũn

Let $J_1(h)$ be the Gaussian process indexed by $h \in L_2(R, \mathcal{B}, F)$ with mean $E(J_1(h)) = 0$ and covariance $\mathrm{Cov}(J_1(h), J_1(g)) = \int h(x)g(x)F(dx)$ for all $h, g \in L_2(R, \mathcal{B}, F)$. Let $W(\cdot)$ be the Gaussian random measure on $L_2(R, \mathcal{B}, P)$ defined by $W(A) = J_1(I_A)$, $A \in \mathcal{B}$. $J_1(h) = \int h(x)W(dx)$ is called the Wiener–Itô integral of order 1. Generally, for $h \in L_2(R^r, \mathcal{B}^r, F^r)$, the Wiener–Itô integral of order $r$ is defined as

$$J_r(h) = \int \cdots \int h(x_1, \dots, x_r)\, W(dx_1) \cdots W(dx_r), \quad h \in L_2(R^r, \mathcal{B}^r, F),$$

and its covariance is given by

$$\mathrm{Cov}\big(J_r(h), J_r(g)\big) = r! \int \cdots \int h(x_1, \dots, x_r)\, g(x_1, \dots, x_r)\, F(dx_1) \cdots F(dx_r), \quad h, g \in L_2(R^r, \mathcal{B}^r, F).$$

For a vector function $h = (h_1, \dots, h_d)'$ with $h_j \in L_2(R^r, \mathcal{B}^r, F)$ $(j = 1, \dots, d)$, define $J_r(h)$ componentwise as a $d$-dimensional random process. Denote by $\to_D$ convergence in distribution.

Theorem 2. (i) Assume (C4) and the conditions of the lemma. If $r = 1$, then

$$\sqrt{n}(\tilde U_n - \theta) \to_D N(0, \sigma^2), \qquad \sigma^2 = \begin{cases} m^2\big(\eta_1^2 - 2A'\Omega^{-1}A_1 + A'\Omega^{-1}\Omega_1\Omega^{-1}A\big), & r_o = 1; \\ m^2\eta_1^2, & r_o > 1; \end{cases}$$

where $\eta_1^2 = E_F \ell_1^2(X_1)$, $\Omega_1 = E_F\big(\tilde g_1(X_1)\tilde g_1'(X_1)\big)$, $A = E_{F_m}[g(\mathbf{X})h(\mathbf{X})]$ and $A_1 = E_F[\tilde g_1(X_1)\ell_1(X_1)]$.

(ii) Assume (C4), the conditions of Lemma (ii) and $r > 1$. Then

$$n^{b/2}(\tilde U_n - \theta) \to_D Z, \quad \text{where}$$

when $A \ne 0$,

$$\begin{cases} b = r_o, & Z = -C_m^{r_o} A'\Omega^{-1} J_{r_o}(\tilde g_{r_o}), & \text{if } r_o < r; \\ b = r, & Z = C_m^r J_r(\ell_r - A'\Omega^{-1}\tilde g_r), & \text{if } r_o = r; \\ b = r, & Z = C_m^r J_r(\ell_r), & \text{if } r_o > r; \end{cases}$$

when $A = 0$,

$$\begin{cases} b = r_o, & Z = O_P(1), & \text{if } r_o \le \min\{r_1, r/2\}; \\ b = r, & Z = C_m^r J_r(\ell_r), & \text{if } r < \min\{r_1 + r_o, 2r_o\}; \\ b = r, & Z = C_m^r J_r(\ell_r) - C_m^{r_1} C_m^{r_o} J_{r_1}'(q_{r_1})\,\Omega^{-1} J_{r_o}(\tilde g_{r_o}), & \text{if } r_1 + r_o = r; \\ b = r_o + r_1, & Z = -C_m^{r_1} C_m^{r_o} J_{r_1}'(q_{r_1})\,\Omega^{-1} J_{r_o}(\tilde g_{r_o}), & \text{if } r_1 + r_o < r \text{ or } r_1 < r_o. \end{cases}$$

From Theorem 2 we see that the most interesting case is $r = r_o = r_1 = 1$, in which $\sqrt{n}(\tilde U_n - \theta)$ is asymptotically non-degenerate normal, with asymptotic variance smaller than that of $\sqrt{n}(U_n - \theta)$. $\sigma^2$ is the same as that of $U_n$ either when $r_1 > 1$, $A = 0$, or when $r_o > 1$, $A_1 = 0$ and $\Omega_1 = 0$. Thus, for the side information to be of practical use, we need $r = r_o = r_1 = 1$.

It is interesting to note that if we have full information about the parameter to be estimated, then we can “estimate” the parameter perfectly, i.e., its asymptotic variance is reduced to zero. As an artificial example, let $a$ and $b$ be nonzero known constants, $\mu = E(X_1)$, $h(X_1, \dots, X_m) = a\sum_{k=1}^m X_k$ and $g(X_1, \dots, X_m) = b\sum_{k=1}^m (X_k - \mu)$. Since $g$ is a known function, $\mu$ must be known. We are to estimate $\theta = Eh(X_1, \dots, X_m) = am\mu$ using the U-statistic (4), with the $w_{\mathbf{i}}$'s given by (2) and (3). In this case $\theta$ is already known, as is $\mu$, and we can “estimate” $\theta$ with zero asymptotic variance, as in the following

Corollary 1. Assume $\tau^2 = \mathrm{Var}(X_1) < \infty$, and that $h$ and $g$ are as given above. Then Theorem 2(i) holds with $\sigma^2 = 0$.

3.3. The optimality property of Ũn

To study the asymptotic efficiency of the estimators of $\theta$, let $\mathbb{I}(\theta|g)$ be the information bound [6] for estimating $\theta$ given the side information in $g$, given in Theorem 3(i) below. When the asymptotic variance of an estimator achieves this bound, or equals this bound up to a known multiplicative positive constant, the estimator is called asymptotically efficient. The information bound is the limit version of the Cramér–Rao lower bound for variances of unbiased estimators. For Euclidean parameters without $g$, the information (lower) bound is the inverse of the Fisher information.

Suppose $f(\cdot|\theta)$ is the density function of $X$ given $\theta$, and $\theta_n = \theta + n^{-1/2}b$ for some fixed constant $b$. An estimator $T_n = T_n(X_1, \dots, X_n)$ is said to be regular if, under $f(\cdot|\theta_n)$, $W_n \equiv \sqrt{n}(T_n - \theta_n) \to_D W$ for some random variable $W$, and the result does not depend on the sequence $\{\theta_n\}$. Let $Z + V$ denote the sum of two independent random variables $Z$ and $V$, let $I(\theta)$ be the Fisher information at $\theta$, and let $Z \sim N(0, I^{-1}(\theta))$. The convolution theorem [14] states that for any regular estimator $T_n$ with weak limit $W$, there is a $V$ such that

$$W \overset{d}{=} Z + V.$$

This result further characterizes the weak limit of an asymptotic efficient estimator without side information: it is a normal random variable with mean zero and variance I−1(θ). Below we obtain the information bound and convolution result for the proposed estimators with the presence of side information.

Theorem 3. Assume $r = r_o = 1$, (C4) and the conditions of the lemma. Then:

  1. $$\mathbb{I}(\theta|g) = \eta_1^2 - A_1'\Omega_1^{-1}A_1.$$
    Thus, if we set $g(\mathbf{x}) = (g(x_1) + \cdots + g(x_m))/m$, then $\mathrm{rank}(g) = 1$, $A = mA_1$, $\Omega = m\Omega_1$, $\sigma^2 = m^2\,\mathbb{I}(\theta|g)$ and $\tilde U_n$ is efficient.
  2. Assume further that the density $f(\cdot|\theta)$ of $X$ has second order continuous partial derivatives with respect to $\theta$. Then for any regular estimator $T_n$ with weak limit $W$ of $W_n \equiv \sqrt{n}(T_n - \theta)$, $W$ can be decomposed as, for some $V$,
    $$W \overset{d}{=} Z + V, \quad \text{with } Z \sim N\big(0, \mathbb{I}(\theta|g)\big).$$

It is easy to see that any U-statistic with side information of the form $\tilde U_n$ is regular, and thus is optimal in the sense of convolution under the conditions of Theorem 3. Without side information, the asymptotic variance of $\sqrt{n}(U_n - \theta)$ is $\eta_1^2$; when side information is present, the asymptotic variance of $\sqrt{n}(\tilde U_n - \theta)$ is $\eta_1^2 - A_1'\Omega_1^{-1}A_1$, a reduction of $A_1'\Omega_1^{-1}A_1$. From the proof of Theorem 3(i) we see that $\mathbb{I}(\theta|g)$ is the squared length of the projection of $\ell_1(X_1)$ onto $[\tilde g_1(X_1)]^{\perp}$, the linear span of the orthogonal complement of $\tilde g_1(X_1)$. Increasing the number of components in $g$ (and thus in $\tilde g_1$) shrinks the space $[\tilde g_1(X_1)]^{\perp}$ and shortens the length of the projection; that is, increasing the number of information constraints reduces the asymptotic variance of the U-statistic and increases the efficiency of $\tilde U_n$.

Remark. By Theorem 2(i) or Theorem 3(i), given a nominal level $\alpha$, a level $(1-\alpha)$ confidence interval for $\theta$ can be obtained as $[\tilde U_n \pm n^{-1/2}\sigma\,\Phi^{-1}(1 - \alpha/2)]$, without using the likelihood ratio, where $\Phi^{-1}(\cdot)$ is the standard normal quantile function. Here $\sigma$ is smaller with side information than without, hence the inference becomes more accurate.
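The interval in the Remark is straightforward to compute once an estimate of $\sigma$ is available; a minimal sketch (our own illustration; the plug-in value $\hat\sigma$ is assumed given):

```python
from statistics import NormalDist

def normal_ci(theta_hat, sigma_hat, n, alpha=0.05):
    """Level-(1 - alpha) interval [theta_hat -/+ n^{-1/2} sigma_hat Phi^{-1}(1 - alpha/2)]."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sigma_hat / n ** 0.5
    return theta_hat - half, theta_hat + half

# hypothetical inputs: estimate 1.0, plug-in sigma 2.0, sample size 100
lo, hi = normal_ci(theta_hat=1.0, sigma_hat=2.0, n=100)
assert lo < 1.0 < hi
```

A smaller $\hat\sigma$ (from incorporating side information) directly shortens this interval.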

3.4. The uniform SLLN and CLT of Ũn-processes

Let $\tilde P_{n,m}$, $P_{n,m}$, $P_m$ and $P$ be the (random) probability measures induced by $\tilde F_{n,m}$, $F_{n,m}$, $F_m$ and $F$ respectively. For a function $h$, denote $\tilde P_{n,m}h = \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} h(X_{\mathbf{i}})$, $P_m h = E_{P_m} h(\mathbf{X})$, $\tilde{\mathbb{G}}_{n,m} h = \sqrt{n}(\tilde P_{n,m}h - P_m h)$ and $\mathbb{G}_{n,m} h = \sqrt{n}(P_{n,m}h - P_m h)$. For fixed $h$ and $g$ we have shown that, under appropriate conditions,

$$\tilde P_{n,m}h - P_m h \to 0 \ (a.s.) \quad \text{and} \quad \tilde{\mathbb{G}}_{n,m}h \to_D N(0, \sigma^2)$$

with $\sigma^2 = \sigma^2(h) = P\ell_1^2 - P(\ell_1\tilde g_1')\,\Omega_1^{-1}\,P(\tilde g_1\ell_1)$. In contrast, $\mathbb{G}_{n,m}h \to_D N(0, \eta_1^2)$ with $\eta_1^2 = P\ell_1^2$. Thus incorporating the side information $g$ reduces the asymptotic variance by the amount $P(\ell_1\tilde g_1')\,\Omega_1^{-1}\,P(\tilde g_1\ell_1)$.

It is of interest to have a uniform version of the above SLLN and CLT over a class of functions ℋ. The uniformity means supremum over ℋ, which may or may not be measurable; thus the almost sure and weak convergence results here will be in the sense of the outer measure P* of P (cf. [36]; hereafter VW). When the corresponding quantity is measurable, the convergence is automatically in the sense of the measure P itself. Nolan and Pollard [23,24] studied the uniform SLLN and CLT for U-processes of order two. Giné and Zinn [12], Arcones and Giné [2] and Giné [11], among others, studied other types of uniform problems in general situations. Here we explore the uniform laws for U-statistics under different conditions.

Let $\mathcal{H}$ be a class of functions satisfying (C4), and for any probability measure $Q$ denote $\|Q\|_{\mathcal{H}} = \sup\{|Qh| : h \in \mathcal{H}\}$. Let $L^\infty(\mathcal{H})$ be the space of functionals $z : \mathcal{H} \mapsto R$ with norm $\|z\|_{\mathcal{H}} = \sup_{h \in \mathcal{H}} |z(h)|$, and equip it with the metric $d(z_1, z_2) = \|z_1 - z_2\|_{\mathcal{H}}$ for $z_1, z_2 \in L^\infty(\mathcal{H})$. For an integer $m$-vector $k = (k_1, \dots, k_m)$, a subset $\mathcal{X}$ of $R^m$ and a function $h : \mathcal{X} \mapsto R$, denote $|k| = k_1 + \cdots + k_m$ and $D^k h(x) = \partial^{|k|} h(x)/(\partial x_1^{k_1} \cdots \partial x_m^{k_m})$; $\|x\|$ is the Euclidean norm for $x \in \mathcal{X}$,

$$\|h\|_s = \max_{|k| \le s}\, \sup_{x \in \mathcal{X}} |D^k h(x)| + \max_{|k| = s}\, \sup_{x, y \in \mathcal{X}} |D^k h(x) - D^k h(y)|,$$

and $C_M(\mathcal{X})$ is the set of functions $h : \mathcal{X} \mapsto R$ with $\|h\|_s \le M$. Let $R^m = \bigcup_{j=1}^\infty I_j$ be a partition of $R^m$ into bounded convex sets with non-empty interior, and let $\mathcal{H}_1$ be the class of functions $h$ whose restrictions $h|_{I_j}$ belong to $C_{M_j}(I_j)$ for every $j$, with $M = \max_j M_j < \infty$. Let $\mathcal{H}_2$ be the class of convex functions $h : C \mapsto R$, for some convex compact $C \subset R^m$, such that $|h(x) - h(y)| \le L\|x - y\|$ for some $0 < L < \infty$, all $x, y \in C$ and all $h \in \mathcal{H}_2$, and such that $\|\sum_{\mathbf{i} \in D_{n,m}} e_{\mathbf{i}} h(x_{\mathbf{i}})\|$ is measurable for each $n$ and each $e_{\mathbf{i}} \in \{-1, 1\}$ ($\mathcal{H}_2$ is then called $P$-measurable in VW). An envelope function $G$ of $\mathcal{H}$ is a function such that $|h(x)| \le G(x)$ for all $x$ and all $h \in \mathcal{H}$. Let $\mathcal{H} = \mathcal{H}_1 \cup \mathcal{H}_2$ with (C4) satisfied on $\mathcal{H}$, and let $\lambda(\cdot)$ be the Lebesgue measure on $R^m$. Let $\to_D$ denote weak convergence in $L^\infty(\mathcal{H})$. We have

Theorem 4. (i) Under the conditions of Theorem 1(i), for $\mathcal{H}$ defined above, assume that $\forall h \in \mathcal{H}$, $gh \in \mathcal{H}$ in the componentwise sense, that $\mathcal{H}_1$ has a square integrable envelope function $H$, $\max_j \lambda(I_j) < \infty$, $\sum_{j=1}^\infty M_j^{1/2} P_m^{1/2}(I_j) < \infty$, and that $\mathcal{H}_2$ is bounded. Then we have

$$\sup_{h \in \mathcal{H}} |\tilde P_{n,m}h - P_m h| \to 0 \ (a.s.^*).$$

(ii) Under the conditions of Theorem 3(ii), assume $\mathcal{H}$ has a square integrable envelope function $H$, $\max_j \lambda(I_j) < \infty$, $m < 4$, $s > m/2$ for $\mathcal{H}_1$, and $\sum_{j=1}^\infty M_j^{\frac{2\upsilon}{\upsilon+2}} P^{\frac{\upsilon}{\upsilon+2}}(I_j) < \infty$ with $\upsilon = m/s$. Then

$$\tilde{\mathbb{G}}_{n,m} \to_D \mathbb{G} \quad \text{in } L^\infty(\mathcal{H}),$$

where $\mathbb{G}$ is a Gaussian process indexed by $\mathcal{H}$, with $E_P(\mathbb{G}h) = 0$ and $\mathrm{Cov}_P(\mathbb{G}h, \mathbb{G}q) = P(\ell_1^h \ell_1^q) - P(\ell_1^h \tilde g_1')\,\Omega_1^{-1}\,P(\tilde g_1 \ell_1^q)$ for all $h, q \in \mathcal{H}$, where $\ell_1^h$ and $\ell_1^q$ denote the first canonical forms of $h$ and $q$.

Using results in [2,11] we can obtain many more results; below we mention only one.

Corollary 2. For a class of functions $\mathcal{H}$, assume that $\forall h \in \mathcal{H}$, $gh \in \mathcal{H}$; that $\mathcal{H}$ is a measurable VC-subgraph class of functions with envelope $H$ and $P_m H < \infty$, or that $\forall \varepsilon > 0$, $N_{[\,]}^{(1)}(\varepsilon, \mathcal{H}, P_m) < \infty$ (for the definition, see p. 1512 of [2]). Then

$$\sup_{h \in \mathcal{H}} |\tilde P_{n,m}h - P_m h| \to 0, \ a.s.$$

4. The empirical likelihood ratio for U-statistics with side information

Next we define the empirical likelihood ratio for $\theta$, and construct the confidence interval for $\theta$ in the presence of side information. Let $G(\mathbf{x}|\theta) = (g'(\mathbf{x}), h(\mathbf{x}) - \theta)'$; then $E_{F_m} G(\mathbf{X}|\theta) = 0$. Without side information, the weights that maximize $\prod_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}$ subject to $\sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} = 1$ are $w_{\mathbf{i}} = (C_n^m)^{-1}$ for all $\mathbf{i} \in D_{n,m}$, while the weights that maximize $\prod_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}$ subject to $\sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} = 1$ and $\sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}}\, G(X_{\mathbf{i}}|\theta) = 0$ are $w_{\mathbf{i}} = (C_n^m)^{-1}/(1 + t'G(X_{\mathbf{i}}|\theta))$, with $t$ determined by (3) with $g(\cdot)$ replaced by $G(\cdot|\theta)$. Therefore we define the empirical likelihood ratio of $\theta$ in the presence of side information by

$$R_G(\theta) = L_n(\theta)\big/(C_n^m)^{-C_n^m} = \prod_{\mathbf{i} \in D_{n,m}} (C_n^m w_{\mathbf{i}}),$$

where

$$L_n(\theta) = \max_{\sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} = 1,\ \sum_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}} G(X_{\mathbf{i}}|\theta) = 0}\ \prod_{\mathbf{i} \in D_{n,m}} w_{\mathbf{i}},$$

and denote the empirical log-likelihood ratio

$$l(\theta) = -\log R_G(\theta) = \sum_{\mathbf{i} \in D_{n,m}} \log\big[1 + t'G(X_{\mathbf{i}}|\theta)\big].$$

Let $\Lambda = E_{F_m}\big(G(\mathbf{X}|\theta)G'(\mathbf{X}|\theta)\big) = \begin{pmatrix} \Omega & A \\ A' & \eta^2 \end{pmatrix}$, with $\eta^2 = \mathrm{Var}(h(\mathbf{X}))$; and let $\Lambda_1 = \mathrm{Cov}(\tilde G_1)$, where $\tilde G_1$ is the first canonical form (vector) of $G$.

Note that when there is no side information, $G(\cdot|\theta)$ reduces to $h(\cdot) - \theta$, and $t$ is a scalar determined by $\sum_{\mathbf{i} \in D_{n,m}} (h(X_{\mathbf{i}}) - \theta)/[1 + t(h(X_{\mathbf{i}}) - \theta)] = 0$. The corresponding log-likelihood ratio is

$$l_h(\theta) = \sum_{\mathbf{i} \in D_{n,m}} \log\big[1 + t(h(X_{\mathbf{i}}) - \theta)\big].$$
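The profile quantity $l_h(\theta)$ involves only a scalar multiplier $t$. The sketch below (our own illustrative code, written for the case $m = 1$, where the kernel values reduce to $n$ i.i.d. observations) solves for $t$ by Newton's method and evaluates the log-ratio; it is zero exactly at the sample mean of the kernel values:

```python
import numpy as np

def el_log_ratio(hv, theta):
    """Log EL ratio statistic for the mean of the values hv at candidate
    theta: solves sum_i (h_i - theta)/(1 + t (h_i - theta)) = 0 by a damped
    Newton iteration, then returns sum_i log(1 + t (h_i - theta))."""
    z = np.asarray(hv) - theta
    t = 0.0
    for _ in range(200):
        denom = 1.0 + t * z
        score = np.sum(z / denom)
        step = score / np.sum(z ** 2 / denom ** 2)
        while np.any(1.0 + (t + step) * z <= 1e-10):
            step /= 2.0
        t += step
        if abs(step) < 1e-12:
            break
    return float(np.sum(np.log(1.0 + t * z)))

rng = np.random.default_rng(3)
hv = rng.normal(0.5, 1.0, size=200)
# l_h(theta) is minimized (= 0) at theta equal to the sample mean,
# and increases as theta moves away from it.
assert el_log_ratio(hv, hv.mean()) < 1e-8
assert el_log_ratio(hv, hv.mean() + 0.2) > 0.0
```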

Theorem 5. (i) Under the conditions of Theorem 2(i) or Theorem 3(i), assume $r_o = 1$ and $\Lambda$ positive definite. Then

$$\frac{2n}{m^2 C_n^m}\, l(\theta) \to_D Z_{d+1}'\,\Lambda^{-1/2}\Lambda_1\Lambda^{-1/2}\, Z_{d+1}, \qquad Z_{d+1} \sim N(0, I_{d+1}).$$

(ii) Assume (C4). Then

$$\frac{2n\eta^2}{m^2 C_n^m \eta_1^2}\, l_h(\theta) \to_D \chi_1^2.$$

When $m = 1$, $\Lambda_1 = \Lambda$ and the above result for U-statistics automatically reduces to that for the common EL ratio, and the right-hand side in Theorem 5(i) is $\chi_{d+1}^2$ (see the corresponding result in Theorem 2 of Qin and Lawless [28]). Therefore, with side information incorporated in the likelihood ratio, the length of the confidence region for $\theta$ cannot be reduced. This is an interesting contrast to estimation with side information, in which the asymptotic variance is reduced. However, using the EL ratio, the shape of the confidence region is more natural than that of many other commonly used methods, such as normal approximation, in which the region is forced to be symmetric. The latter method may have poorer coverage probability because of the shorter interval length and its imposed shape.

Although side information is widely applied in practice to improve the performance of estimators via the EL, the following Corollary 3 describes the effects when incorrect side information is used; thus side information should be used with care, and be justified properly before use.

Corollary 3. If $E_{F_m} g(\mathbf{X}) = \delta \ne 0$, then:

  1. Under the conditions of Theorem 1(i),
    $$\tilde U_n - \theta \to -A'\Omega^{-1}\delta, \ a.s.$$
  2. Under the conditions of Theorem 2(i),
    $$\sqrt{n}\big(\tilde U_n - \theta + A'\Omega^{-1}\delta\big) \to_D N(0, \sigma^2).$$
  3. If $E_{F_m} G(\mathbf{X}|\theta) = \delta \ne 0$, then under the conditions of Theorem 5(i),
    $$\frac{2n}{m^2 C_n^m}\, l(\theta) \to_D Z_{d+1}'\,\Lambda^{-1/2}\Lambda_1\Lambda^{-1/2}\, Z_{d+1}, \qquad Z_{d+1} \sim N\big(\sqrt{n}\,\Lambda^{-1/2}\delta,\, I_{d+1}\big);$$
    when $\Lambda = \Lambda_1$, $Z_{d+1}'\Lambda^{-1/2}\Lambda_1\Lambda^{-1/2}Z_{d+1} = \chi_{d+1}^2(n\delta'\Lambda^{-1}\delta)$, the chi-squared distribution with $d + 1$ degrees of freedom and noncentrality parameter $n\delta'\Lambda^{-1}\delta$.

5. Examples and simulation studies

5.1. Examples

In this section we give some examples for illustration.

Example 1. For a given distribution $F$, let $\theta(F) = \int (x - \mu)^2 dF(x)$ be the variance, where $\mu$ is the mean, and let $\mu_k$ $(k \ge 2)$ be the $k$-th central moment of $F$. For the kernel $h(x_1, x_2) = (x_1 - x_2)^2/2$, we have $\ell_1(x_1) = [(x_1 - \mu)^2 - \theta]/2$, $\eta^2 = E(h^2) - \theta^2 = (\mu_4 + \theta^2)/2$ and $\eta_1^2 = E(\ell_1^2) = (\mu_4 - \theta^2)/4$. Without side information, the asymptotic variance of $U_n$ based on the kernel $h(x_1, x_2)$ is $\sigma_0^2 = 4\eta_1^2 = \mu_4 - \theta^2$, which is the same as that for the sample variance estimator $\hat\theta_n = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X)^2$.

If we know that $F$ has median 0, i.e. $F(0) = 1/2$, we take $g(x_1, x_2) = [I(x_1 \le 0) + I(x_2 \le 0)]/2 - 1/2$. Then $\tilde g_1(x_1) = [I(x_1 \le 0) - 1/2]/2$, $A_1 = E(\tilde g_1 \ell_1) = \big[\int_{-\infty}^0 (x - \mu)^2 dF(x) - \theta/2\big]/4$, and $\Omega_1 = E(\tilde g_1^2) = 1/16$. So by Theorem 3(i), the asymptotic variance of $\tilde U_n$ is now $\sigma^2 = \sigma_0^2 - \big[\int_{-\infty}^0 (x - \mu)^2 dF(x) - \theta/2\big]^2$, a reduction of $\big[\int_{-\infty}^0 (x - \mu)^2 dF(x) - \theta/2\big]^2$ from $\sigma_0^2$.
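The reduction claimed in Example 1 can be checked by Monte Carlo; the sketch below (our own illustration) uses the shifted exponential $X \sim \exp(1) - \ln 2$ of Section 5.2, for which the median is 0 and $\theta = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=200_000) - np.log(2.0)  # median 0, Var = 1
mu, theta = x.mean(), x.var()
mu4 = np.mean((x - mu) ** 4)                 # 4th central moment (= 9 for exp(1))
sigma0_sq = mu4 - theta ** 2                 # avar of U_n without side info
q = np.mean(np.where(x <= 0.0, (x - mu) ** 2, 0.0))   # MC for the truncated integral
reduction = (q - theta / 2.0) ** 2           # the reduction claimed in Example 1
assert sigma0_sq > reduction > 0.0           # strict, nonzero improvement
```

For this distribution the no-side-information asymptotic variance is about 8, in line with the values reported in Table 1.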

Example 2. For the Wilcoxon one-sample statistic, $\theta(F) = P_F(X_1 + X_2 \le 0)$; the kernel for the corresponding U-statistic is $h(x_1, x_2) = I(x_1 + x_2 \le 0)$, with $\ell_1(x_1) = F(-x_1) - \theta$ and $\eta_1^2 = E_F(\ell_1^2(X_1)) = \int F^2(-x)\, dF(x) - \theta^2$. Without side information, the asymptotic variance of $U_n$ based on the kernel $h(x_1, x_2)$ is $\sigma_0^2 = 4\eta_1^2$.

Suppose we know the distribution is symmetric about $a > 0$: $F(a + x) = 1 - F(a - x)$ for all $x$. Take $g(x_1, x_2) = [I(x_1 \le 0) + I(x_1 \le 2a) + I(x_2 \le 0) + I(x_2 \le 2a)]/2 - 1$; then $\tilde g_1(x_1) = [I(x_1 \le 0) + I(x_1 \le 2a)]/2 - 1/2$, $\Omega_1 = F(0)/2$, and $A_1 = \big[\int_{-\infty}^0 F(-x)\, dF(x) + \int_{-\infty}^{2a} F(-x)\, dF(x) - \int_{-\infty}^{\infty} F(-x)\, dF(x)\big]/2$. The reduction of asymptotic variance is $A_1^2 \Omega_1^{-1}$.

Example 3. For Gini's mean difference, $\theta(F) = E_F|X_1 - X_2|$, the corresponding kernel for the U-statistic is $h(x_1, x_2) = |x_1 - x_2|$. We have $h_1(x_1) = \int |x_1 - x|\, dF(x)$, $\ell_1(x_1) = h_1(x_1) - \theta$ and $\eta_1^2 = \int h_1^2(x_1)\, dF(x_1) - \theta^2$. Without side information, the asymptotic variance of $U_n$ based on the kernel $h(x_1, x_2)$ is $\sigma_0^2 = 4\eta_1^2$.

If we know the distribution mean $\mu$ and take $g(x_1, x_2) = (x_1 + x_2)/2 - \mu$, then $\tilde g_1(x_1) = (x_1 - \mu)/2$, $\Omega_1 = \int (x - \mu)^2 dF(x)/4$, and $A_1 = \big\{\int x_1 h_1(x_1)\, dF(x_1) - \mu\theta\big\}/2$. The reduction of asymptotic variance is $A_1^2 \Omega_1^{-1}$.

5.2. Simulation studies

Simulation studies are conducted in this section to assess the finite-sample performance of the proposed methods. These studies are based on Examples 1 and 2 in Section 5.1. We compare variance estimates of U-statistics with and without side information, and calculate the variance reduction under different sample sizes. We also compare various U-EL based and normal approximation based confidence intervals for θ in terms of coverage probability. Although side information has no effect on the U-EL ratio asymptotically, as indicated by Theorem 5, the finite sample behavior of confidence intervals constructed from the U-EL ratio is still of interest, and is compared with that of the normal approximation based method.

Based on the U-EL theory developed in Section 4, we can construct three U-EL based intervals for θ as follows:

The first one, called the EL1 interval, is defined as

$$\Big\{\theta : \frac{2n}{m^2 C_n^m}\, l(\theta) \le q_{1-\alpha}\Big\},$$

where $q_{1-\alpha}$ is the $(1-\alpha)$-th quantile of the distribution of $Z_{d+1}'\Lambda^{-1/2}\Lambda_1\Lambda^{-1/2}Z_{d+1}$; $q_{1-\alpha}$ can be estimated by using the sample estimates of $\Lambda_1$ and $\Lambda$ together with the Monte Carlo method.

One can also approximate the quantiles of the distribution of $l(\theta)$ by the bootstrap method. Let $\{l_b^*(\hat\theta) : b = 1, \dots, B\}$ ($B \ge 200$ is recommended) be $B$ bootstrap replicates of $l(\theta)$. Then the second EL-based interval for $\theta$, called EL2, is given by

$$\big\{\theta : l(\theta) \le l_{([B(1-\alpha)])}^*(\hat\theta)\big\},$$

where $l_{(b)}^*(\theta)$ is the $b$-th ordered value of the $l_b^*(\theta)$'s, and $[x]$ represents the integer part of $x$.

The third one, called the EL3 interval, is constructed as follows:

$$\big\{\theta : c^*\, l(\theta) \le \chi_{d+1,\, 1-\alpha}^2\big\},$$

where $c^* = (d+1)\big/\big(B^{-1}\sum_{b=1}^B l_b^*(\hat\theta)\big)$. This interval is motivated by the fact that the distribution of $Z_{d+1}'\Lambda^{-1/2}\Lambda_1\Lambda^{-1/2}Z_{d+1}$ can be approximated by a scaled chi-squared distribution, i.e.,

$$c \cdot l(\theta) \to_D \chi_{d+1}^2,$$

where $c$ is an unknown constant, and $c \approx E(\chi_{d+1}^2)/E(l(\theta)) = (d+1)/E(l(\theta))$.

The asymptotic normal distribution obtained in Theorem 2 can be used to construct two additional confidence intervals for $\theta$, called the AN1 and AN2 intervals:

$$\big[\tilde U_n - z_{1-\alpha/2}\,\hat\sigma/\sqrt{n},\ \tilde U_n + z_{1-\alpha/2}\,\hat\sigma/\sqrt{n}\big], \qquad \big[\tilde U_n - z_{1-\alpha/2}\,\hat\sigma^*/\sqrt{n},\ \tilde U_n + z_{1-\alpha/2}\,\hat\sigma^*/\sqrt{n}\big],$$

where $\hat\sigma$ is the estimate of $\sigma$ obtained by plugging the sample estimates of all population quantities into Theorem 2, and $\hat\sigma^*$ is the bootstrap estimate of $\sigma$ based on $B$ bootstrap samples. For computational convenience, we take $B = 200$ in the simulation studies.

Examples 1 and 2 in Section 5.1 are considered in the simulation study. In the first example, the underlying distribution is chosen to be a skewed distribution with median 0. Here we take $X \sim \exp(1) - \ln 2$, the standard exponential distribution with a shifted center. Then $EX = 1 - \ln 2$, $\mathrm{Median}(X) = 0$, and $\theta = \mathrm{Var}(X) = 1$. In the second example, we consider a symmetric distribution with mean $a$. We choose $X \sim \mathcal{N}(1, 4)$; then $\theta = \Phi(-1/\sqrt{2})$, where $\Phi(x)$ is the cdf of the standard normal distribution. The simulation results are presented in Tables 1–4.
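For reference, the value of $\theta$ in the second example follows from $X_1 + X_2 \sim N(2, 8)$; a one-line check (our own derivation, assuming $\mathcal{N}(1, 4)$ denotes mean 1 and variance 4):

```python
from statistics import NormalDist

# theta = P(X1 + X2 <= 0) with X_i ~ N(1, 4): X1 + X2 ~ N(2, 8),
# so theta = Phi(-2/sqrt(8)) = Phi(-1/sqrt(2)), roughly 0.24.
theta = NormalDist(mu=2.0, sigma=8.0 ** 0.5).cdf(0.0)
assert abs(theta - NormalDist().cdf(-1.0 / 2.0 ** 0.5)) < 1e-12
```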

Table 1.

The asymptotic variance estimation of U-statistics. X ~ exp(1) − ln(2).

Method n = 50 n = 100 n = 150 n = 200
Without side information 8.5239 7.8569 7.3839 7.1557
With side information 8.4572 7.5524 7.2673 7.0791
Variance reduction 0.0667 0.3045 0.1165 0.0766

Table 4.

Coverage probabilities of various 95% confidence intervals for θ with side information. X ~ 𝒩(1, 4).

Sample size EL1 EL2 EL3 AN1 AN2
n = 50 0.876 0.983 0.918 0.956 0.945
n = 100 0.882 0.981 0.931 0.961 0.947
n = 150 0.926 0.978 0.968 0.978 0.968
n = 200 0.942 0.956 0.984 0.970 0.954

Tables 1 and 2 show the estimated asymptotic variances of U-statistics with and without side information under sample sizes n = 50, 100, 150, 200. The reduction of variance is also calculated. The results are based on 1000 repetitions.

Table 2.

The asymptotic variance estimation of U-statistics. X ~ 𝒩(1, 4).

Method n = 50 n = 100 n = 150 n = 200
Without side information 0.2413 0.2208 0.2199 0.2203
With side information 0.0548 0.0526 0.0527 0.0572
Variance reduction 0.1865 0.1682 0.1673 0.1631

Tables 3 and 4 show the coverage probabilities of the EL-based intervals (EL1, EL2 and EL3) and the normal approximation-based intervals (AN1 and AN2) with side information.

Table 3.

Coverage probabilities of various 95% confidence intervals for θ with side information. X ~ exp(1) − ln(2).

Sample size EL1 EL2 EL3 AN1 AN2
n = 50 0.942 0.949 0.994 0.783 0.784
n = 100 0.929 0.950 0.991 0.858 0.878
n = 150 0.934 0.954 0.990 0.872 0.880
n = 200 0.950 0.949 0.989 0.898 0.904

The simulation results show that the proposed U-statistic $\tilde U_n$ performs well in finite samples. From Tables 1 and 2 we can clearly see a reduction in the variance of estimating θ. The variance reduction can be substantial, as in Example 2, which shows that the proposed method can offer more accurate estimation.

From the coverage probabilities in Tables 3 and 4, we see that the U-EL based confidence intervals work significantly better than the normal approximation based confidence intervals when the underlying distribution is skewed (Example 1). When the underlying distribution is symmetric (Example 2), the performances of these methods are comparable. Furthermore, in most cases the bootstrap-based methods work better than the plug-in methods.

Concluding remarks

We studied a method to incorporate side information into the U-statistic via the empirical likelihood approach, and investigated the asymptotic behavior of the proposed method. We showed that, for parameter estimation, the proposed U-statistic with side information has advantages, such as smaller asymptotic variance, over the one without side information incorporated. We also explored the construction of confidence intervals using the U-statistic based empirical likelihood ratio. Although this U-EL ratio does not benefit from side information asymptotically, our simulation studies show that the corresponding confidence intervals still outperform those based on the normal approximation in finite samples. We also note that, if incorrect side information is incorporated, the resulting estimates can be seriously biased. Thus, in practice, the incorporation of side information should be properly justified.

Acknowledgments

This work is supported in part by the National Center for Research Resources at NIH grant 2G12RR003048. Dr. Gengsheng Qin’s work is supported in part by US NSA grant H98230-12-1-0228. Dr. Wenqing He’s work is partially supported by the Natural Sciences and Engineering Research Council of Canada.

Appendix

Proof of the Lemma. (i) As in [26], write t = t_n = ρ_n e with ρ_n ≥ 0 and e = e(X_1, …, X_n) a d-vector with ‖e‖ = 1. We first find the asymptotic order of t_n. Denote b(t) = (C_n^m)^{-1}Σ_{i∈D_{n,m}} g(X_i)/(1 + t′g(X_i)), Z_n = max{|e′g(X_i)| : i ∈ D_{n,m}}, and Z̄_n = max{|e′g(X̄_i)| : i ∈ D̄_{n,m}}, where |D̄_{n,m}| = C_n^m, any i, j ∈ D̄_{n,m} have no common entry, and for i = (i_1, …, i_m), X̄_{i_1}, …, X̄_{i_m} are i.i.d. copies of X_1. Since Z̄_n is a maximum over C_n^m independent samples, while Z_n is a maximum over C_n^m dependent samples from the same distribution, for large n we have Z_n ≤ Z̄_n (a.s.). Since E|e′g(X)|^α < ∞, Z̄_n = o((C_n^m)^{1/α}) (a.s.) as in [26], and so Z_n = o((C_n^m)^{1/α}) (a.s.). We have

\[
0 = \|b(\rho_n e)\| \ge |e'b(\rho_n e)|
= \frac{1}{C_n^m}\left|e'\left(\sum_{i\in D_{n,m}} g(X_i) - \rho_n \sum_{i\in D_{n,m}} \frac{g(X_i)\,e'g(X_i)}{1+\rho_n e'g(X_i)}\right)\right|
\ge \frac{\rho_n}{C_n^m}\sum_{i\in D_{n,m}} \frac{[e'g(X_i)]^2}{1+\rho_n e'g(X_i)} - \left|\frac{1}{C_n^m}\sum_{i\in D_{n,m}} e'g(X_i)\right|
\ge \frac{\rho_n}{1+\rho_n Z_n}\,e'\left(\frac{1}{C_n^m}\sum_{i\in D_{n,m}} g(X_i)g'(X_i)\right)e - \left|e'\left(\frac{1}{C_n^m}\sum_{i\in D_{n,m}} g(X_i)\right)\right|
= \frac{\rho_n}{1+\rho_n Z_n}\,e'R_{1,n}e - |e'R_{2,n}|.
\]

Below we will show, for some 0 < c < C < ∞, for all large n, uniformly for all the Xi’s and i’s,

\[
c < e'R_{1,n}e \le C \quad \text{a.s.};\qquad
e'R_{2,n} = \begin{cases} O\big(n^{-1/2}(\log\log n)^{1/2}\big) & \text{a.s., if } r_o = 1,\\ o\big(n^{-r_o/2}\log n\big) & \text{a.s., if } r_o > 1. \end{cases} \tag{A.1}
\]

In fact, R_{1,n} is a (matrix valued) U-statistic with a.s. limit 0 < E[g(X)g′(X)] = Ω < ∞, where the "0 <" is in the matrix positive definite sense and the "< ∞" is in the componentwise sense. Let 0 < λ_1 ≤ ⋯ ≤ λ_d < ∞ be the eigenvalues of Ω; we have Ω = Q′diag(λ_1, …, λ_d)Q with Q orthonormal. Denote η = Qe; then η′η = 1. For large n, R_{1,n} > Ω/2 (a.s.), thus e′R_{1,n}e > e′Ωe/2 = η′diag(λ_1, …, λ_d)η/2 ≥ λ_1/2 ≕ c > 0 (a.s.). Similarly, for large n, R_{1,n} < 2Ω (a.s.) and e′R_{1,n}e < 2λ_d ≕ C (a.s.).

Since ‖e‖ = 1, we only need to prove the second assertion in (A.1) for R_{2,n}. Note R_{2,n} is a (vector) U-statistic with kernel g(x) satisfying E(g(X)) = 0. Recall g̃_c are the canonical forms of g; let R_{2,n}^c be the corresponding Hoeffding forms of R_{2,n} (c = 1, …, m). By the given condition we have E(‖g̃_c(X)‖²) ≤ E‖g(X)‖² < ∞ (c = r_o, …, m), E‖g(X)‖^{4/3} < ∞, and R_{2,n} = Σ_{c=r_o}^m C_m^c R_{2,n}^c in the componentwise sense (component j of R_{2,n}^c is zero for c = r_o, …, r_j − 1 if r_j ≔ rank(g_j) > r_o). If r_o = 1, let η_{2,1}² ≔ E(‖g̃_1(X)‖²); by Theorem 9.1.1 in KB, we get

\[
\limsup_{n}\,\Big(\frac{2m^2\eta_{2,1}^2\log\log n}{n}\Big)^{-1/2}|R_{2,n}| = 1 \quad (\text{a.s.}),
\quad\text{or}\quad
R_{2,n} = O\big(n^{-1/2}(\log\log n)^{1/2}\big) \quad (\text{a.s.}).
\]

If ro > 1, by Lemma 9.2.1 in KB,

\[
\frac{n^{c/2}}{\log n}\,R_{2,n}^{c} \to 0 \ (\text{a.s.}),\ (c = r_o,\ldots,m);
\quad\text{and so}\quad
R_{2,n} = o\big(n^{-r_o/2}\log n\big) \ (\text{a.s.}).
\]

Now, since R_{1,n} = O(1) (a.s.) with 0 < O(1) < ∞ and Z_n = o((C_n^m)^{1/α}) = o(n^{m/α}) (a.s.), we have, for r_o = 1, ρ_n/(1 + ρ_nZ_n) = O(|e′R_{2,n}|) = O(n^{-1/2}(log log n)^{1/2}) (a.s.), or ρ_n(1 − o(n^{-(1/2−m/α)}(log log n)^{1/2})) = O(n^{-1/2}(log log n)^{1/2}) (a.s.). For r_o > 1, ρ_n(1 − o(n^{m/α−r_o/2} log n)) = o(n^{-r_o/2} log n) (a.s.). Thus we have

\[
\|t_n\| = \rho_n = \begin{cases} O\big(n^{-1/2}(\log\log n)^{1/2}\big), & r_o = 1;\\ o\big(n^{-r_o/2}\log n\big), & 1 < r_o \le m \end{cases} \quad (\text{a.s.}).
\]

Since for all i, |t′_n g(X_i)| ≤ ‖t_n‖Z_n = o(ρ_n n^{m/α}) → 0 (a.s.), for large n we have max_{i∈D_{n,m}} |t′g(X_i)| < 1 (a.s.), so we have

\[
0 = \sum_{i\in D_{n,m}} \frac{g(X_i)}{1+t'g(X_i)}
= \sum_{i\in D_{n,m}} g(X_i)\sum_{j=0}^{\infty}(-1)^j[t'g(X_i)]^j
= \sum_{i\in D_{n,m}} g(X_i)\big(1 - t'g(X_i) + O([t'g(X_i)]^2)\big)
= \sum_{i\in D_{n,m}} g(X_i)\big(1 - t'g(X_i) + \|g(X_i)\|^2O(\rho_n^2)\big) \quad (\text{a.s.}),
\]

or

\[
\sum_{i\in D_{n,m}} g(X_i)g'(X_i)\,t = \sum_{i\in D_{n,m}} \big[g(X_i) + \|g(X_i)\|^2 g(X_i)O(\rho_n^2)\big] \quad (\text{a.s.}),
\]

thus

\[
t = \Big(\sum_{i\in D_{n,m}} g(X_i)g'(X_i)\Big)^{-1}\sum_{i\in D_{n,m}} g(X_i)
+ O(\rho_n^2)\Big(\sum_{i\in D_{n,m}} g(X_i)g'(X_i)\Big)^{-1}\sum_{i\in D_{n,m}} \|g(X_i)\|^2 g(X_i)
\eqqcolon B_n + O(\rho_n^2)\Big(\sum_{i\in D_{n,m}} g(X_i)g'(X_i)\Big)^{-1}\sum_{i\in D_{n,m}} \|g(X_i)\|^2 g(X_i) \quad (\text{a.s.}).
\]

We have already shown R_{2,n} = (C_n^m)^{-1}Σ_{i∈D_{n,m}} g(X_i) = O(ρ_n) (a.s.). Also, g(·)g′(·) is non-degenerate by (C1), and since m ≥ 2 and E‖g(X)‖⁴ < ∞ by (C2), the law of the iterated logarithm (LIL) for U-statistics gives (C_n^m)^{-1}Σ_{i∈D_{n,m}} g(X_i)g′(X_i) = Ω + O(n^{-1/2}(log log n)^{1/2}) (a.s.), hence

\[
B_n = \big[\Omega^{-1} + O\big(n^{-1/2}(\log\log n)^{1/2}\big)\big]\frac{1}{C_n^m}\sum_{i\in D_{n,m}} g(X_i)
= \Omega^{-1}\frac{1}{C_n^m}\sum_{i\in D_{n,m}} g(X_i) + O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big)
= O(\rho_n) \quad (\text{a.s.}).
\]

Similarly,

\[
O(\rho_n^2)\Big(\sum_{i\in D_{n,m}} g(X_i)g'(X_i)\Big)^{-1}\sum_{i\in D_{n,m}} \|g(X_i)\|^2 g(X_i)
= O(\rho_n^2)\,\Omega^{-1}E[\|g(X)\|^2 g(X)] = o(B_n) \quad (\text{a.s.}),
\]

so,

\[
t = t_n = B_n + O(\rho_n^2)
= \Omega^{-1}\frac{1}{C_n^m}\sum_{j\in D_{n,m}} g(X_j) + O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big) + O(\rho_n^2) \quad (\text{a.s.}).
\]

From this we get, (a.s.),

\[
w_i = \frac{1}{C_n^m}\,\frac{1}{1+t'g(X_i)}
= \frac{1}{C_n^m}\big[1 - t'g(X_i) + \|g(X_i)\|^2O(\rho_n^2)\big]
= \frac{1}{C_n^m}\Big(1 - g'(X_i)\Omega^{-1}\frac{1}{C_n^m}\sum_{j\in D_{n,m}} g(X_j)
+ \mathbf{1}_d'g(X_i)O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big)
+ \big(\mathbf{1}_d'g(X_i) + \|g(X_i)\|^2\big)O(\rho_n^2)\Big).
\]

(ii) As in the proof of (i), we only need to show the results regardless of e. Treat gg′ and R_{1,n} as vectors of length d². Under the given conditions, since gg′ is non-degenerate, the central limit theorem (CLT) for U-statistics gives

\[
\sqrt{n}\,\big(R_{1,n} - E[gg'(X)]\big) \to_D N(0,\Xi), \quad\text{or}\quad R_{1,n} = E[gg'(X)] + O_p(n^{-1/2}),
\]

where Ξ is determined by gg′. Similarly, for R_{2,n}, since Eg(X) = 0, if r_o = 1, by standard U-statistics theory √nR_{2,n} is asymptotically normal, or R_{2,n} = O_p(n^{-1/2}). If r_o > 1, note r_o ≤ m, and the given conditions imply E|g̃_c(x_1, …, x_c)|^{2c/(2c−r_o)} < ∞ for c = r_o, …, m. So, by Theorem 4.4.1 in KB, n^{r_o/2}R_{2,n} →_D R_2 for some non-degenerate random variable R_2, thus R_{2,n} = O_p(n^{-r_o/2}). So, as in the proof of (i), we get ρ_n(1 − o_P(n^{m/α−r_o/2})) = O_p(n^{-r_o/2}). Since o_P(n^{m/α−r_o/2}) = o_p(1), we have ‖t_n‖ = ρ_n = O_p(n^{-r_o/2}).

Going through the proofs in (i), with O(n^{-1/2}(log log n)^{1/2}) replaced by O_p(n^{-1/2}), we get t = t_n = Ω^{-1}(C_n^m)^{-1}Σ_{j∈D_{n,m}} g(X_j) + O_p(ρ_n n^{-1/2}), and, with ρ_n = n^{-r_o/2},

\[
w_i = \frac{1}{C_n^m}\Big(1 - g'(X_i)\Omega^{-1}\frac{1}{C_n^m}\sum_{j\in D_{n,m}} g(X_j)
+ \mathbf{1}_d'g(X_i)O_p(\rho_n n^{-1/2})
+ \big(\mathbf{1}_d'g(X_i) + \|g(X_i)\|^2\big)O_p(\rho_n^2)\Big).
\]

Proof of Theorem 1. (i) By Lemma (i), we have

\[
\tilde U_n = U_n - \Big(\frac{1}{C_n^m}\sum_{i\in D_{n,m}} g'(X_i)h(X_i)\Big)\Omega^{-1}\Big(\frac{1}{C_n^m}\sum_{j\in D_{n,m}} g(X_j)\Big)
+ O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big)\frac{1}{C_n^m}\sum_{i\in D_{n,m}} \mathbf{1}_d'g(X_i)h(X_i)
+ O(\rho_n^2)\frac{1}{C_n^m}\sum_{i\in D_{n,m}} \big(\mathbf{1}_d'g(X_i)+\|g(X_i)\|^2\big)h(X_i) \quad (\text{a.s.}).
\]

By the given conditions and the SLLN for U-statistics, U_n → θ (a.s.),

\[
U_{0,n} \coloneqq \frac{1}{C_n^m}\sum_{i\in D_{n,m}} g(X_i) \to_{a.s.} E_{F^m}g(X) = 0,
\qquad
U_{1,n} \coloneqq \frac{1}{C_n^m}\sum_{i\in D_{n,m}} g(X_i)h(X_i) \to_{a.s.} A < \infty,
\]

where A = EFm[g(X)h(X)], and

\[
U_{2,n} \coloneqq \frac{1}{C_n^m}\sum_{i\in D_{n,m}} \big(\mathbf{1}_d'g(X_i)+\|g(X_i)\|^2\big)h(X_i)
\to_{a.s.} E_{F^m}\big[\big(\mathbf{1}_d'g(X)+\|g(X)\|^2\big)h(X)\big] < \infty.
\]

Since by the given conditions E_{F^m}|h̃_c(X)|^γ < ∞ and E_{F^m}|g̃_c(X)|^γ < ∞ for γ = cp/(p(c − 1) + 1), c = 1, …, m and 1 < p < 2, Corollary 3.4.1 in KB gives n^{1−1/p}(U_n − θ) → 0 (a.s.), i.e. n^q(U_n − θ) → 0 (a.s.) and n^qU_{0,n} → 0 (a.s.) for all q < 1/2, and consequently, for all q < 1/2,

\[
n^q(\tilde U_n - \theta) = n^q(U_n - \theta) - O(n^qU_{0,n})
+ O\big(\rho_n n^{-(1/2-q)}(\log\log n)^{1/2}\big) + O(n^q\rho_n^2) \to 0 \quad (\text{a.s.}).
\]

(ii) Using the notation of (i), we have

\[
\tilde U_n = U_n - U_{1,n}'\Omega^{-1}U_{0,n}
+ O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big)U_{1,n} + O(\rho_n^2)U_{2,n} \quad (\text{a.s.}).
\]

Recall that for any U-statistic U_n with rank r and canonical forms h̃_c (c = r, …, m) the following decomposition holds:

\[
U_n - \theta = \sum_{c=r}^{m} C_m^c U_{n,c},
\qquad
U_{n,c} = (C_n^c)^{-1}\sum_{1\le i_1<\cdots<i_c\le n}\tilde h_c(X_{i_1},\ldots,X_{i_c}).
\]

Since Eh̃_c² < ∞ for c = r, …, m, by Lemma 9.2.1 in KB, U_{n,c} = o(n^{-c/2} log n) (a.s.), so U_n − θ = o(n^{-r/2} log n) (a.s.). Thus, when r_1 = r_0 = 1, U_{1,n} = O(1) (a.s.), U_{0,n} = o(n^{-q}) (a.s.) for any q < 1/2, and U_{2,n} = O(1) (a.s.) as its kernel is always non-degenerate, so by Lemma (i) we have

\[
\tilde U_n - \theta = o(n^{-r/2}\log n) + o(n^{-q}) + O(n^{-1}\log\log n) + O(n^{-1}\log\log n) = o(n^{-q}),
\qquad \forall\,q < 1/2.
\]

When r_1 > r_0 = 1, by Lemma 9.2.1 in KB, U_{1,n} = o(n^{-r_1/2} log n) (a.s.); note O(n^{-1} log log n) = o(n^{-1} log n), and so

\[
\tilde U_n - \theta = o(n^{-r/2}\log n) + o(n^{-(r_1/2+q)}\log n)
+ o\big(n^{-(1+r_1/2)}\log n(\log\log n)^{1/2}\big) + O(n^{-1}\log\log n)
= o\big(n^{-\min\{r/2,(r_1+1)/2,1\}}\log n\big) = o\big(n^{-\min\{r/2,1\}}\log n\big).
\]

When 1 = r1 < r0,

\[
\tilde U_n - \theta = o(n^{-r/2}\log n) + o(n^{-r_0/2}\log n)
+ o\big(n^{-(1+r_o)/2}\log n(\log\log n)^{1/2}\big) + o\big(n^{-r_o}(\log n)^2\big)
= o\big(n^{-\min\{r_o,r\}/2}\log n\big);
\]

and when r1, r0 > 1,

\[
\tilde U_n - \theta = o(n^{-r/2}\log n) + o\big(n^{-(r_0+r_1)/2}(\log n)^2\big)
+ o\big(n^{-(1+r_o+r_1)/2}(\log n)^2(\log\log n)^{1/2}\big) + o\big(n^{-r_o}(\log n)^2\big)
= o\big(\max\{n^{-r/2}\log n,\ n^{-\min\{(r_0+r_1)/2,\,r_0\}}(\log n)^2\}\big).
\]

(iii) Using Lemma (i) and notations in the proof of (i), we have, a.s.,

\[
\tilde U_n - \theta = \frac{1}{C_n^m}\sum_{i\in D_{n,m}} \big(h(X_i) - \theta - A'\Omega^{-1}g(X_i)\big)
- (U_{1,n}-A)'\Omega^{-1}U_{0,n}
+ O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big)U_{1,n} + O(\rho_n^2)U_{2,n}.
\]

Recall that U_{0,n} = o(n^{-q}) and U_{1,n} − A = o(n^{-q}) (a.s.) for all 0 < q < 1/2, and U_{2,n} ≤ C_2 (a.s.) for some C_2 < ∞. We have, a.s., for all 0 < q < 1/2,

\[
\tilde U_n - \theta = \frac{1}{C_n^m}\sum_{i\in D_{n,m}} \big(h(X_i) - \theta - A'\Omega^{-1}g(X_i)\big)
+ o(n^{-2q}) + O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big) + O(\rho_n^2).
\]

By Theorem 9.1.1 of KB, the LIL holds for the first term above, and the above equation gives the desired result.

Proof of Theorem 2. (i) Using the facts, proved in Theorem 1(i), that U_{1,n} → A (a.s.) and U_{2,n} ≤ C_2 (a.s.) for some C_2 < ∞, by Lemma (ii) we have

\[
\sqrt{n}(\tilde U_n - \theta) = \sqrt{n}\,\frac{1}{C_n^m}\sum_{i\in D_{n,m}} \big(h(X_i) - \theta - A'\Omega^{-1}g(X_i)\big)
- \sqrt{n}\,(U_{1,n}-A)'\Omega^{-1}U_{0,n}
+ O_P(n^{-r_o/2})U_{1,n} + O_P\big(n^{-(r_o-1/2)}\big)U_{2,n}.
\]

The second term above is, for all 0 < q < 1/2, n^{1/2}O(n^{-2q}) = o_P(1); the third term is O_P(n^{-r_o/2}) since U_{1,n} → A (a.s.) with A < ∞; and the last term is O_P(n^{-(r_o−1/2)}) since U_{2,n} ≤ C (a.s.) for some C < ∞. Thus we only need to deal with the first term.

Let

\[
H(x) = h(x) - \tilde h_1(x_1) - \cdots - \tilde h_1(x_m) - \theta,
\qquad
G(x) = g(x) - \tilde g_1(x_1) - \cdots - \tilde g_1(x_m),
\]

then H1(x1) ≔ E[H(X)|X1 = x1] = 0, i.e. H(x) is a degenerate kernel. Similarly, G(x) is degenerate, so is K(x) = H(x) − A′Ω−1G(x), with EFmK(X) = 0 and rk ≔ rank(K) ≥ 2. Now we have

\[
\sqrt{n}(\tilde U_n - \theta)
= \sqrt{n}\,\frac{1}{C_n^m}\sum_{i\in D_{n,m}} \big(\tilde h_1(X_{i_1})+\cdots+\tilde h_1(X_{i_m})
- A'\Omega^{-1}[\tilde g_1(X_{i_1})+\cdots+\tilde g_1(X_{i_m})]\big)
+ \sqrt{n}\,\frac{1}{C_n^m}\sum_{i\in D_{n,m}} K(X_i) + O_P(n^{-1/2}) + o_P(1)
= \sqrt{n}\,\frac{m}{n}\sum_{i=1}^{n}\big(\tilde h_1(X_i) - A'\Omega^{-1}\tilde g_1(X_i)\big)
+ \sqrt{n}\,\frac{1}{C_n^m}\sum_{i\in D_{n,m}} K(X_i) + O_P(n^{-1/2}) + o_P(1).
\]

Let K̃_c be the canonical forms of K, with ξ_c² = E_{F^m}K̃_c²(X) < ∞ by the given conditions (c = r_k, …, m). So by Hoeffding's formula,

\[
\mathrm{Var}\Big(\sqrt{n}\,\frac{1}{C_n^m}\sum_{i\in D_{n,m}} K(X_i)\Big)
= n\sum_{c=r_k}^{m}(C_m^c)^2(C_n^c)^{-1}\xi_c^2 = O\big(n^{-(r_k-1)}\big) \to 0,
\]

and so √n(C_n^m)^{-1}Σ_{i∈D_{n,m}}K(X_i) →_P 0. We get

\[
\sqrt{n}(\tilde U_n - \theta)
= \sqrt{n}\,\frac{m}{n}\sum_{i=1}^{n}\big(\tilde h_1(X_i) - A'\Omega^{-1}\tilde g_1(X_i)\big)
+ o_P(1) + O_P(n^{-1/2}).
\]

Since Var[m(h̃_1(X_1) − A′Ω^{-1}g̃_1(X_1))] = σ² if r_o = 1, and = m²η_1² when r_o > 1 (in which case g̃_1(·) ≡ 0), the result now follows from the standard CLT and Slutsky's theorem.

(ii) We have, since U2,n = OP (1),

\[
\tilde U_n - \theta = U_n - \theta - U_{1,n}'\Omega^{-1}U_{0,n}
+ O_P\big(n^{-(r_o+1)/2}\big)U_{1,n} + O_P(n^{-r_0}).
\]

By Theorem 4.4.2 in KB, n^{r/2}(U_n − θ) →_D C_m^rJ_r(h̃_r), or U_n − θ = O_P(n^{-r/2}). Similarly, in summary we have

\[
U_n - \theta = O_p(n^{-r/2}),
\qquad U_{0,n} = O_P(n^{-r_0/2}),
\qquad U_{1,n} - A = O_P(n^{-r_1/2}).
\]

Also, U_{1,n} → A (a.s.), and when r_0 = 1, √nU_{0,n} →_D N(0, m²Ω_1).

First we consider the case A ≠ 0. In this case, U_n − θ = O_P(n^{-r/2}), U′_{1,n}Ω^{-1}U_{0,n} = O_P(n^{-r_o/2}), and O_P(n^{-(r_o+1)/2})U_{1,n} + O_P(n^{-r_o}) = o_P(n^{-r_o/2}).

Thus when ro < r, we have

\[
n^{r_o/2}(\tilde U_n - \theta) = -U_{1,n}'\Omega^{-1}n^{r_o/2}U_{0,n} + o_P(1)
\to_D -C_m^{r_o}A'\Omega^{-1}J_{r_o}(\tilde g_{r_o}).
\]

When ro = r,

\[
n^{r/2}(\tilde U_n - \theta) = n^{r/2}(U_n - \theta) - U_{1,n}'\Omega^{-1}n^{r/2}U_{0,n} + o_P(1)
\to_D C_m^rJ_r\big(\tilde h_r - A'\Omega^{-1}\tilde g_r\big).
\]

When ro > r,

\[
n^{r/2}(\tilde U_n - \theta) = n^{r/2}(U_n - \theta) + o_P(1) \to_D C_m^rJ_r(\tilde h_r).
\]

Now we consider the case A = 0; then U′_{1,n}Ω^{-1}U_{0,n} = O_P(n^{-(r_1+r_o)/2}), and

\[
\tilde U_n - \theta = U_n - \theta - U_{1,n}'\Omega^{-1}U_{0,n}
+ O_P\big(n^{-(r_1+r_o+1)/2}\big) + O_P(n^{-r_0}).
\]

When r_o ≤ min{r_1, r/2}, Ũ_n − θ = O_P(n^{-r_o}), and its distribution needs a more accurate expansion to evaluate. When r < min{2r_o, r_1 + r_o},

\[
n^{r/2}(\tilde U_n - \theta) = n^{r/2}(U_n - \theta) + o_P(1) \to_D C_m^rJ_r(\tilde h_r).
\]

When r1 + ro < r or r1 < ro,

\[
n^{(r_1+r_o)/2}(\tilde U_n - \theta) = -n^{r_1/2}U_{1,n}'\Omega^{-1}n^{r_o/2}U_{0,n} + o_P(1)
\to_D -C_m^{r_1}C_m^{r_o}J_{r_1}'\big(\widetilde{(gh)}_{r_1}\big)\Omega^{-1}J_{r_o}(\tilde g_{r_o}).
\]

When r1 + ro = r,

\[
n^{r/2}(\tilde U_n - \theta) = n^{r/2}(U_n - \theta) - n^{r_1/2}U_{1,n}'\Omega^{-1}n^{r_o/2}U_{0,n} + o_P(1)
\to_D C_m^rJ_r(\tilde h_r) - C_m^{r_1}C_m^{r_o}J_{r_1}'\big(\widetilde{(gh)}_{r_1}\big)\Omega^{-1}J_{r_o}(\tilde g_{r_o}).
\]

Proof of Corollary 1. In this case we have h_1(X_1) = a[X_1 + (m − 1)μ], h̃_1(X_1) = a(X_1 − μ), and η_1² = a²τ². Also, g_1(X_1) = b(X_1 − μ) = g̃_1(X_1), A_1 = abτ², A = abE[Σ_{k=1}^m X_k Σ_{k=1}^m (X_k − μ)] = mabτ², Ω_1 = b²τ², Ω = b²E[Σ_{k=1}^m (X_k − μ)Σ_{k=1}^m (X_k − μ)] = mb²τ², and r_o = 1. So by Theorem 2(i) we have σ² = m²(a²τ² − 2a²τ² + a²τ²) = 0.

Proof of Theorem 3. (i) Note θ = E_{F^m}h(X). The information bound is for a parameter of the form E_F(s(X_1)) for some s(·). Recall h_1(x_1) = E[h(X_1, …, X_m)|X_1 = x_1] and E_F(h_1(X_1)) = θ; thus we take s(·) = h_1(·). Similarly, the constraint for computing the information bound should be a univariate function; we take it to be g_1(x_1).

Let f(x) be the density/mass function of F(x) with respect to some dominating measure μ(x). Denote γ(f) = ∫h_1(x)f(x)dμ(x) = θ as a functional of f, and let γ̇(f)(x) be the adjoint (evaluated at 1) of its pathwise derivative with respect to log f (for the definition see, for example, [6]); let γ_1(f) = E_f[g_1(X)] for the side information constraint and γ̇_1(f)(x) the adjoint (evaluated at 1) of its pathwise derivative. Let L_{2,d,r}(f) = {s(x) : s : R^d → R^r, E_f[s(X)s′(X)] < ∞}; for s_1 ∈ L_{2,d,k}(f) and s_2 ∈ L_{2,d,r}(f), define the inner product (matrix) 〈s_1, s_2〉 = E_f[s_1(X)s′_2(X)] = ∫s_1(x)s′_2(x)f(x)dμ(x), the norm (matrix) ‖s_1‖² = 〈s_1, s_1〉, and ‖s_1‖^{-2} ≔ (‖s_1‖²)^{-1} when ‖s_1‖² is non-degenerate.

By Proposition A.5.2 in [6], we have γ̇(f) = h_1(X) − θ = h̃_1(X) and γ̇_1(f) = g_1(X) = g̃_1(X). Let ∏(υ|υ_1) be the projection of υ onto [υ_1], the linear span of υ_1 with respect to f and μ, and [υ_1]^⊥ the orthogonal complement of [υ_1] with respect to f and μ. Without side information, the efficient influence function ℐ(X, γ(f)) for estimating γ(f) is ℐ(X, γ(f)) = γ̇(f), and the information bound is ‖ℐ(X, γ(f))‖². In the presence of the side information γ_1(f), by Example 3.2.3 in [6], the efficient influence function ℐ(X, γ(f)|γ_1(f)) for estimating γ(f) is

\[
\mathcal{I}(X,\gamma(f)|\gamma_1(f)) = \Pi\big(\dot\gamma(f)\,\big|\,[\dot\gamma_1(f)]^{\perp}\big)
= \dot\gamma(f) - \Pi\big(\dot\gamma(f)|\dot\gamma_1(f)\big)
= \dot\gamma(f) - \langle\dot\gamma(f),\dot\gamma_1(f)\rangle\|\dot\gamma_1(f)\|^{-2}\dot\gamma_1(f)
= \tilde h_1(X) - A_1'\Omega_1^{-1}\tilde g_1(X),
\]

and the information (lower) bound for estimating θ with side information g is 𝕀(θ|g) = ‖ℐ(X, γ(f)|γ_1(f))‖² = η_1² − A′_1Ω_1^{-1}A_1.

When g(x) = (g(x_1) + ⋯ + g(x_m))/m, we have g̃_1(x_1) = g_1(x_1) = E[g(x_1, X_2, …, X_m)|x_1] = g(x_1)/m, A = E[g(X)h(X)] = E[g(X_1)h_1(X_1)] = mE[g̃_1(X_1)h̃_1(X_1)] = mA_1, and Ω = E[g(X)g′(X)] = E[g(X_1)g′(X_1)]/m = mE[g̃_1(X_1)g̃′_1(X_1)] = mΩ_1; thus

\[
\sigma^2 = m^2\big(\eta_1^2 - 2A'\Omega^{-1}A_1 + A'\Omega^{-1}\Omega_1\Omega^{-1}A\big) = m^2\,\mathbb{I}(\theta|g).
\]

Since m2 is a known positive constant, we can just divide Ũn by m so that its asymptotic variance is 𝕀(θ|g), and thus it is efficient.

Since σ² = m²‖∏(γ̇(f)|[γ̇_1(f)]^⊥)‖² ≥ 0, with "=" iff γ̇(f) = h̃_1(X) ∈ [γ̇_1(f)], the linear span of γ̇_1(f) = g̃_1(X), i.e. iff θ is completely determined by g̃_1(X), which is impossible. Also, ‖∏(γ̇(f)|[γ̇_1(f)]^⊥)‖² ≤ ‖γ̇(f)‖² = η_1², with "=" iff γ̇(f) ∈ [γ̇_1(f)]^⊥, or 0 = 〈γ̇(f), γ̇_1(f)〉 = A.

(ii) Let f (x|θ, g) be the density function given the parameter θ and the information constraint g, S(x|θ, g) = ∂ log f (x|θ, g)/∂θ be the corresponding score function. The corresponding Fisher information is I(θ|g) = ‖S(X|θ, g)‖2. Although S(x|θ, g), hence I(θ|g), is not directly available, the corresponding efficient influence function ℐ(X, γ (f)|γ1(f)) is given in (i), and we have the following relationship between the information bound 𝕀(θ|g) and the Fisher information I(θ|g)

\[
\mathbb{I}(\theta|g) = \|\mathcal{I}(X,\gamma(f)|\gamma_1(f))\|^2 = \eta_1^2 - A_1'\Omega_1^{-1}A_1 = I^{-1}(\theta|g).
\]

Let L(X_n|θ, g) = Σ_{i=1}^n log f(X_i|θ, g) be the log-likelihood; we have the following local asymptotic normality [22] of the likelihood ratio:

\[
\lambda_n \coloneqq L(X_n|\theta_n) - L(X_n|\theta) = bV_n - b^2I(\theta|g)/2 + o_P(1),
\]

where V_n = n^{-1/2}Σ_{i=1}^n S(X_i|θ, g) →_D V ~ N(0, I(θ|g)).

Let ϕ_Y(t) = E[exp{itY}] be the characteristic function of a random variable Y. We are to show lim_n ϕ_{W_n}(t) = ϕ_U(t)ϕ_Z(t). In fact, by the regularity assumption,

\[
\phi_{W_n}(t) = E_{f(\cdot|\theta)}[\exp\{itW_n\}]
= E_{f(\cdot|\theta_n)}[\exp\{it(W_n-b)\}]
= E_{f(\cdot|\theta)}[\exp\{it(W_n-b)+\lambda_n\}]
\to E\big[\exp\{it(W-b)+bV-b^2I(\theta|g)/2\}\big],
\]

where the last step is by the same argument as in [4]. Since b ∈ ℂ is arbitrary, take b = −itI^{-1}(θ|g); we get

\[
it(W-b) + bV - b^2I(\theta|g)/2
= it\big(W - I^{-1}(\theta|g)V\big) - I^{-1}(\theta|g)t^2/2
= it\big(W - \mathbb{I}(\theta|g)V\big) - \mathbb{I}(\theta|g)t^2/2,
\]

thus

\[
\lim_n \phi_{W_n}(t) = E\big[\exp\{it(W-\mathbb{I}(\theta|g)V)\}\big]\exp\{-\mathbb{I}(\theta|g)t^2/2\}
= \phi_{W-\mathbb{I}(\theta|g)V}(t)\,\phi_Z(t).
\]

Now take U = W − 𝕀(θ|g)V, the proof is complete.

Proof of Theorem 4. (i) Denote the related U-statistics as functions of h, and note that the O(·) terms in the Lemma are independent of h. Note that U_{1,n} is a functional of gh, U_{2,n} is a functional of (1′_dg + ‖g‖²)h, θ and U_n are functionals of h, and U_{0,n} is a functional of g. As in the proof of Theorem 1(ii), we have

\[
\tilde P_{n,m}h - P^mh = \tilde U_n(h) - \theta(h)
= U_n(h) - \theta(h) - U_{1,n}'(gh)\Omega^{-1}U_{0,n}(g)
+ O\big(\rho_n n^{-1/2}(\log\log n)^{1/2}\big)U_{1,n}(gh)
+ O(\rho_n^2)U_{2,n}\big((\mathbf{1}_d'g+\|g\|^2)h\big) \quad (\text{a.s.}).
\]

Since U0,n(g) → 0 (a.s.) and is independent of h, we only need to show, a.s.,

\[
\sup_{h\in\mathcal{H}}|U_n(h)-\theta(h)| \to 0;
\qquad \sup_{h\in\mathcal{H}}|U_{1,n}(gh)| < \infty;
\qquad \sup_{h\in\mathcal{H}}\big|U_{2,n}\big((\mathbf{1}_d'g+\|g\|^2)h\big)\big| < \infty.
\]

In fact, since gh ∈ ℋ for all h ∈ ℋ, and U1,n(h) = Un(h), we have suph∈ℋ |U1,n(gh)| ≤ suph∈ℋ |U1,n(gh) − Pm(gh)| + suph∈ℋ |Pm(gh)| ≤ suph∈ℋ |U1,n(h) − Pm(h)| + suph∈ℋ |Pm(h)| = suph∈ℋ |Un(h) − θ(h)| + suph∈ℋ |Pm(h)|. Since ℋ has an integrable envelope H, suph∈ℋ |Pm(h)| ≤ Pm(suph∈ℋ |h|) ≤ Pm(H) < ∞. Thus, suph∈ℋ |U1,n(gh)| < ∞ (a.s.), if suph∈ℋ |Un(h) − θ(h)| → 0 (a.s.).

Similarly, since ‖g‖²h ∈ ℋ for all h ∈ ℋ, and U_{2,n}(h) = U_n(h), we have sup_{h∈ℋ} |U_{2,n}((1′_dg + ‖g‖²)h)| < ∞ (a.s.), if sup_{h∈ℋ} |U_n(h) − θ(h)| → 0 (a.s.).

Now we only need to prove suph∈ℋ |Un(h) − θ(h)| → 0 (a.s.), (the class ℋ is then called P-Glivenko–Cantelli). Since the property of P-Glivenko–Cantelli is permanent for finite union of classes, we only need to prove this on ℋ1 and ℋ2 separately.

We first prove ℋ_1 is P-Glivenko–Cantelli. For ε > 0, let N_{[]}(ε, ℋ_1, L_1(P^m)) be the bracketing entropy of the class ℋ_1 under the L_1(P^m) norm ‖h‖_{P^m} = E_{P^m}|h|, h ∈ ℋ_1. We first prove that if N_{[]}(ε, ℋ_1, L_1(P^m)) < ∞ for all ε > 0, then the conclusion is true. In fact, given ε > 0, since N_{[]}(ε, ℋ_1, L_1(P^m)) < ∞, there are finitely many ε-brackets [l_i, u_i] whose union covers ℋ_1 and such that P^m(u_i − l_i) < ε for all i. Then for any h ∈ ℋ_1, there is an upper bracket u_i such that

\[
U_n(h) - P^m(h) = (P_{n,m}-P^m)h
= (P_{n,m}-P^m)u_i + (P^m-P_{n,m})(u_i-h)
\le (P_{n,m}-P^m)u_i + P^m(u_i-h)
\le (P_{n,m}-P^m)u_i + \varepsilon.
\]

Consequently,

\[
\sup_{h\in\mathcal{H}_1}\big(U_n(h)-P^m(h)\big) \le \max_i\,(P_{n,m}-P^m)u_i + \varepsilon.
\]

Since by the SLLN for U-statistics (P_{n,m} − P^m)u_i → 0 (a.s.), lim sup_n sup_{h∈ℋ_1}(U_n(h) − P^m(h)) ≤ ε (a.s.). Similarly, lim inf_n inf_{h∈ℋ_1}(U_n(h) − P^m(h)) ≥ −ε (a.s.). Since ε > 0 is arbitrary, we get sup_{h∈ℋ_1}|U_n(h) − P^m(h)| → 0 (a.s.).

Now we show N_{[]}(ε, ℋ_1, L_1(P^m)) < ∞ for all ε > 0. Let I_j^1 = {x ∈ R^m : ‖x − I_j‖ < 1} be the 1-enlargement of I_j. By Corollary 2.7.4 in VW, for some constant K depending only on υ and m,

\[
\log N_{[\,]}\big(\varepsilon,\mathcal{H}_1,L_1(P^m)\big)
\le K\varepsilon^{-\upsilon}\Big(\sum_{j=1}^{\infty}\lambda(I_j^1)^{\frac{1}{\upsilon+1}}M_j^{\frac{\upsilon}{\upsilon+1}}P^m(I_j)^{\frac{\upsilon}{\upsilon+1}}\Big)^{\upsilon+1},
\]

for every ε > 0, υ ≥ 1, and probability measure P. The given condition implies that max_j λ(I_j^1) < ∞. Taking υ = 1, the right-hand side above is finite by the given condition.

Now we show sup_{h∈ℋ_2} |U_n(h) − θ(h)| → 0 (a.s.). Let, for ε > 0, N(ε, ℋ_2, L_1(Q)) be the entropy of ℋ_2 without bracketing under the norm L_1(Q) for some probability measure Q, and N(ε, ℋ_2, ‖·‖_∞) that under the supremum norm ‖·‖_∞. Recall H is also an envelope function on ℋ_2. Since L_1(Q) ≤ ‖·‖_∞, N(ε‖H‖_Q, ℋ_2, L_1(Q)) ≤ N(ε‖H‖_Q, ℋ_2, ‖·‖_∞), with ‖H‖_Q = (∫H²dQ)^{1/2}. Let M be the bound on ℋ_2 and ℋ̃_2 = {(h − inf_{x∈C}h(x))/M : h ∈ ℋ_2}; then ℋ̃_2 is a class of convex functions h : C ↦ [0, 1] with Lipschitz constant L/M, and N(ε, ℋ_2, ‖·‖_∞) = N(ε/M, ℋ̃_2, ‖·‖_∞). By Corollary 2.7.10 in VW, for any ε > 0,

\[
\log N\big(\varepsilon/M,\tilde{\mathcal{H}}_2,\|\cdot\|_\infty\big) \le K(1+L/M)^{m/2}M^{m/2}\varepsilon^{-m/2},
\]

for any probability measure Q, where K depends only on m and C. Also, since H is an envelope function over ℋ_2 ≠ {0}, inf_Q‖H‖_Q ≥ δ for some δ > 0, where the infimum is over the set 𝒬 of all probability measures Q on C with ‖H‖_Q < ∞. Thus we have, for any ε > 0,

\[
\log N\big(\varepsilon\|H\|_{P^m},\mathcal{H}_2,L_1(P^m)\big)
\le \sup_{Q\in\mathcal{Q}}\log N\big(\varepsilon\|H\|_Q,\mathcal{H}_2,L_1(Q)\big)
\le \log N\big(\varepsilon\delta,\mathcal{H}_2,\|\cdot\|_\infty\big)
= \log N\big(\varepsilon\delta/M,\tilde{\mathcal{H}}_2,\|\cdot\|_\infty\big)
\le K(1+L/M)^{m/2}M^{m/2}(\varepsilon\delta)^{-m/2} < \infty;
\]

also, ℋ2 is P-measurable by its definition, thus ℋ2 is P-Glivenko–Cantelli (cf. the statement in lines −5 to −3, p. 84 of VW).

(ii) The class ℋ with the stated property is called P-Donsker. First, it is apparent that for any k and h_1, …, h_k ∈ ℋ, (𝔾̃_{n,m}h_1, …, 𝔾̃_{n,m}h_k) →_D (𝔾h_1, …, 𝔾h_k) for the Gaussian process 𝔾 on ℋ as stated. So by Theorem 1.5.4 in VW, we only need to show that {𝔾̃_{n,m}} is asymptotically tight on ℋ. Using Lemma (ii) and a similar argument as in the proof of (i), since √n ρ_n² = o_P(1) and √n ρ_n n^{-1/2}(log log n)^{1/2} = o_P(1), we only need to show this for {𝔾_{n,m}}, and by Theorem 1.5.7 in VW, we only need to show that {𝔾_{n,m}} is asymptotically equicontinuous and totally bounded on ℋ. Below we will show that if

\[
\int_0^\infty \sup_{Q\in\mathcal{Q}}\sqrt{\log N\big(\varepsilon\|H\|_Q,\mathcal{H},L_2(Q)\big)}\,d\varepsilon < \infty, \tag{A.2}
\]

then {𝔾n,m} is asymptotically equicontinuous and totally bounded on ℋ, where 𝒬 is the collection of all measures Q with ‖HQ < ∞.

With (A.2), Theorem 2.5.2 in VW asserts the corresponding conclusion for empirical measures. Now we extend the result to U-statistics. For this, we point out that the symmetrization Lemma 2.3.1 in VW still holds for U-statistics, and Hoeffding's inequality also holds for U-statistics (Arcones and Giné [2, Proposition 2.3, p. 1501]); thus the proofs there are valid in our situation.

To check (A.2) on ℋ, we only need to check it for ℋ_1 and ℋ_2 separately. Using Corollary 2.7.4 in VW, we have

\[
\log N_{[\,]}\big(\varepsilon,\mathcal{H}_1,L_2(P)\big)
\le K\varepsilon^{-\upsilon}\Big(\sum_{j=1}^{\infty}\lambda(I_j^1)^{\frac{2}{\upsilon+2}}M_j^{\frac{2\upsilon}{\upsilon+2}}P^{\frac{\upsilon}{\upsilon+2}}(I_j)\Big)^{\frac{\upsilon+2}{2}},
\]

for all ε > 0 and υ ≥ m/s. Since the given condition implies max_j λ(I_j^1) < ∞, choose υ = m/s in the above inequality; then Σ_{j=1}^∞ λ(I_j^1)^{2/(υ+2)}M_j^{2υ/(υ+2)}P^{υ/(υ+2)}(I_j) < ∞ by the given condition, and since υ < 2, we have

\[
\int_0^1\sqrt{\log N_{[\,]}\big(\varepsilon,\mathcal{H}_1,L_2(P)\big)}\,d\varepsilon < \infty,
\]

hence by the statement on p. 85 in VW, ℋ_1 satisfies (A.2). The original statement in VW is for the integral ∫_0^∞. Since ℋ_1 has a square-integrable envelope function H, for all h_1, h_2 ∈ ℋ_1, ‖h_1 − h_2‖_{L_2(P)} ≤ ‖h_1‖_{L_2(P)} + ‖h_2‖_{L_2(P)} ≤ 2‖H‖_{L_2(P)} < ∞; i.e., ℋ_1 itself is a ball with radius no greater than 2‖H‖_{L_2(P)}, so N_{[]}(ε, ℋ_1, L_2(P)) = 1 for ε ≥ 2‖H‖_{L_2(P)}. Thus the entropy is zero for ε ≥ 2‖H‖_{L_2(P)}, and the integral ∫_0^∞ is finite iff ∫_0^1 is finite.

For ℋ2, similarly as in the proof of (i), for some η > 0,

\[
\sup_{Q\in\mathcal{Q}}\log N\big(\varepsilon\|H\|_Q,\mathcal{H}_2,L_2(Q)\big) \le K(1+L/M)^{m/2}M^{m/2}(\varepsilon\eta)^{-m/2}.
\]

Since m < 4,

\[
\int_0^1 \sup_{Q\in\mathcal{Q}}\sqrt{\log N\big(\varepsilon\|H\|_Q,\mathcal{H}_2,L_2(Q)\big)}\,d\varepsilon
\le K^{1/2}(1+L/M)^{m/4}M^{m/4}\eta^{-m/4}\int_0^1\varepsilon^{-m/4}\,d\varepsilon < \infty,
\]

thus by (2.1.7) in VW, ℋ2 is P-Donsker.

Proof of Corollary 2. From the proof of Theorem 4(i), we only need to show sup_{h∈ℋ}|(P_{n,m} − P^m)h| → 0 (a.s.), which is true by Corollary 3.3 or Corollary 3.5, respectively, in [2].

Proof of Theorem 5. (i) As in the proofs of the previous theorems, with g replaced by G, since ro ≔ min{rank(g1), …, rank(gd), rank(h)} = 1, by Lemma (ii) we have

\[
w_i = \frac{1}{C_n^m}\Big(1 - G'(X_i|\theta)\Lambda^{-1}\frac{1}{C_n^m}\sum_{j\in D_{n,m}}G(X_j|\theta)
+ \big[\mathbf{1}_{d+1}'G(X_i|\theta)+\|G(X_i|\theta)\|^2\big]O_P(n^{-1})\Big),
\]

and by standard U-statistics theory,

\[
\sqrt{n}\,\frac{1}{C_n^m}\sum_{j\in D_{n,m}}G(X_j|\theta) \to_D N(0,m^2\Lambda_1),
\qquad
\frac{1}{C_n^m}\sum_{i\in D_{n,m}}G(X_i|\theta)G'(X_i|\theta) = \Lambda + O_P(n^{-1/2}).
\]

Also, t = Λ^{-1}(C_n^m)^{-1}Σ_{j∈D_{n,m}}G(X_j|θ) + O_P(n^{-1}) = O_P(n^{-1/2}), (C_n^m)^{-1}Σ_{i∈D_{n,m}}‖G(X_i|θ)‖² →_{a.s.} E‖G(X|θ)‖² < ∞, and max_i|t′G(X_i|θ)| = O_P(n^{-1/2}n^{m/α}) →_P 0 since m/α < r_o/2 = 1/2, so

\[
\begin{aligned}
\frac{2n}{C_n^m}R_G(\theta) &= \frac{2n}{C_n^m}\sum_{i\in D_{n,m}}\log\big[1+t'G(X_i|\theta)\big]\\
&= \frac{2n}{C_n^m}\sum_{i\in D_{n,m}}\Big(t'G(X_i|\theta) - \tfrac12\,t'G(X_i|\theta)G'(X_i|\theta)t + o_P(n^{-1})\|G(X_i|\theta)\|^2\Big)\\
&= 2n\,\bar G_n'\Lambda^{-1}\bar G_n - n\,\bar G_n'\Lambda^{-1}\Big(\frac{1}{C_n^m}\sum_{i\in D_{n,m}}G(X_i|\theta)G'(X_i|\theta)\Big)\Lambda^{-1}\bar G_n + o_P(1)\\
&= 2n\,\bar G_n'\Lambda^{-1}\bar G_n - n\,\bar G_n'\Lambda^{-1}\big(\Lambda+O_P(n^{-1/2})\big)\Lambda^{-1}\bar G_n + o_P(1)\\
&= n\,\bar G_n'\Lambda^{-1}\bar G_n + O_P(n^{-1/2}) + o_P(1),
\end{aligned}
\]
where \(\bar G_n = (C_n^m)^{-1}\sum_{j\in D_{n,m}}G(X_j|\theta)\).

This completes the proof since

\[
\sqrt{n}\,m^{-1}\Lambda_1^{-1/2}\frac{1}{C_n^m}\sum_{i\in D_{n,m}}G(X_i|\theta) \to_D N(0,I_{r+1}).
\]

(ii) This is a special case of (i).

References

  • 1.Adimari G. Empirical likelihood type confidence intervals under random censorship. Annals of the Institute of Statistical Mathematics. 1997;49:447–466.
  • 2.Arcones MA, Giné E. Limit theorems for U-processes. Annals of Probability. 1993;21(3):1494–1542.
  • 3.Baggerly KA. Empirical likelihood as a goodness-of-fit measure. Biometrika. 1998;85:535–547.
  • 4.Begun JM, Hall WJ, Huang W, Wellner JA. Information and asymptotic efficiency in parametric–nonparametric models. Annals of Statistics. 1983;11:432–452.
  • 5.Berk RH. Limiting behavior of posterior distributions when the model is incorrect. Annals of Mathematical Statistics. 1966;37:51–58.
  • 6.Bickel PJ, Klaassen CA, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore, Maryland; 1993.
  • 7.Borovskich YuV. Theory of U-Statistics in Hilbert Space. Institute of Mathematics, Ukrainian Academy of Sciences, Kiev; 1986.
  • 8.Chen SX. Empirical likelihood for nonparametric density estimation. Australian Journal of Statistics. 1997;39:47–56.
  • 9.Chen SX, Hall P. Smoothed empirical likelihood confidence intervals for quantiles. Annals of Statistics. 1993;21:1166–1181.
  • 10.Chen SX, Qin YS. Empirical likelihood confidence intervals for local linear smoothers. Biometrika. 2000;87:946–953.
  • 11.Giné E. Decoupling and limit theorems for U-statistics and U-processes. In: Lectures on Probability Theory and Statistics, Saint-Flour 1996, Lecture Notes in Mathematics, vol. 1665. Springer, Berlin; 1997. pp. 1–35.
  • 12.Giné E, Zinn J. Marcinkiewicz type laws of large numbers and convergence of moments for U-statistics. In: Probability in Banach Spaces, vol. 8. Birkhäuser, Boston; 1992. pp. 273–291.
  • 13.Gregory G. Large sample theory for U-statistics and tests of fit. Annals of Statistics. 1977;5:110–123.
  • 14.Hájek J. A characterization of limiting distributions of regular estimates. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 1970;14:323–330.
  • 15.Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325.
  • 16.Hoeffding W. The strong law of large numbers for U-statistics. Inst. Statist. Mimeo Ser. 1961;(302):1–10.
  • 17.Janson S. The asymptotic distribution of degenerate U-statistics. Preprint No. 5, Department of Mathematics, University of Uppsala; 1979. pp. 1–17.
  • 18.Jing BY, Yuan J, Zhou W. Empirical likelihood for U-statistics. Journal of the American Statistical Association. 2009;104:1224–1232.
  • 19.Kitamura Y. Empirical likelihood methods in econometrics: theory and practice. Discussion Paper 1569, Cowles Foundation for Research in Economics; 2006.
  • 20.Kolaczyk ED. Empirical likelihood for generalized linear models. Statistica Sinica. 1994;4:199–218.
  • 21.Koroljuk VS, Borovskich YuV. Theory of U-Statistics. Kluwer Academic Publishers, The Netherlands; 1994.
  • 22.LeCam L. Locally asymptotically normal families of distributions. University of California Publications in Statistics. 1960;3:37–98.
  • 23.Nolan D, Pollard D. U-processes: rates of convergence. Annals of Statistics. 1987;15:780–799.
  • 24.Nolan D, Pollard D. Functional limit theorems for U-processes. Annals of Statistics. 1988;16:1291–1298.
  • 25.Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249.
  • 26.Owen AB. Empirical likelihood confidence regions. Annals of Statistics. 1990;18:90–120.
  • 27.Owen AB. Empirical likelihood for linear models. Annals of Statistics. 1991;19:1725–1747.
  • 28.Qin J, Lawless J. Empirical likelihood and general estimating equations. Annals of Statistics. 1994;22:300–325.
  • 29.Qin GS, Tsao M. Empirical likelihood based inference for the derivative of the nonparametric regression function. Bernoulli. 2005;11:715–735.
  • 30.Qin J, Zhang B. Marginal likelihood, conditional likelihood and empirical likelihood: connections and applications. Biometrika. 2005;92:251–270.
  • 31.Qin GS, Zhou XH. Empirical likelihood inference for the area under the ROC curve. Biometrics. 2006;62:613–622. doi: 10.1111/j.1541-0420.2005.00453.x.
  • 32.Rubin H, Vitale RA. Asymptotic distribution of symmetric statistics. Annals of Statistics. 1980;8:165–170.
  • 33.Sen PK. Almost sure behavior of U-statistics and von Mises’ differentiable statistical functions. Annals of Statistics. 1974;2:387–395.
  • 34.Serfling R. Approximation Theorems of Mathematical Statistics. Wiley, New York; 1980.
  • 35.Thomas D, Grunkemeier G. Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association. 1975;70:865–871.
  • 36.van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag, New York; 1996.
  • 37.von Mises R. On the asymptotic distribution of differentiable statistical functions. Annals of Mathematical Statistics. 1947;18:309–348.
  • 38.Wood ATA, Do KA, Broom BM. Sequential linearization of empirical likelihood constraints with application to U-statistics. Journal of Computational and Graphical Statistics. 1996;5:365–385.
  • 39.Zhang B. A note on kernel density estimation with auxiliary information. Communications in Statistics-Theory and Methods. 1998;27:1–11.
