U-statistic with side information

Ao Yuan; Wenqing He; Binhuan Wang; Gengsheng Qin

doi:10.1016/j.jmva.2012.04.008

. Author manuscript; available in PMC: 2013 May 21.

Published in final edited form as: J Multivar Anal. 2012 Oct;111:20–38. doi: 10.1016/j.jmva.2012.04.008

U-statistic with side information

Ao Yuan ^a,^*, Wenqing He ^b, Binhuan Wang ^c, Gengsheng Qin ^c

PMCID: PMC3660044 NIHMSID: NIHMS462605 PMID: 23704796

Abstract

In this paper we study U-statistics with side information incorporated using the method of empirical likelihood. Some basic properties of the proposed statistics are investigated. We find that by implementing the side information properly, the proposed U-statistics can have smaller asymptotic variance than the existing U-statistics in the literature. The proposed U-statistics can achieve asymptotic efficiency in a formal sense and their weak limits admit a convolution result. We also find that the corresponding U-likelihood ratio procedure, as well as the U-empirical likelihood based confidence interval construction, do not benefit from incorporating side information, a result that is consistent with the result under the standard empirical likelihood ratio procedure. The impact of incorrect side information implementation in the proposed U-statistics is also explored. Simulation studies are conducted to assess the finite sample performance of the proposed method. The numerical results show that with side information implemented, the deduction of asymptotic variance can be substantial in some cases, and the coverage probability of the confidence interval using the U-empirical likelihood ratio based method outperforms that of the normal approximation based method, in particular in the cases when the underlying distribution is skewed.

Keywords: Efficiency, Information bound, Side information, U-statistic

1. Introduction

Since the pioneering work of Hoeffding [15], the U-statistics have been an active research field in statistics due to their wide range of applications. Hoeffding [16] established some fundamental properties of U-statistics which had close relationship with the V-statistic proposed by von Mises [37]. Berk [5] discovered the reverse martingale structure for U-statistic. Sen (e.g. [33]) made a number of contributions in this topic. Parallel to the result for V-statistics, Gregory [13] obtained the asymptotic distribution for degenerate U-statistics with rank two. The asymptotic distribution of U-statistics with arbitrary rank was developed by Janson [17] and Rubin and Vitale [32], etc. Borovskich [7] extended the results to Hilbert space. A detailed review and major historical developments in this field can be found in the book by Koroljuk and Borovskich [21], hereafter denoted as KB.

The empirical likelihood (EL) is one of the recent major developments in statistics. The original idea can be traced back to Thomas and Grunkemeier [35]. The work of Owen [25–27] formally established the advantages and application scopes of this method, and paved the road of increasing popularity of EL due to the wide range of applications, the theoretical advantages, the simplicity of usage and the flexibility to incorporate auxiliary (or side) information in various forms. EL has been applied in various problems, for example, nonparametric confidence regions [9], the generalized linear model [20], survival analysis [1], density and quantile estimations [8,39], goodness-of-fit measure [3], nonparametric regression [10,29], marginal and conditional likelihood [30], ROC curve [31], econometrics [19], etc. It is well known that incorporating side information via empirical likelihood can reduce asymptotic variance of the estimators [28]. Motivated by this fact, we explore to incorporate side information into the U-statistic using the EL method, and expect that the new procedure can improve the performance of U-statistic under appropriate conditions.

It is also known that constructing confidence regions using EL ratio has various advantages than using normal approximation based method or bootstrap. For example, Wood et al. [38] and Jing et al. [18] have considered the EL method to U-statistics to construct confidence intervals without side information incorporated. We investigate the U-statistic to construct confidence intervals using the empirical likelihood by incorporating side information, and the resulting confidence intervals are compared with those based on normal approximation. Our method of formulating the weights of U-statistics is parallel to those in the EL, and is different from that in [38,18]. We find that by incorporating the side information properly, the proposed U-statistics will have smaller asymptotic variance than the existing U-statistics methods without side information. The proposed U-statistics can achieve asymptotic efficiency in a formal sense, and their weak limits admit a convolution result. We also find that the U-statistic EL based likelihood ratio procedure do not benefit from incorporating the side information asymptotically, a result that is consistent with the result under the standard empirical likelihood ratio procedure. The resulting coverage probability based on finite sample still outperforms that of the normal based approximation. The impact of incorrect side information incorporation is also explored.

In Section 2 we introduce the framework of the proposed U-statistics with side information incorporated, and investigate the basic asymptotic properties of the proposed U-statistics in Section 3. The U-empirical likelihood ratio with side information is formulated in Section 4. Examples and simulations results are given in Section 5 to illustrate the proposed method. All the relevant proofs are left in the Appendix.

2. Incorporating side information in U-Statistics

Let X₁, …, X_n be independent and identically distributed (i.i.d.) random variables with unknown distribution function F (x) = P(X_i ≤ x). In this paper, we assume X_i being random variable for simplicity, although there is no essential difference to extend it to the case of random vectors. Denote X = (X₁, …, X_m)′, (m ≥ 2). Let i = (i₁, …, i_m)′, X_i = (X_i₁, …, X_{i_m})′. D_n,m = {i : 1 ≤ i₁ < ⋯ < i_m ≤ n} denotes the collection of indices for the U-statistic of degree m. Let $C_{n}^{m}$ be the combination number of m elements out of n, x = (x₁, …, x_m)′, $F_{m} (x) = \prod_{j = 1}^{m} F (x_{j})$ and F_n,m(x) be the empirical distribution function of F_m based on the sample 𝒳_n ≔ {X_i : i ∈ D_n,m}, with mass $1 / C_{n}^{m}$ at each point in 𝒳_n. Given an m-variate symmetric kernel h, the U-statistic is defined as

U_{n} = {(C_{n}^{m})}^{- 1} \sum_{i \in D_{n, m}} h (X_{i}) = E_{F_{n, m}} h (X) .

The goal is to estimate θ = E_{F_m}h(X), where E_{F_m} denotes the expectation with respect to F_m. It is known that the U-statistic U_n is the minimal variance unbiased estimator of θ [34, p. 176].

Since the work of Owen [25], the empirical likelihood (EL) has gained increasing popularity due to its wide range of applications, simplicity to use and flexibility to incorporate auxiliary (or side) information. We examine here to combine both the EL method to flexibly incorporate side information and theU-statistics to achieve a better variance for the estimator.

We consider the set-up for EL as in [28]. Suppose the side information can be incorporated into the EL through a d-dimensional known function g(x) = (g₁(x), …, g_d(x))′ via the relationship

E [g (X_{1})] = 0,

where E[·] denotes the expectation with respect to F. The EL is defined as

L (F) = \prod_{i = 1}^{n} w_{i},

where the w_is are the nonparametric maximum likelihood estimated empirical masses assigned to the observation X_is. With the side information constraints, the EL is

max_{w} \prod_{i = 1}^{n} w_{i} subject to \sum_{i = 1}^{n} w_{i} = 1 and \sum_{i = 1}^{n} w_{i} g (X_{i}) = 0 .

Let t = (t₁, …, t_d)′ be the Lagrange multipliers corresponding to the constraint of g(·), and as in [26], we get

w_{i} = \frac{1}{n} \frac{1}{1 + t' g (X_{i})},

where t_j = t_j(X₁, …, X_n)(j = 1, …, d) are determined by

\sum_{i = 1}^{n} \frac{g (X_{i})}{1 + t' g (X_{i})} = 0 .

To combine the EL method and U-statistics, Wood et al. [38] considered a weighted U-statistic

{(C_{n}^{m})}^{- 1} \sum_{1 \leq i_{1} < \dots < i_{m} \leq n} n^{m} w_{i_{1}} \dots w_{i_{m}} h (X_{i_{1}}, \dots, X_{i_{m}})

with weight w(i₁, …, i_m) = w_i₁ ⋯ w_{i_m} being estimated using EL procedure. Jing et al. [18] proposed a Jackknife EL for the U-statistic without side information considered. They first merge the $C_{n}^{m}$ observed h(X_i)’s into a Jackknife sample, then treat this Jackknife pseudo sample as a sample of n i.i.d observations and apply the standard EL method for the mean to obtain the EL estimate for U-statistic.

In this paper, our goal is to estimate θ = E_{F_m}h(X) under the information constraints to incorporate side information in the form

E_{F_{m}} g (X) = 0 .

(1)

Without loss of generality g(·) is assumed symmetric with respect to its arguments (otherwise we can set g(x₁, …, x_m) = 1/m! ∑_(p) g(x_i₁, …, x_{i_m}) to make it symmetric, where the notation ∑_(p) denote summation over the indices (i₁, …, i_m) of all the permutations of (1, …, m)). This function g includes constraints E_F (g(X₁)) = 0 as a special case by setting a componentwise product $g (X) = \prod_{j = 1}^{m} g (X_{j})$ . Some examples of g(·) will be given in Section 5 for illustration.

To formulate the proposed U-statistic we consider a different but a direct way to define the weights w(i₁, …, i_m)’s. Let w_i = F_m({X_i}) and w = (w_i : i ∈ D_n,m). Since the w_i’s are unknown (as is F_m), we maximizes the product of the w_i’s subject to appropriate constraints (they may not be independent of each other). Re-write the EL subject to the side information constraints as

max_{w} \prod_{i \in D_{n, m}} w_{i} subject to \sum_{i \in D_{n, m}} w_{i} = 1 and \sum_{i \in D_{n, m}} w_{i} g (X_{i}) = 0 .

We get, as in [26], that

w_{i} = {(C_{n}^{m})}^{- 1} \frac{1}{1 + t' g (X_{i})},

(2)

and t = t_n = (t_n1, …, t_nd)′ with t_nj = t_nj(X₁, …, X_n)(j = 1, …, d) being determined by

\sum_{i \in D_{n, m}} \frac{g (X_{i})}{1 + t' g (X_{i})} = 0 .

(3)

For details regarding the existence of t as the solution of (3) see, for example, the papers by Owen and others. The proposed weights for U-statistics are parallel to those in the EL, and simpler than some existing method in that there is no need to form a product of m elements from w₁, …, w_n as in [38], nor to merge the data as in [18].

Similar to Hoeffding [15], for any kernel h(·) with E_{F_m}(h(X)) < ∞, let h_c (x₁, …, x_c) = E(h(X₁, …, X_m)|X₁ = x₁, …, X_c = x_c), $h_{c}^{o} = h_{c} - θ$ be its centered version (c = 1, …, m), ${h̃}_{1} (x_{1}) = h_{1}^{o} (x_{1}), {h̃}_{2} (x_{1}, x_{2}) = h_{2}^{o} (x_{1}, x_{2}) - {h̃}_{1} (x_{1}) - {h̃}_{1} (x_{2}), {h̃}_{3} (x_{1}, x_{2}, x_{3}) = h_{3}^{o} (x_{1}, x_{2}, x_{3}) - \sum_{i = 1}^{3} {h̃}_{1} (x_{i}) - \sum_{1 \leq i \leq j \leq 3} {h̃}_{2} (x_{i}, x_{j})$ , and in general,

{h̃}_{c} (x_{1}, \dots, x_{c}) = h^{o} (x_{1}, \dots, x_{c}) - \sum_{i = 1}^{c} {h̃}_{1} (x_{i}) - \sum_{1 \leq i < j \leq c} {h̃}_{2} (x_{i}, x_{j}) - \dots - \sum_{1 \leq i_{1} < \dots < i_{c - 1} \leq c} {h̃}_{c - 1} (x_{i_{1}}, \dots, x_{i_{c - 1}}) = \int \dots \int h_{c} (y_{1}, \dots, y_{c}) \prod_{s = 1}^{c} d (δ_{x_{s}} (y_{s}) - F (y_{s})), (c = 1, \dots, m),

where δ_{x_s} (y_s) is the Dirac function, taking value 1 if y_s = x_s and 0 otherwise. The integration representation above can be found in KB. The h̃_c’s are called canonical forms of h. If h̃₁ = ⋯ = h̃_k−1 = 0 and h̃_k ≠ 0 (or equivalently Var(h̃₁) = ⋯ = Var(h̃_k−1) = 0 and Var(h̃_k) ≠ 0), the U-statistic U_n with kernel h is said of rank k(1 ≤ k ≤ m). When k > 1U_n is called degenerate; when k = m it is called complete degenerate. U_n has the following Hoeffding [16] representation

U_{n} - θ = \sum_{c = k}^{m} C_{m}^{c} U_{n c}, U_{n c} = {(C_{n}^{c})}^{- 1} \sum_{1 \leq i_{1} < \dots < i_{c} \leq n} {h̃}_{c} (X_{i_{1}}, \dots, X_{i_{c}}) .

Let $η_{c}^{2} = E [{h̃}_{c}^{2}] (c = 1, \dots, m)$ , U_n has the following variance formula [16]

Var (U_{n}) = {(C_{n}^{m})}^{- 1} \sum_{c = 1}^{m} C_{m}^{c} C_{n - m}^{m - c} η_{c}^{2} .

Define g_c = (g_c,1, …, g_c,d)′ with

g_{c, j} (x_{1}, \dots, x_{c}) = E_{F_{m}} (g_{j} (X_{1}, \dots, X_{m}) | X_{1} = x_{1}, \dots, X_{c} = x_{c}), (j = 1, \dots, d; c = 1, \dots, m)

and the canonical forms g̃_c = (g̃_c,1, …, g̃_c,d)′ for g as,

{g̃}_{c} (x_{1}, \dots, x_{c}) = \int \dots \int g_{c} (y_{1}, \dots, y_{c}) \prod_{s = 1}^{c} d (δ_{x_{s}} (y_{s}) - F (y_{s})), (j = 1, \dots, d; c = 1, \dots, m) .

Similarly, let q_c (c = 1, …, m) be the canonical forms of g(·)h(·) = (g₁(·)h(·), …, g_d(·)h(·))′. The canonical forms h̃_c and q̃_c (c = 1, …, m) exist theoretically, but are unknown in practice since F is unknown. Let r_o = min{rank(g₁), …, rank(g_d)}, r = rank(h), r₁ = min{rank(g₁h), …, rank(g_dh)}, and F̃_nm be the empirical distribution with mass w_i at the observation x_i. Using the weights w_i’s given in (2) and (3), we define the U-statistic with side information given by the constraints g as

Ũ_{n} = \sum_{i \in D_{n, m}} w_{i} h (X_{i}) = E_{{F̃}_{n, m}} h (X) .

(4)

In comparison, the commonly used U-statistic U_n has weight ${(C_{n}^{m})}^{- 1}$ at each observation h(X_i), while with the EL formulation, the weights are replaced by w_i. In the following we investigate the basic asymptotic properties of Ũ_n.

3. The asymptotic properties of Ũ_n

In this section we study some basic asymptotic behavior of the proposed U-statistic, including its convergence, asymptotic distribution, uniform convergence, and asymptotic efficiency. The following conditions will be used in this section:

(C1). Ω ≔ E[g(X)g′(X)] is positive definite.
(C2). E‖g(X)‖^α < ∞ for some α > 0 to be specified.
(C3). E_{F_m}|h(X)| < ∞.
(C4). E_{F_m}h²(X) < ∞.
(C5). E_{F_m}[‖g(X)‖²|h(X)|] < ∞.

where ‖·‖ denotes the Euclidean norm. We note that (C2) with α ≥ 4 plus (C4) implies (C5).

3.1. Convergence rate of Ũ_n

We first give a lemma to characterize the asymptotic form of the weight w_i’s, which will be used repeatedly in the asymptotic study.

Lemma. Assume (C1) and (C2) for α > 2m/r_o, we have

$w_{i} \overset{a . s .}{=} \frac{1}{C_{n}^{m}} (1 - g' (X_{i}) Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j}) + 1_{d}^{'} g (X_{i}) O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) + [1_{d}^{'} g (X_{i}) + {‖ g (X_{i}) ‖}^{2}] O (ρ_{n}^{2})),$
where, 1_d = (1, …, 1)′ is the d-dimensional vector of 1’s, the O(·) terms are uniformly for all X_i’s and i’s, with
$ρ_{n} = {\begin{matrix} O (n^{- 1 / 2} {(log log n)}^{1 / 2}), & r_{o} = 1; \\ o (n^{- r_{o} / 2} log n), & 1 < r_{o} \leq m . \end{matrix}$
$w_{i} = \frac{1}{C_{n}^{m}} (1 - g' (X_{i}) Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j}) + 1_{d}^{'} g (X_{i}) O_{p} (n^{- (r_{o} + 1) / 2}) + [1_{d}^{'} g (X_{i}) + {‖ g (X_{i}) ‖}^{2}] O_{p} (n^{- r_{o}})) .$

The O_p(·) terms above are uniformly for all the x_i’s and i’s.

Theorem 1. (i). Assume the conditions in the lemma plus (C3) and (C5), if r = 1, then

n^{q} (Ũ_{n} - θ) \to 0, a . s . for all q < 1 / 2 .

(ii). Assume conditions in the lemma plus (C4) and (C5), if r > 1, then

a_{n} (Ũ_{n} - θ) \to 0, (a . s .), where a_{n} = {\begin{matrix} n^{q} for all q < 1 / 2, & if r_{1} = r_{o} = 1; \\ n^{min {r / 2, 1}} / log n, & if r_{1} > r_{o} = 1; \\ n^{min {r_{o}, r} / 2} / log n, & if 1 = r_{1} < r_{o}; \\ min {\frac{n^{r / 2}}{log n}, \frac{n^{min {(r_{0} + r_{1}) / 2, r_{0}}}}{{(log n)}^{2}}}, & if r_{o}, r_{1} > 1 . \end{matrix}

(iii). Assume (C4) and conditions of Lemma (i), if r = 1, then with σ² given in Theorem 2 (i),

lim_{n} sup {(2 σ^{2} \frac{log log n}{n})}^{- 1 / 2} | Ũ_{n} - θ | = 1, (a . s .) .

3.2. Asymptotic distribution of Ũ_n

Let J₁(h) be the Gaussian process indexed by h ∈ L²(R, ℬ, F) with mean E(J₁(h)) = 0 and covariance Cov(J₁(h), J₁(g)) = ∫ h(x)g(x)F(dx) for all h, g ∈ L²(R, ℬ, F). Let W(·) be the Gaussian random measure on L²(R, ℬ, P) defined by W(A) = J₁(I_A), A ∈ ℬ. J₁(h) = ∫ h(x)W(dx) is called the Wiener–Itô integral of order 1. Generally, for h ∈ L²(R^r, ℬ^r, F^r), the Wiener–Itô integral of order r is defined as

J_{r} (h) = \int \dots \int h (x_{1}, \dots, x_{r}) W (d x_{1}) \dots W (d x_{r}), \forall h \in L^{2} (R^{r}, ℬ^{r}, F),

and its covariance is given by

Cov (J_{r} (h), J_{r} (g)) = r! \int \dots \int h (x_{1}, \dots, x_{r}) g (x_{1}, \dots, x_{r}) F (d x_{1}) \dots F (d x_{r}), \forall h, g \in L^{2} (R^{r}, ℬ^{r}, F) .

For a vector function h = (h₁, …, h_d)′ with h_j ∈ L²(R^r, ℬ^r, F)(j = 1, …, d), define J_r (h) componentwisely as a d-dimensional random process. Denote $\overset{D}{\to}$ for convergence in distribution.

Theorem 2. (i) Assume (C4) and conditions of the lemma, if r = 1, then

\sqrt{n} (Ũ_{n} - θ) \overset{D}{\to} N (0, σ^{2}), σ^{2} = {\begin{matrix} m^{2} (η_{1}^{2} - 2 A' Ω^{- 1} A_{1} + A' Ω^{- 1} Ω_{1} Ω^{- 1} A), & r_{o} = 1; \\ m^{2} η_{1}^{2}, & r_{o} > 1; \end{matrix}

where $η_{1}^{2} = E_{F} {h̃}_{1}^{2} (X_{1}), Ω_{1} = E_{F} ({g̃}_{1} (X_{1}) {g̃}_{1}^{'} (X_{1}))$ , A = E_{F_m}[g(X)h(X)] and A₁ = E_F [g̃₁(X₁)h̃₁(X₁)].

(ii) Assume (C4), conditions of Lemma (ii) and r > 1, then

n^{b / 2} (Ũ_{n} - θ) \overset{D}{\to} Z, where

when A ≠ 0,

{\begin{matrix} b = r_{o}, & Z = - C_{m}^{r_{o}} A' Ω^{- 1} J_{r_{o}} ({g̃}_{r_{o}}), & if r_{o} < r; \\ b = r, & Z = C_{m}^{r} J_{r} ({h̃}_{r} - A' Ω^{- 1} {g̃}_{r}), & if r_{o} = r; \\ b = r, & Z = C_{m}^{r} J_{r} ({h̃}_{r}), & if r_{o} > r; \end{matrix}

when A = 0,

{\begin{matrix} b = r_{o}, & Z = O_{P} (1), & if r_{o} \leq min {r_{1}, r / 2}; \\ b = r, & Z = C_{m}^{r} J_{r} ({h̃}_{r}), & if r < min {r_{1} + r_{o}, 2 r_{o}}; \\ b = r, & Z = C_{m}^{r} J_{r} ({h̃}_{r}) - C_{m}^{r_{1}} C_{m}^{r_{o}} J_{r_{1}} ({q̃}_{r_{1}}) Ω^{- 1} J_{r_{o}} ({g̃}_{r_{o}}), & if r_{1} + r_{o} = 2 r; \\ b = r_{o} + r_{1}, & Z = - C_{m}^{r_{1}} C_{m}^{r_{o}} J_{r_{1}} ({q̃}_{r_{1}}) Ω^{- 1} J_{r_{o}} ({g̃}_{r_{o}}), & if r_{1} + r_{o} < r or r_{1} < r_{o} . \end{matrix}

From Theorem 2 we see that the most interesting case is r = r_o = r₁ = 1, in which $\sqrt{n} (Ũ_{n} - θ)$ is asymptotic non-degenerate normal, with asymptotic variance being smaller than that of $\sqrt{n} (U_{n} - θ)$ . σ² is the same as that of U_n either when r₁ > 1, A = 0, or when r_o > 1, A₁ = 0 and Ω₁ = 0. Thus, for the side information to be of practical meaning, we need r = r_o = r₁ = 1.

It is interesting to note that if we have full information about the parameter to be estimated, then we can “estimate” the parameter with perfection, i.e., its asymptotic variance being reduced to zero. As an artificial example, let a and b are nonzero known constants, μ = E(X₁), $h (X_{1}, \dots, X_{m}) = a \sum_{k = 1}^{m} X_{k}$ and $g (X_{1}, \dots, X_{m}) = b \sum_{k = 1}^{m} (X_{k} - μ)$ . Since g is a known function, μ must be known. We are to estimate θ = Eh(X₁, …, X_m) = amμ using the U-statistic (4), with the w_i’s given by (2) and (3). In this case, θ is already known as is μ, and we can “estimate” θ with zero asymptotic variance, as in the following

Corollary 1. Assume τ² = Var(X₁) < ∞, and h and g are as given in the above. Then Theorem 2 (i) holds with σ² = 0.

3.3. The optimality property of Ũ_n

To study the asymptotic efficiency of the estimators of θ, let 𝕀(θ|g) be the information bound [6] for estimating θ given the side information in g, to be given in Theorem 3(i) below. When the asymptotic variance of an estimator achieves this bound or equals to this bound up to a known multiplicative positive constant, the estimator is called asymptotically efficient. It is the limit version of the Cramer–Rao lower bound for variances of unbiased estimators. For Euclidean parameters without g, the information (lower) bound is the inverse of Fisher information.

Suppose f (·|θ) is the density function of X given θ, θ_n = θ + n^−1/2b for some b ∈ C, the complex plane. An estimator T_n = T_n(X₁, …, X_n) is said to be regular, if under f (·|θ_n), $W_{n} ≔ \sqrt{n} (T_{n} - θ_{n}) \overset{D}{\to} W$ for some random variable W, and the result does not depend on the sequence {θ_n}. Let Z ⊕ V denote the summation of two independent random variables Z and V, I(θ) be the Fisher information at θ, and Z ~ N(0, I⁻¹(θ)). The convolution Theorem [14] states that for any regular estimator T_n with weak limit W, there is a V such that

W = Z \oplus V .

This result further characterizes the weak limit of an asymptotic efficient estimator without side information: it is a normal random variable with mean zero and variance I⁻¹(θ). Below we obtain the information bound and convolution result for the proposed estimators with the presence of side information.

Theorem 3. Assume r = r_o = 1, (C4) and conditions in the lemma, we have

$𝕀 (θ | g) = η_{1}^{2} - A_{1}^{'} Ω_{1}^{- 1} A_{1} .$
Thus, if we set g(x) = (g(x₁) + ⋯ + g(x_m))/m, then rank(g) = 1, A = mA₁, Ω = mΩ₁, σ² = m²𝕀(θ|g) and Ũ_n is efficient.
Assume further that the density f (·|θ) of X has the second order continuous partial derivative with respect to θ, then for any regular estimator T_n with weak limit W of $W_{n} ≔ \sqrt{n} (T_{n} - θ)$ , W can be decomposed as, for some V,
$W = Z \oplus V, with Z ~ N (0, 𝕀 (θ | g)) .$

It is easy to see that any U-statistic with side information of the form Ũ_n is regular, thus is optimal in the sense of convolution under the conditions of Theorem 3. Without side information, the asymptotic variance of $\sqrt{n} (U_{n} - θ)$ is $η_{1}^{2}$ ; when side information presents, the asymptotic variance of $\sqrt{n} (Ũ_{n} - θ)$ is $η_{1}^{2} - A_{1}^{'} Ω_{1}^{- 1} A_{1}$ , with a reduction of $A_{1}^{'} Ω_{1}^{- 1} A_{1}$ . From the proof of Theorem 3(i) we see that 𝕀(θ|g) is the length of the projection of h̃₁(X) onto [g̃₁(X)^⊥], the linear span of the orthogonal complements of g̃₁(X). Increasing the components in g (and thus in g̃₁) shrinks the space [g̃₁(X)^⊥], and shortens the length of the projection or increases the efficiency of Ũ_n, or increasing the number of information constraints reduces the asymptotic variance of the U-statistic.

Remark. By Theorem 2(i) or Theorem 3(i), given a nominal level α, a level (1 − α) confidence interval of θ can be obtained as [Ũ_n ± n^−1/2σΦ⁻¹(1 − α/2)], without using likelihood ratio, where Φ⁻¹(·) is the standard normal quantile function. Here σ is smaller with the presence of side information than no side information involved, hence the inference becomes more accurate.

3.4. The uniform SLLN and CLT of Ũ_n-processes

Let P̃_n,m, P_n,m, P_m and P be the (random) probability measures induced by F̃_n,m, F_n,m, F_m and F respectively. For a function h, denote P̃_n,mh = ∑_{i∈D_n,m} w_ih(X_i), P_mh = E_{P_m}h(X), ${𝔾̃}_{n, m} h = \sqrt{n} ({P̃}_{n, m} h - P_{m} h)$ and $𝔾_{n, m} h = \sqrt{n} (P_{n, m} h - P_{m} h)$ . For fixed h and g, we have shown that, under appropriate conditions,

{P̃}_{n, m} h \to P_{m} h = P {h̃}_{1} (a . s .) and {𝔾̃}_{n, m} h \overset{D}{\to} N (0, σ^{2})

with $σ^{2} = σ^{2} (h) = P {h̃}_{1}^{2} - P ({g̃}_{1}^{'} {h̃}_{1}) Ω_{1}^{- 1} P ({g̃}_{1} {h̃}_{1})$ . In contrast, $𝔾_{n, m} h \overset{D}{\to} N (0, η_{1}^{2})$ with $η_{1}^{2} = P {η̃}_{1}^{2}$ . Thus incorporating the side information g reduces the asymptotic variance by the amount of $P ({g̃}_{1}^{'} {h̃}_{1}) Ω_{1}^{- 1} P ({g̃}_{1} h̃)$ .

It is of interest to have a uniform version of the above SLLN and CLT over a class of functions ℋ. The uniformity means supremum over ℋ, which may or may not be measurable, thus the almost sure and weak convergence results here will be in the sense of outer measure P* of P (cf. [36]; hereafter VW). When the corresponding quantity is measurable, the convergence is automatically in the sense of the measure P itself. Nolan and Pollard [23,24] study the uniform SLLN and the CLT for U-process of order two. Giné and Zinn [12], Arcones and Giné [2] and Giné [11], among others, study other types of uniform problems in general situations. Here we explore the uniform laws for U-statistics under different conditions.

Let ℋ be a class of functions satisfying (C4), and for any probability measure Q, denote ‖Qh‖_ℱ = sup{|Qh| : h ∈ ℋ}. Let L^∞(ℋ) be the space of functionals z : ℋ ↦ R with norm ‖z‖_ℋ = sup_h∈ℋ |z(h)| and the metric on L^∞(ℋ) is given by d(z₁, z₂) = ‖z₁ − z₂‖_ℋ for z₁, z₂ ∈ L^∞(ℋ). For an integer m-vector k = (k₁, …, k_m), a subset 𝒳 of R^m and a function h : 𝒳 ↦ R, denote |k| = k₁ + ⋯ + k_m, $D^{k} h (x) = \partial^{| k |} h (x) / (\partial x_{1}^{k_{1}} \dots \partial x_{m}^{k_{m}})$ . ‖x‖ is the Euclidean norm for x ∈ 𝒳,

{‖ h ‖}_{s} = max_{| k | \leq s} sup_{x \in 𝒳} | D^{k} h (x) | + max_{| k | = s} sup_{x, y \in 𝒳} | D^{k} h (x) - D^{k} h (y) |,

and C_M(𝒳) is the set of functions h : 𝒳 ↦ R with ‖h‖_s ≤ M. Let $R^{m} = \cup_{j = 1}^{\infty} I_{j}$ be a partition of R^m into bounded convex sets with non-empty interior, ℋ₁ be the class of functions h such that the restrictions ℋ_{1|I_j} belong to C_{M_j} (I_j) for every j and M = max_j M_j < ∞. Let ℋ₂ be the class of convex functions h : C ↦ R for some convex compact C ⊂ R^m such that |h(x) − h(y)| ≤ L‖x − y‖ for some 0 < L < ∞, all x, y ∈ C and h ∈ ℋ₂, and ‖ ∑_{i∈D_n,m} e_ih(x_i)‖_ℋ₂ is measurable for each n and each e_i ∈ {−1, 1} (ℋ₂ is then called P-measurable in VW). An envelope function G of ℋ is a function such that |h(x)| ≤ G(x) for all x, and h ∈ ℋ. Let ℋ = ℋ₁ ∪ ℋ₂ with (C4) satisfied on ℋ and λ(·) be the Lebesgue measure on R^m. Let $\overset{D}{\Rightarrow}$ denote weak convergence in L^∞(ℋ). We have

Theorem 4. (i) Under the conditions of Theorem 1(i), for ℋ defined above, assume that ∀h ∈ ℋ, gh ∈ ℋ in the componentwise sense, ℋ₁ has a square integrable envelope function H, max_j λ(I_j) < ∞, $\sum_{j = 1}^{\infty} M_{j}^{1 / 2} P_{m}^{1 / 2} (I_{j}) < \infty$ , and ℋ₂ is bounded. Then we have

sup_{h \in ℋ} | {P̃}_{n, m} h - P_{m} h | = 0, (a . s .^{*}) .

(ii) Under the conditions of Theorem 3(ii), assume ℋ has a square integrable envelope function H, max_j λ(I_j) < ∞, max_j λ(I_j) < ∞, m < 4, s > m/2 for ℋ₁, and $\sum_{j = 1}^{\infty} M_{j}^{\frac{2 υ}{υ + 2}} P^{\frac{υ}{υ + 2}} (I_{j}) < \infty$ with υ = m/s. Then

{𝔾̃}_{n, m} \overset{D}{\Rightarrow} 𝔾 in L^{\infty} (ℋ),

where 𝔾 is a Gaussian process indexed by ℋ, with E_P (𝔾h) = 0 and ${Cov}_{P} (𝔾 h, 𝔾 q) = P ({h̃}_{1} {q̃}_{1}) - P ({g̃}_{1}^{'} {h̃}_{1}) Ω_{1}^{- 1} P ({g̃}_{1} {q̃}_{1})$ for all h, q ∈ ℋ.

Using results in [2,11], we can get many more results, below we only mention one.

Corollary 2. For a class of functions ℋ, assume that ∀h ∈ ℋ, gh ∈ ℋ; that ℋ is a measurable VC-subgraph class of functions with envelope H and P_mH < ∞, or ∀ε > 0, $N_{[]}^{(1)} (ε, ℋ, P_{m}) < \infty$ (for definition, see p. 1512, [2]). Then

sup_{h \in ℋ} | {P̃}_{n, m} h - P_{m} h | \to 0, a . s .

4. The empirical likelihood ratio for U-statistics with side information

Next we define the empirical likelihood ratio for θ, and construct the confidence interval for θ in the presence of side information. Let G(x|θ) = (g′(x), h(x) − θ)′, we have E_{F_m} G(X|θ) = 0. Without side information, the weights that maximize ∏_{i∈D_n,m} w_i subject to ∑_{i∈D_n,m} w_i = 1 are $w_{i} = {(C_{n}^{m})}^{- 1}$ for all i ∈ D_n,m; while the weights that maximize ∏_{i∈D_n,m} w_i subject to ∑_{i∈D_n,m} w_i = 1 and ∑_{i∈D_n,m} w_iG(X_i|θ) = 0 are $w_{i} = {(C_{n}^{m})}^{- 1} 1 / (1 + t' G (X_{i} | θ))$ and t is determined by (3) with g(·) replaced by G(·|θ). Therefore we define the empirical log likelihood ratio of θ with the presence of side information by

R_{G} (θ) = L_{n} (θ) / {(C_{n}^{m})}^{- C_{n}^{m}} = \prod_{i \in D_{n, m}} (C_{n}^{m} w_{i}),

where

L_{n} (θ) = max_{\sum_{i \in D_{n, m}} w_{i} = 1, \sum_{i \in D_{n, m}} w_{i} G (X_{i} | θ) = 0} \prod_{i \in D_{n, m}} w_{i}

and denote

l (θ) = - log R_{G} (θ) = \sum_{i \in D_{n, m}} log [1 + t' G (X_{i} | θ)] .

Let $Λ = E_{F_{m}} (G (x | θ) G' (X | θ)) = (\begin{matrix} Ω & A \\ A' & η^{2} \end{matrix})$ , η² = Var(h(X)); and Λ₁ = Cov(G̃₁), G̃₁ is the first canonical form (vector) of G.

Note that when there is no side information, G(·|θ) reduces to h(·) − θ, and t is a scalar determined by ∑_{i∈D_n,m}(h(X_i) − θ)/[1 + t(h(X_i) − θ)] = 0. The corresponding log-likelihood ratio is

l_{h} (θ) = \sum_{i \in D_{n, m}} log [1 + t (h (X_{i}) - θ)] .

Theorem 5. (i) Under conditions of Theorem 2 (i) or Theorem 3 (i), assume r_o = 1 and Λ to be positive definite, then

\frac{2 n}{m^{2} C_{n}^{m}} l (θ) \overset{D}{\to} Z_{d + 1}^{'} Λ_{1}^{1 / 2} Λ^{- 1} Λ_{1}^{1 / 2} Z_{d + 1}, Z_{d + 1} ~ N (0, I_{d + 1}) .

(ii) Assume (C4), then

\frac{2 n η^{2}}{m^{2} C_{n}^{m} η_{1}^{2}} l_{h} (θ) \overset{D}{\to} χ_{1}^{2} .

When m = 1, $Λ_{1}^{1 / 2} = Λ^{1 / 2}$ and the above result for U-statistic automatically reduces to that for the common EL ratio, and the right hand side in Theorem 5(i) is $χ_{d + 1}^{2}$ (see the corresponding result in Theorem 2 of Qin and Lawless [28]), therefore with side information incorporated in the likelihood ratio, the length of confidence region for θ cannot be reduced, this is an interesting contrast to the estimation with side information, in which the asymptotic variance is reduced. However, using the EL ratio, the shape of the confidence region is more natural than many other commonly used methods, such as the normal approximation, which are forced to be symmetric. The latter method may have poorer coverage probability because of the shorter interval length and its subjective shape.

Although side information is widely applied in practice to improve performance of estimators via the EL, the following Corollary 3 describes the effects when incorrect side information is used, thus side information should be used with care, and be justified properly before its use.

Corollary 3. If E_{F_m}g(X) = δ ≠ 0, then

Under conditions of Theorem 1 (i),
$Ũ_{n} - θ \overset{a . s .}{\to} A' Ω^{- 1} δ .$
Under conditions of Theorem 2 (i),
$\sqrt{n} (Ũ_{n} - θ - A' Ω^{- 1} δ) \overset{D}{\approx} N (0, σ^{2}) .$
If E_{F_m}G(X) = δ ≠ 0, then under conditions of Theorem 5 (i),
$- \frac{2 n}{C_{n}^{m}} R_{G} (θ) \overset{D}{\approx} Z_{d + 1}^{'} Λ_{1}^{1 / 2} Λ^{- 1} Λ_{1}^{1 / 2} Z_{d + 1}, Z_{d + 1} \overset{D}{~} N (\sqrt{n} Λ_{1}^{- 1 / 2} δ, I_{d + 1}),$
when Λ = Λ₁, $Z_{d + 1}^{'} Λ_{1}^{1 / 2} Λ^{- 1} Λ_{1}^{1 / 2} Z_{d + 1} = Z_{d + 1}^{2} (n δ' Λ^{- 1} δ)$ , the chi-squared distribution of degree d + 1 with noncentrality parameter nδ′Λ⁻¹δ.

5. Examples and simulation studies

5.1. Examples

In this section we give some examples for illustration.

Example 1. For a given distribution F, let θ(F) = ∫ (x − μ)²dF(x) be the variance, where μ is the mean. Let μ_k, k ≥ 2 be the k-th moment of F. For the kernel h(x₁, x₂) = (x₁ − x₂)²/2, we have h̃₁(x₁) = [(x₁ − μ)² − θ]/2, η² = E(h²) − θ² = (μ₄ + θ²)/2, $η_{1}^{2} = E ({h̃}_{1}^{2}) = (μ_{4} - θ^{2}) / 4$ . Without side information, the asymptotic variance of U_n based on kernel h(x₁, x₂) is $σ_{0}^{2} = 4 η_{1}^{2} = μ_{4} - θ^{2}$ , which is the same as that for the sample variance estimator $θ_{n} ≔ {(n - 1)}^{- 1} \sum_{i = 1}^{n} {(X_{i} - X̅)}^{2}$ .

If we know that F has median at 0: F (0) = 1/2, we take g(x₁, x₂) = [I(x₁ ≤ 0) + I(x₂ ≤ 0)]/2 − 1/2. Then g̃₁(x₁) = [I(x₁ ≤ 0) − 1/2]/2, $A_{1} = E ({g̃}_{1} {h̃}_{1}) = [\int_{- \infty}^{0} {(x - μ)}^{2} d F (x) - θ / 2] / 4$ , and $Ω_{1} = E ({g̃}_{1}^{2}) = 1 / 16$ . So by Theorem 3(i), the asymptotic variance of Ũ_n is now $σ^{2} = σ_{0}^{2} - A_{1}^{2} Ω_{1}^{- 1} = 4 η_{1}^{2} - {[\int_{- \infty}^{0} {(x - μ)}^{2} d F (x) - σ^{2} / 2]}^{2}$ , a deduction of ${[\int_{- \infty}^{0} {(x - μ)}^{2} d F (x) - σ^{2} / 2]}^{2}$ from $σ_{0}^{2}$ .

Example 2. For the Wilcoxon one-sample statistic, θ(F) = P_F (x₁ + x₂ ≤ 0), the kernel for the corresponding U-statistic is h(x₁, x₂) = I(x₁ + x₂ ≤ 0), h̃₁(x₁) = F (−x₁) − θ, $η_{1}^{2} = E_{F} ({h̃}_{1} (x_{1})) = \int F^{2} (- x) d F (x) - θ^{2}$ . Without side information, the asymptotic variance of U_n based on kernel h(x₁, x₂) is $σ_{0}^{2} = 4 η_{1}^{2}$ .

Suppose we know the distribution is symmetric about a > 0: F (x − a) = 1 − F (a − x) for all x. Take g(x₁, x₂) = [I(x₁ ≤ 0) + I(x₁ ≤ 2a) + I(x₂ ≤ 0) + I(x₂ ≤ 2a)]/2 − 1, then g̃₁(x₁) = [I(x₁ ≤ 0) + I(x₁ ≤ 2a)]/2 − 1/2, Ω₁ = F (−a)/2, $A_{1} = [\int_{- \infty}^{a} F (- x) d F (x) + \int_{- \infty}^{- a} F (- x) d F (x)] / 2 - \int F (- x) d F (x) / 2$ . The deduction of asymptotic variance is $A_{1}^{2} Ω^{- 1}$ .

Example 3. For the Gini difference, θ(F) = E_F |x₁ − x₂|, the corresponding kernel for U-statistic is h(x₁, x₂) = |x₁ − x₂|. We have ${h̃}_{1} (x_{1}) = \int_{x_{1}}^{\infty} x d F (x) - \int_{- \infty}^{x_{1}} x d F (x) - θ, η_{1}^{2} = \int {(\int_{x_{1}}^{\infty} x d F (x) - \int_{- \infty}^{x_{1}})}^{2} d F (x_{1}) - θ^{2}$ . Without side information, the asymptotic variance of U_n based on kernel h(x₁, x₂) is $σ_{0}^{2} = 4 η_{1}^{2}$ .

If we know the distribution mean μ, and take g(x₁, x₂) = (x₁ + x₂)/2 − μ, then g̃₁(x₁) = (x₁ − μ)/2, Ω₁ = ∫(x − μ)²dF(x), $A_{1} = {\int x_{1} [\int_{x_{1}}^{\infty} x d F (x) - \int_{- \infty}^{x_{1}} x d F (x)] d F (x_{1}) - θ} / 2$ . The deduction of asymptotic variance is $A_{1}^{2} Ω^{- 1}$ .

5.2. Simulation studies

Simulation studies are conducted to assess the finite-sample performance of the proposed methods in this section. These studies are based on Examples 1 and 2 in Section 5.1. We compare variance estimates of U-statistics with and without side information, and calculate the variance reduction under different sample sizes. We also compare various U-EL based and normal approximation-based confidence intervals for θ in terms of coverage probability. Although side information does not have effect asymptotically, as indicated by Theorem 5, the finite sample property of constructing confidence intervals using the U-EL ratio is still of interest, and will be compared with those obtained through normal approximation based method.

Based on the U-EL theory developed in Section 4, we can construct three U-EL based intervals for θ as follows:

The first one, called EL1 interval, is defined as

{θ : \frac{2 n}{m^{2} C_{n}^{m}} l (θ) \leq q_{1 - α}}

where q_1−α is the (1 − α)-th quantile of the distribution of $Z_{d + 1}^{'} Λ_{1}^{1 / 2} Λ^{- 1} Λ_{1}^{1 / 2} Z_{d + 1}$ . q_1−α can be estimated by using the sample estimates of Λ₁ and Λ and Monte Carlo method.

One can also approximate the quantile of the distribution of l(θ) by using bootstrap method. Let ${l_{b}^{*} (θ̂) : b = 1, \dots, B}$ (B ≥ 200 is recommended) are B bootstrap replicates of l(θ). Then, the second EL-based interval for θ, called EL2, is given by

{θ : l (θ) \leq l_{([B (1 - α)])}^{*} (θ̂)},

where $l_{(b)}^{*} (θ)$ is the b-th ordered value of $l_{b}^{*} (θ)$ ’s, and [x] represents the integer part of x.

The third one, called EL3 interval, is constructed as follows:

{θ : c^{*} l (θ) \leq χ_{d + 1, 1 - α}^{2}},

where $c^{*} = \frac{d + 1}{\frac{1}{B} \sum_{b = 1}^{B} l_{b}^{*} (θ̂)}$ . This interval is motivated by the fact that the distribution of $Z_{d + 1}^{'} Λ_{1}^{1 / 2} Λ^{- 1} Λ_{1}^{1 / 2} Z_{d + 1}$ can be approximated by a scaled chi-squares distribution. i.e.,

c \cdot l (θ) \overset{D}{\to} χ_{d + 1}^{2},

where c is an unknown constant, and $c \approx \frac{E (χ_{d + 1}^{2})}{E (l (θ))} = \frac{d + 1}{E (l (θ))}$ .

The asymptotic normal distribution obtained in Theorem 2 can be used to construct two additional confidence intervals for θ, called AN1 and AN2 intervals, as follows:

{Ũ_{n} - z_{1 - α / 2} σ̂ / \sqrt{n}, Ũ_{n} + z_{1 - α / 2} σ̂ / \sqrt{n}}, {Ũ_{n} - z_{1 - α / 2} {σ̂}^{*} / \sqrt{n}, Ũ_{n} + z_{1 - α / 2} {σ̂}^{*} / \sqrt{n}},

where σ̂ is the estimate of σ by plugging the sample estimates of all population quantities in Theorem 2. σ̂* is the bootstrap estimate of σ based on B bootstrap samples. For computation consideration, we take B = 200 in the simulation studies.

Examples 1 and 2 in Section 5.1 are considered in the simulation study. In the first example, the underlying distribution is chosen to be a skewed distribution with median 0. Here we take X ~ exp(1) − ln(2), the standard exponential distribution with a shifted center. Then EX = 1 − ln 2, Median(X) = 0, and θ = Var(X) = 1. In the second example, we consider a symmetric distribution with mean a. We choose X ~ 𝒩(1, 4), then $θ = Φ (- \frac{1}{4})$ , where Φ(x) is the cdf of the standard normal distribution. The simulation results are presented in Tables 1–4.

Table 1.

The asymptotic variance estimation of U-statistics. X ~ exp(1) − ln(2).

Method	n = 50	n = 100	n = 150	n = 200
Without side information	8.5239	7.8569	7.3839	7.1557
With side information	8.4572	7.5524	7.2673	7.0791
Variance reduction	0.0667	0.3045	0.1165	0.0766

Open in a new tab

Table 4.

Coverage probabilities of various 95% confidence intervals for θ with side information. X ~ 𝒩(1, 4).

Sample size	EL1	EL2	EL3	AN1	AN2
n = 50	0.876	0.983	0.918	0.956	0.945
n = 100	0.882	0.981	0.931	0.961	0.947
n = 150	0.926	0.978	0.968	0.978	0.968
n = 200	0.942	0.956	0.984	0.970	0.954

Open in a new tab

Tables 1 and 2 show the estimated asymptotic variances of U-statistics with or without side information respectively under sample size n = 50, 100, 150, 200. The reduction of variance is also calculated. The results are based on 1000 repetitions.

Table 2.

The asymptotic variance estimation of U-statistics. X ~ 𝒩(1, 4).

Method	n = 50	n = 100	n = 150	n = 200
Without side information	0.2413	0.2208	0.2199	0.2203
With side information	0.0548	0.0526	0.0527	0.0572
Variance reduction	0.1865	0.1682	0.1673	0.1631

Open in a new tab

Tables 3 and 4 show the coverage probabilities of the EL-based intervals (EL1, EL2 and EL3) and the normal approximation-based intervals (AN1 and AN2) with side information.

Table 3.

Coverage probabilities of various 95% confidence intervals for θ with side information. X ~ exp(1) − ln(2).

Sample size	EL1	EL2	EL3	AN1	AN2
n = 50	0.942	0.949	0.994	0.783	0.784
n = 100	0.929	0.950	0.991	0.858	0.878
n = 150	0.934	0.954	0.990	0.872	0.880
n = 200	0.950	0.949	0.989	0.898	0.904

Open in a new tab

The simulation results show that the proposed U-statistic Ũ_n performs well in finite sample cases. From Tables 1 and 2 we can clearly see a reduction of the variance of estimating θ. The variance reduction can be significant, as in Example 2, which shows that the proposed method could offer a more accurate estimation.

From the coverage probabilities in Tables 3 and 4, we see that the U-EL based confidence intervals work significantly better than the normal approximation-based confidence intervals when the underlying distribution is a skewed distribution (Example 1). When the underlying distribution is a symmetric distribution (Example 2), the performances of these methods of methods are comparable. Furthermore, in most cases, bootstrap-based methods work better than plugin methods.

Concluding remarks

We studied a method to implement side information into the U-statistic, via the empirical likelihood approach, and investigated some asymptotic behavior of the proposed method. We show, for parameter estimation, the proposed U-statistic with side information has advantages, such as smaller asymptotic variance, over that without side information incorporated. We also explored the construction of confidence intervals using U-statistic based empirical likelihood ratio. Although such U-EL ratio does not benefit from side information asymptotically, our simulation studies show that the corresponding confidence intervals still out perform those based on normal approximation in finite sample cases. We also note that, if incorrect side information is incorporated, the resulting estimates can be seriously biased. Thus in practice the incorporation of side information should be justified properly.

Acknowledgments

This work is supported in part by the National Center for Research Resources at NIH grant 2G12RR003048. Dr. Gengsheng Qin’s work is supported in part by US NSA grant H98230-12-1-0228. Dr. Wenqing He’s work is partially supported by the Natural Sciences and Engineering Research Council of Canada.

Appendix

Proof of the Lemma. (i) As in [26], write t = t_n = ρ_ne with ρ_n ≥ 0 and e = e(X₁, …, X_n) a d-vector with ‖e‖ = 1. We first find the asymptotic order of t_n. Denote $b (t) = {(C_{n}^{m})}^{- 1} \sum_{i \in D_{n, m}} g (X_{i}) / (1 + t' g (X_{i}))$ , Z_n = max{|e′g(X_i)| : i ∈ D_n,m}, and Z̃_n = max{|e′g(X̃_i)| : i ∈ D̃_n,m} with $| {D̃}_{n, m} | = C_{n}^{m}$ , ∀i and j ∈ D̃_n,m, i and j have no common entry, and X̃_i = (X̃_i₁, …, X̃_{i_m}), X̃_i₁, …, X̃_{i_m} i.i.d. with X₁ for all i₁, …, i_m (i = 1, …, n). Since Z̃_n is a maximum over $C_{n}^{m}$ i.i.d. samples, while Z_n is that over $C_{n}^{m}$ dependent samples come from the same distribution, for large n, we have Z_n ≤ Z̃_n (a.s.). Since E‖e′g(X)‖^α < ∞, ${Z̃}_{n} = o ({(C_{n}^{m})}^{1 / α})$ (a.s.) as in [26], and so $Z_{n} = o ({(C_{n}^{m})}^{1 / α})$ (a.s.). We have

0 = ‖ b (ρ_{n} e) ‖ \geq | e' b (ρ_{n} e) | = \frac{1}{C_{n}^{m}} | e' (\sum_{i \in D_{n, m}} g (X_{i}) - ρ_{n} \sum_{i \in D_{n, m}} \frac{g (X_{i}) e' g (X_{i})}{1 + ρ_{n} e' g (X_{i})}) | \geq \frac{ρ_{n}}{C_{n}^{m}} \sum_{i \in D_{n, m}} \frac{{[e' g (X_{i})]}^{2}}{1 + ρ_{n} e' g (X_{i})} - | \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} e' g (X_{i}) | \geq \frac{ρ_{n}}{1 + ρ_{n} Z_{n}} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} {[e' g (X_{i})]}^{2} - | \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} e' g (X_{i}) | = \frac{ρ_{n}}{1 + ρ_{n} Z_{n}} e' (\frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i})) e - | e' (\frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g (X_{i})) | ≔ \frac{ρ_{n}}{1 + ρ_{n} Z_{n}} e' R_{1, n} e - | e' R_{2, n} | .

Below we will show, for some 0 < c < C < ∞, for all large n, uniformly for all the X_i’s and i’s,

c < e' R_{1, n} e = C a . s .; and e' R_{2, n} = {\begin{matrix} O (n^{- 1 / 2} {(log log n)}^{1 / 2}) a . s . if r_{o} = 1, \\ o (n^{- r_{o} / 2} log n) a . s . if r_{o} > 1 . \end{matrix}

(A.1)

In fact, R_1,n is a (matrix valued) U-statistic with a.s. limit 0 < E[g(X)g′(X)] = Ω < ∞, where the “0 <” is in the matrix positive definite sense and the “<∞” is in the componentwise sense. Let 0 < λ₁ ≤ ⋯ ≤ λ_d < ∞ be all the eigenvalues of Ω, we have Ω = Q′diag(λ₁, …, λ_d)Q with Q being orthonormal. Denote η = Qe, then η′η = 1. Then for large n, R_1,n > Ω/2 (a.s.) thus e′R_1,ne > e′Ωe/2 = η′diag(λ₁, …, λ_d)η/2 ≥ λ₁/2 ≔ c > 0 (a.s.). Similarly, for large n, R_1,n < 2Ω (a.s.) and e′R_1,ne < 2λ_d ≔ C (a.s.).

Since ‖e‖ = 1, we only need to prove the second assertion in (A.1) for R_2,n. Note R_2,n is a (vector) U-statistic with kernel g(x) satisfying E(g(X)) = 0. Recall g_c is the canonical forms of g, let R_2,nc be the corresponding Hoeffding forms of R_2,n(c = 1, …, m). By the given condition we have E(‖g_c (X)‖²) ≤ E‖g(X)‖² < ∞, (c = r_o, …, m), E‖g(X)‖^4/3 < ∞, and $R_{2, n} = \sum_{c = r_{o}}^{m} C_{m}^{c} R_{2, n c}$ in componentwise sense (component j in R_2,nc is zero for c = r_o, …, r_j −1 if r_j ≔ rank(g_j) > r_o). If r_o = 1, let $η_{2, 1}^{2} ≔ E ({‖ g_{1} (X) ‖}^{2})$ , by Theorem 9.1.1 in KB, we get

lim_{n} sup {(2 m^{2} η_{2, 1}^{2} \frac{log log n}{n})}^{- 1 / 2} | R_{2, n} | = 1 (a . s .) . or R_{2, n} = O (n^{- 1 / 2} {(log log n)}^{1 / 2}) (a . s .) .

If r_o > 1, by Lemma 9.2.1 in KB,

\frac{n^{c / 2}}{log n} R_{2, n c} \to 0 (a . s .), (c = r_{o}, \dots, m); and so R_{2, n} = o (n^{- r_{o} / 2} log n) (a . s .) .

Now, since R_1,n = O(1) (a.s.) with 0 < O(1) < ∞ and $Z_{n} = O ({(C_{n}^{m})}^{1 / α}) = O (C^{m / α})$ , we have, for r_o = 1, ρ_n/(1 + ρ_nZ_n) = O(|e′R_2,n|) = O(n^−1/2(log log n)^1/2) (a.s.), or ρ_n(1 − o(n^{−(1/2−m/α)}(log log n)^1/2)) = O(n^−1/2(log log n)^1/2) (a.s.). For r_o > 1, ρ_n(1 − o(n^m/α−r_o/2 log n)) = o(n^−r_o/2 log n) (a.s.). Thus we have

‖ t_{n} ‖ = ρ_{n} = {\begin{matrix} O (n^{- 1 / 2} {(log log n)}^{1 / 2}), & r_{o} = 1; \\ o (n^{- r_{o} / 2} log n), & 1 < r_{o} \leq m \end{matrix} (a . s .) .

Since for all i, $| t_{n}^{'} g (X_{i}) | \leq ‖ t_{n} ‖ Z_{n} = o (ρ_{n} n^{m / α}) \to 0$ (a.s.), thus for large n, max_{i∈D_n,m} |t′g(X_i)| < 1 (a.s.), so we have

0 = \sum_{i \in D_{n, m}} \frac{g (X_{i})}{1 + t' g (X_{i})} = \sum_{i \in D_{n, m}} (g (X_{i}) \sum_{j = 0}^{\infty} {(- 1)}^{j} {[t' g (X_{i})]}^{j}) = \sum_{i \in D_{n, m}} (g (X_{i}) [1 - t' g (X_{i}) + O {[t' g (X_{i})]}^{2}]) = \sum_{i \in D_{n, m}} (g (X_{i}) [1 - t' g (X_{i}) + {‖ g (X_{i}) ‖}^{2} O (ρ_{n}^{2})]), (a . s .)

\sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i}) t = \sum_{i \in D_{n, m}} [g (X_{i}) + {‖ g (X_{i}) ‖}^{2} g (X_{i}) O (ρ_{n}^{2})], (a . s .)

thus

t = {(\sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i}))}^{- 1} \sum_{i \in D_{n, m}} g (X_{i}) + O (ρ_{n}^{2}) {(\sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i}))}^{- 1} \sum_{i \in D_{n, m}} {‖ g (X_{i}) ‖}^{2} g (X_{i}) ≔ B_{n} + O (ρ_{n}^{2}) {(\sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i}))}^{- 1} \sum_{i \in D_{n, m}} {‖ g (X_{i}) ‖}^{2} g (X_{i}) (a . s .) .

We have already shown $R_{2, n} = {(C_{n}^{m})}^{- 1} \sum_{i \in D_{n, m}} g (X_{i}) = O (ρ_{n})$ (a.s.). Also g(·)g′(·) is non-degenerate by (C1), also since m ≥ 2, E‖g(X)‖⁴ < ∞ by (C2), thus by the law of iterated logarithm (LIL) for U-statistics, ${(C_{n}^{m})}^{- 1} \sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i}) = Ω + O (n^{- 1 / 2} {(log log n)}^{1 / 2})$ (a.s.), hence

B_{n} = [Ω^{- 1} + O (n^{- 1 / 2} {(log log n)}^{1 / 2})] \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g (X_{i}) = Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g (X_{i}) + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) = O (ρ_{n}) (a . s .) .

Similarly,

O (ρ_{n}^{2}) {(\sum_{i \in D_{n, m}} g (X_{i}) g' (X_{i}))}^{- 1} \sum_{i \in D_{n, m}} {‖ g (X_{i}) ‖}^{2} g (X_{i}) = O (ρ_{n}^{2}) Ω^{- 1} E [{‖ g (X) ‖}^{2} g (X)] = o (B_{n}), (a . s .)

so,

t = t_{n} = B_{n} + O (ρ_{n}^{2}) = Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j}) + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) + O (ρ_{n}^{2}) (a . s .) .

From this we get, (a.s.),

w_{i} = \frac{1}{C_{n}^{m}} \frac{1}{1 + t' g (X_{i})} = \frac{1}{C_{n}^{m}} [1 - t' g (X_{i}) + {‖ g (X_{i}) ‖}^{2} O (ρ_{n}^{2})] = \frac{1}{C_{n}^{m}} (1 - g' (X_{i}) Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j}) + 1_{d}^{'} g (X_{i}) O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) + (1_{d}^{'} g (X_{i}) + {‖ g (X_{i}) ‖}^{2}) O (ρ_{n}^{2})) .

(ii) As in the proof of (i), we only need to show the results regardless of e. Treating gg′ and R_1,n as vectors of length d². Under the given conditions, note gg′ is non-degenerate, by the central limit theorem (CLT) for U-statistics,

\sqrt{n} (R_{1, n} - Egg' (X)) \overset{D}{\to} N (0, Ξ), or R_{1, n} = Egg' (X) + O_{p} (n^{- 1 / 2}),

where Ξ is determined by gg′. Similarly, for R_2,n, since Eg(X) = 0, if r_o = 1, by standard U-statistics theory, $\sqrt{n} R_{2, n}$ is asymptotical normal, or R_2,n = O_p(n^−1/2). If r_o > 1, note r_o ≤ m, the given conditions implies E|g_c (x₁, …, x_c)|^{2c/(2c−r_o)} < ∞ for c = r_o, …, m. So, by Theorem 4.4.1 in KB, $n^{r_{o} / 2} R_{2, n} \overset{D}{\to} R_{2}$ for some non-degenerate random variable R₂, thus R_2,n = O_p(n^−r_o/2). So, as in the proof of (i), we get ρ_n(1 − o_P (n^m/α−r_o/2)) = O_p(n^−r_o/2). Since o_P (n^m/α−r_o/2) = o_p(1), we have ‖t_n‖ = ρ_n = O_p(n^−r_o/2).

Going through the proofs in (i), with O(n^−1/2(log log n)^1/2) replaced by O_p(n^−1/2), we get $t = t_{n} = Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j}) + O_{p} (ρ_{n} n^{- 1 / 2})$ , and, with ρ_n = n^−r_o/2,

w_{i} = \frac{1}{C_{n}^{m}} (1 - g' (X_{i}) Ω^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j}) + 1_{d}^{'} g (X_{i}) O_{p} (ρ_{n} n^{- 1 / 2}) + (1_{d}^{'} g (X_{i}) + {‖ g (X_{i}) ‖}^{2}) O_{p} (ρ_{n}^{2})) .

Proof of Theorem 1. (i) By Lemma (i), we have

Ũ_{n} = U_{n} - (\frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g' (X_{i}) h (X_{i})) Ω^{- 1} (\frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} g (X_{j})) + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) \times \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} 1_{d}^{'} g (X_{i}) h (X_{i}) + O (ρ_{n}^{2}) \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} (1_{d}^{'} g (X_{i}) + {‖ g (X_{i}) ‖}^{2}) h (X_{i}) (a . s .) .

By the given conditions and the SLLN of U-statistics, $U_{n} \overset{a . s .}{\to} θ$ ,

U_{0, n} ≔ \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g (X_{i}) \overset{a . s .}{\to} E_{F_{m}} g (X) = 0, U_{1, n} ≔ \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} g (X_{i}) h (X_{i}) \overset{a . s .}{\to} A < \infty,

where A = E_{F_m}[g(X)h(X)], and

U_{2, n} ≔ \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} (1_{d}^{'} g (X_{i}) + {‖ g (X_{i}) ‖}^{2}) h (X_{i}) \overset{a . s .}{\to} E_{F_{m}} [(1_{d}^{'} g (X) + {‖ g (X) ‖}^{2}) h (X)] < \infty .

Since by the given conditions, E_{F_m}|h̃_c (X)|^γ < ∞ and E_{F_m} |g̃_c (X)| < ∞ for γ = cp/(p(c − 1) + 1), c = 1, …, m and 1 < p < 2, so by Corollary 3.4.1 in KB, n^−1/p+1(U_n − θ) → 0 (a.s.), or n^q(U_n − θ) → 0 (a.s.) and n^qU_0,n → 0 (a.s.) for all q < 1/2, and consequently, for all q < 1/2,

n^{q} (Ũ_{n} - θ) = n^{q} (U_{n} - θ) - O (n^{q} U_{0, n}) + O (ρ_{n} n^{- (1 / 2 - q)} {(log log n)}^{1 / 2}) + O (n^{q} ρ_{n}^{2}) \to 0 (a . s .) .

(ii) Use notations in (i), we have

Ũ_{n} = U_{n} - U_{1, n}^{'} Ω^{- 1} U_{0, n} + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) U_{1, n} + O (ρ_{n}^{2}) U_{2, n}, (a . s .) .

Recall for any U-statistic U_n with rank r and canonical forms h̃_c (c = r, …, m) the following decomposition holds

U_{n} - θ = \sum_{c = r}^{m} C_{m}^{c} U_{n, c}, U_{n, c} = {(C_{n}^{c})}^{- 1} \sum_{1 \leq i_{1} < \dots < i_{c} \leq n} {h̃}_{c} (x_{i_{1}}, \dots, x_{i_{c}}) .

Since $E h_{c}^{2} < \infty$ for c = r, …, m, by Lemma 9.2.1 in KB, U_n,c = o(n^−c/2 log n) (a.s.), so U_n = o(n^−r/2 log n) (a.s.). Thus, when r₁ = r₀ = 1, U_1,n = O(1) (a.s.), U_0,n = o(n^−q) (a.s.) for any q < 1/2, and U_2,n = O(1) (a.s.) as its kernel is always non-degenerate, so by Lemma (i) we have

Ũ_{n} - θ = o (n^{- r / 2} log n) + o (n^{- q}) + O (n^{- 1} log log n) + O (n^{- 1} log log n) = o (n^{- q}), q < 1 / 2 .

When r₁ > r₀ = 1, by Lemma 9.2.1 in KB, U_1,n = o(n^−r₁/2 log n) (a.s.), note O(n⁻¹ log log n) = o(n⁻¹ log n), and so

Ũ_{n} - θ = o (n^{- r / 2} log n) + o (n^{- (r_{1} / 2 + q)} log n) + o (n^{- (1 + r_{1} / 2)} log n {(log log n)}^{1 / 2}) + O (n^{- 1} log log n) = o (n^{- min {r / 2, (r_{1} + 1) / 2, 1}} log n) = o (n^{- min {r / 2, 1}} log n) .

When 1 = r₁ < r₀,

Ũ_{n} - θ = o (n^{- r / 2} log n) + o (n^{- (r_{0} / 2)} log n) + o (n^{- (1 + r_{o}) / 2} log n {(log log n)}^{1 / 2}) + o (n^{- r_{o}} {(log n)}^{2}) = o (n^{- min {r_{o}, r} / 2} log n);

and when r₁, r₀ > 1,

Ũ_{n} - θ = o (n^{- r / 2} log n) + o (n^{- (r_{0} + r_{1}) / 2} {(log n)}^{2}) + o (n^{- (1 + r_{o} + r_{1}) / 2} {(log n)}^{2} {(log log n)}^{1 / 2}) + o (n^{- r_{o}} {(log n)}^{2}) = o (max {n^{- r / 2} log n, n^{- min {(r_{0} + r_{1}) / 2, r_{0}}} {(log n)}^{2}}) .

(iii) Using Lemma (i) and notations in the proof of (i), we have, a.s.,

Ũ_{n} - θ = \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} (h (X_{i}) - θ - A' Ω^{- 1} g (X_{i})) - (U_{1, n}^{'} - A') Ω^{- 1} U_{0, n} + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) U_{1, n} + O (ρ_{n}^{2}) U_{2, n} .

Recall that U_0,n = o(n^−q), U_1,n −A = o(n^−q) (a.s.) for all 0 < q < 1/2, and U_2,n → C₂ (a.s.) for some C₂ < ∞. We have, a.s., for all 0 < q < 1/2,

Ũ_{n} - θ = \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} (h (X_{i}) - θ - A' Ω^{- 1} g (X_{i})) + o (n^{- 2 q}) + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) + O (ρ_{n}^{2}) .

By Theorem 9.1.1 of KB, the LIL holds for the first term above, and the above equation gives the desired result.

Proof of Theorem 2. (i) Using the fact that U_1,n → A (a.s.) and U_2,n → C₂ (a.s.) for some C₂ < ∞ as proved in Theorem 1(i), thus by Lemma (ii),

\sqrt{n} (Ũ_{n} - θ) = \sqrt{n} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} (h (X_{i}) - θ - A' Ω^{- 1} g (X_{i})) - \sqrt{n} (U_{1, n}^{'} - A') Ω^{- 1} U_{0, n} + O_{P} (n^{- r_{o} / 2}) U_{1, n} + O_{P} (n^{- (r_{o} - 1 / 2)}) U_{2, n} .

The second term above is, for all 0 < q < 1/2, n^1/2O(n^−2q) = o_P (1); the third term above is O_P (n^−r_o/2) as U_1,n → A(a.s.) < ∞; and the last term above is O_P (n^{−(r_o−1/2)}) as U_2,n → C (a.s.) for some C < ∞; Thus we only need to deal with the first term above.

Let

H (x) = h (x) - {h̃}_{1} (x_{1}) - \dots - {h̃}_{1} (x_{m}) - θ, G (x) = g (x) - {g̃}_{1} (x_{1}) - \dots - {g̃}_{1} (x_{m}),

then H₁(x₁) ≔ E[H(X)|X₁ = x₁] = 0, i.e. H(x) is a degenerate kernel. Similarly, G(x) is degenerate, so is K(x) = H(x) − A′Ω⁻¹G(x), with E_{F_m}K(X) = 0 and r_k ≔ rank(K) ≥ 2. Now we have

\sqrt{n} (Ũ_{n} - θ) = \sqrt{n} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} ({h̃}_{1} (X_{i_{1}}) + \dots + {h̃}_{1} (X_{i_{m}}) - A' Ω^{- 1} [{g̃}_{1} (X_{i_{1}}) + \dots + {g̃}_{1} (X_{i_{m}})]) + \sqrt{n} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} K (X_{i}) + O_{P} (n^{- 1 / 2}) + o_{P} (1) = \sqrt{n} \frac{m}{n} \sum_{i = 1}^{n} ({h̃}_{1} (X_{i}) - A' Ω^{- 1} {g̃}_{1} (X_{i})) + \sqrt{n} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} K (X_{i}) + O_{P} (n^{- 1 / 2}) + o_{P} (1) .

Let K̃_c be the canonical forms of K, and $ξ_{c}^{2} = E_{F_{m}} K_{c}^{2} (X) < \infty$ by the given conditions, (c = r_k, …, m). So by Hoeffding’s formula,

Var (\sqrt{n} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} K (X_{i})) = n \sum_{c = r_{k}}^{m} {(C_{m}^{c})}^{2} {(C_{n}^{c})}^{- 1} ξ_{c}^{2} = O (n^{- (r_{k} - 1)}) \to 0,

and so $\sqrt{n} {(C_{n}^{m})}^{- 1} \sum_{i \in D_{n, m}} K (X_{i}) \overset{P}{\to} 0$ . We get

\sqrt{n} (Ũ_{n} - θ) = \sqrt{n} \frac{m}{n} \sum_{i = 1}^{n} ({h̃}_{1} (X_{i}) - A' Ω^{- 1} {g̃}_{1} (X_{i})) + o_{P} (1) + O_{P} (n^{- 1 / 2}) .

Since Var[m(h̃₁(X₁) − A′Ω⁻¹ g̃₁(X₁))] = σ² if $r_{o} = 1, and = m^{2} η_{1}^{2}$ when r_o > 1 (in this case g̃₁(·) ≡ 0). Now the result follows from the standard CLT and Slutsky’s theorem.

(ii) We have, since U_2,n = O_P (1),

Ũ_{n} - θ = U_{n} - θ - U_{1, n}^{'} Ω^{- 1} U_{0, n} + O_{P} (n^{- (r_{o} + 1) / 2}) U_{1, n} + O_{P} (n^{- r_{0}}) .

By Theorem 4.4.2 in KB, $n^{r / 2} (U_{n} - θ) \overset{D}{\to} C_{m}^{r} J_{r} ({h̃}_{r})$ , or U_n − θ = O_P (n^−r/2). Similarly in summary we have

U_{n} - θ = O_{p} (n^{- r / 2}), U_{0, n} = O_{P} (n^{- r_{0} / 2}), U_{1, n} - A = O_{P} (n^{- r_{1} / 2}) .

Also, U_1,n → A (a.s.), and when r₀ = 1, $\sqrt{n} U_{0, n} \overset{D}{\to} N (0, m^{2} Ω_{1})$ .

First we consider the case A ≠ 0. In this case, U_n − θ = O_P (n^−r/2), $U_{1, n}^{'} Ω^{- 1} U_{0, n} = O_{P} (n^{- r_{o} / 2})$ and O_P (n^−(r_o+1)/2)U_1,n + O_P (n^−r_o) = o_P (n^−r_o/2).

Thus when r_o < r, we have

r_{o}^{1 / 2} (Ũ_{n} - θ) = - U_{1, n}^{'} Ω^{- 1} n^{- r_{o} / 2} U_{0, n} + o_{P} (1) \overset{D}{\to} - C_{m}^{r_{o}} A' Ω^{- 1} J_{r_{o}} ({g̃}_{r_{o}}) .

When r_o = r,

n^{r / 2} (Ũ_{n} - θ) = n^{r / 2} (U_{n} - θ) - U_{1, n}^{'} Ω^{- 1} n^{r / 2} U_{0, n} + o_{P} (1) \overset{D}{\to} C_{m}^{r} J_{r} ({h̃}_{r} - A' Ω^{- 1} {g̃}_{r}) .

When r_o > r,

n^{r / 2} (Ũ_{n} - θ) = n^{r / 2} (U_{n} - θ) + o_{P} (1) \overset{D}{\to} C_{m}^{r} J_{r} ({h̃}_{r}) .

Now we consider the case A = 0, then $U_{1, n}^{'} Ω^{- 1} U_{0, n} = O_{P} (n^{- (r_{1} + r_{o} / 2})$ , and

Ũ_{n} - θ = U_{n} - θ - U_{1, n}^{'} Ω^{- 1} U_{0, n} + O_{P} (n^{- (r_{1} + r_{o} + 1) / 2}) + O_{P} (n^{- r_{0}}) .

When r_o ≤ min{r₁, r/2}, Ũ_n − θ = O_P (n^−r_o), and its distribution needs more accurate expansion to evaluate. When r < min{2r_o, r₁ + r_o},

n^{- r / 2} (Ũ_{n} - θ) = n^{- r / 2} (U_{n} - θ) + o_{P} (1) \overset{D}{\to} C_{m}^{r} J_{r} ({h̃}_{r}) .

When r₁ + r_o < r or r₁ < r_o,

n^{- (r_{o} + 1) / 2} (Ũ_{n} - θ) = - n^{- r_{1} / 2} U_{1, n}^{'} Ω^{- 1} n^{- r_{o} / 2} U_{0, n} + o_{P} (1) \overset{D}{\to} C_{m}^{r_{1}} C_{m}^{r_{o}} J_{r_{1}} ({q̃}_{r_{1}}) Ω^{- 1} J_{r_{o}} ({g̃}_{r_{o}}) .

When r₁ + r_o = r,

n^{- r / 2} (Ũ_{n} - θ) = n^{- r / 2} (U_{n} - θ) - n^{- r_{1} / 2} U_{1, n}^{'} Ω^{- 1} n^{- r_{o} / 2} U_{0, n} + o_{P} (1) \overset{D}{\to} C_{m}^{r} J_{r} ({h̃}_{r}) - C_{m}^{r_{1}} C_{m}^{r_{o}} J_{r_{1}} ({q̃}_{r_{1}}) Ω^{- 1} J_{r_{o}} ({g̃}_{r_{o}}) .

Proof of Corollary 1. In this case we have h₁(X₁) = a[X₁ + (m − 1)μ], h̃₁(X₁) = a(X₁ − ν), $η_{1}^{2} = a^{2} τ^{2}$ . Also, g₁(X₁) = b(X₁ − μ) = g̃₁(X₁), A₁ = abτ², $A = a b E [\sum_{k = 1}^{m} X_{k} \sum_{k = 1}^{m} (X_{k} - μ)] = m a b τ^{2}$ , Ω₁ = b²τ², $Ω = b^{2} E [\sum_{k = 1}^{m} (X_{k} - μ) \sum_{k - 1}^{m} (X_{k} - μ)] = m b^{2} τ^{2}$ and r_o = 1. So by Theorem 2(i) we have σ² = m²(a²τ² − 2a²τ² + a²τ²) = 0.

Proof of Theorem 3. (i) Note θ = E_{F_m}h(X). The information bound is for parameter of the form E_F (s(X₁)) for some s(·). Recall h₁(x₁) = E[h(X₁, …, X_m)|X₁ = x₁] and E_F (h₁(X₁)) = θ, thus we take s(·) = h₁(·). Similarly, the constraint for computing the information bound should be a uni-variate function, we take it to be g₁(x₁).

Let f (x) be the density/mass function of F (x) with respect to some dominating measure μ(x), denote γ (f) = ∫h₁(x)f (x)dμ(x) = θ as a functional of f, γ̇ (f)(x) be the adjoint (evaluated at 1) of its pathwise derivative with respect to log f (for definition, see, for example, [6]), γ₁(f) = E_f [g₁(X)] for the side information constraint, γ̇₁(f)(x) the adjoint (evaluated at 1) of its pathwise derivative, L_2,d,r (f) = {s(x) : s : R^d → R^r, E_f [s(X)s′(X)] < ∞}, for s₁ ∈ L_2,d,k(f) and s₂ ∈ L_2,d,r (f), define the inner product (matrix) $〈 s_{1}, s_{2} 〉 = E_{f} [s_{1} (X) s_{2}^{'} (X)] = \int s_{1} (x) s_{2}^{'} (x) f (x) d μ (x)$ , the norm (matrix) ‖s₁‖² = 〈s₁, s₁〉 and ‖s₁‖⁻² ≔ (‖s₁‖²)⁻¹ when ‖s₁‖² is non-degenerate.

By Proposition A.5.2 in [6], we have γ̇ (f) = h₁(X) − θ = h̃₁(X) and γ̇₁(f) = g₁(X) = g̃₁(X). Let ∏(υ|υ₁) be the projection of υ onto [υ₁], the linear span of υ₁ with respect to f and μ, and $υ_{1}^{⊥}$ the orthogonal complement of [υ₁] with respect to f and μ. Without side information, the efficient influence function ℐ(X, γ (f)) for estimating γ (f) is ℐ(X, γ (f)) = γ̇ (f) and the information bound is ‖ℐ(X, γ (f))‖². In the presence of side information γ₁(f), by Example 3.2.3 in [6], the efficient influence function ℐ(X, γ (f)|γ₁(f)) for estimating γ (f) is

ℐ (X, γ (f) | γ_{1} (f)) = Π (γ̇ (f) | {γ̇}_{1} {(f)}^{⊥}) = γ̇ (f) - Π (γ̇ (f) | {γ̇}_{1} (f)) = γ̇ (f) - 〈 γ̇ (f), {γ̇}_{1} (f) 〉 {‖ {γ̇}_{1} (f) ‖}^{- 2} {γ̇}_{1} (f) = {h̃}_{1} (X) - A_{1}^{'} Ω_{1}^{- 1} {g̃}_{1} (X)

and the information (lower) bound for estimating θ, with side information g, is $𝕀 (θ | g) = {‖ ℐ (X, γ (f) | γ_{1} (f)) ‖}^{2} = η_{1}^{2} - A_{1}^{'} Ω_{1}^{- 1} A_{1}$ .

When g(x) = (g(x₁) + ⋯ + g(x_m))/m, we have g̃₁(x₁) = g₁(x₁) = E[g(x₁, …, x_m)|x₁] = g(x₁)/m, A = E[g(X)h(X)] = E[g(X₁)h₁(X₁)] = mE[g₁(X₁)h₁(X₁)] = mE[ g̃₁(X₁)h̃₁(X₁)] = mA₁, $Ω = E [g (X) g' (X)] = E [g (X_{1}) g' (X_{1})] / m = m E [{g̃}_{1} (X_{1}) {g̃}_{1}^{'} (X_{1})] = m Ω_{1}$ , thus

σ^{2} = m^{2} (η_{1}^{2} - 2 A' Ω^{- 1} A_{1} + A' Ω^{- 1} Ω_{1} Ω^{- 1} A) = m^{2} 𝕀 (θ | g) .

Since m² is a known positive constant, we can just divide Ũ_n by m so that its asymptotic variance is 𝕀(θ|g), and thus it is efficient.

Since σ² = ‖∏(γ̇ (f)| γ̇₁(f)^⊥)‖² ≥ 0, with “=” iff γ̇ (f) = h̃₁(X) ∈ [γ̇₁(f)], the linear span of γ̇₁(f) = g̃₁(X), or θ is completely determined by g̃₁(X), which is impossible. Also ${‖ \prod (γ̇ (f) | {γ̇}_{1} {(f)}^{⊥}) ‖}^{2} \leq {‖ γ̇ (f) ‖}^{2} = η_{1}^{2}$ , with “=” iff γ̇ (f) ∈ [γ̇₁(f)^⊥], or 0 = 〈γ̇ (f), γ̇₁(f)〉 = A.

(ii) Let f (x|θ, g) be the density function given the parameter θ and the information constraint g, S(x|θ, g) = ∂ log f (x|θ, g)/∂θ be the corresponding score function. The corresponding Fisher information is I(θ|g) = ‖S(X|θ, g)‖². Although S(x|θ, g), hence I(θ|g), is not directly available, the corresponding efficient influence function ℐ(X, γ (f)|γ₁(f)) is given in (i), and we have the following relationship between the information bound 𝕀(θ|g) and the Fisher information I(θ|g)

𝕀 (θ | g) = {‖ ℐ (X, γ (f) | γ_{1} (f)) ‖}^{2} = η_{1}^{2} - A' Ω^{- 1} A = I^{- 1} (θ | g) .

Let $L (X^{n} | θ, g) = \sum_{i = 1}^{n} log f (X_{i} | θ, g)$ be the log-likelihood, we have the following locally asymptotic normality [22] of the likelihood ratio

λ_{n} ≔ L (X^{n} | θ_{n}) - L (X^{n} | θ) = b V_{n} - b^{2} I (θ | g) / 2 + o_{P} (1),

where $V_{n} = n^{- 1 / 2} \sum_{i = 1}^{n} S (X_{i} | θ, g) \overset{D}{\to} V ~ N (0, I (θ | g))$ .

Let ϕ_Y (t) = E[exp{itY}] be the characteristic function of a random variable Y. We are to show lim_n ϕ_{W_n} (t) = ϕ_U (t)ϕ_Z (t). In fact, by assumption of regularity,

ϕ_{W_{n}} (t) = E_{f (\cdot | θ)} [exp {i t W_{n}}] = E_{f (\cdot | θ_{n})} [exp {i t (W_{n} - b)}] = E_{f (\cdot | θ)} [exp {i t (W_{n} - b) + λ_{n}}] \to E [exp {i t (W - b) + b V - b^{2} I (θ | g) / 2}],

where the last step above is by the same argument as in [4]. Since b ∈ C is arbitrary, take b = −itI⁻¹(θ|g), we get

i t (W - b) + b V - b^{2} I (θ | g) / 2 = i t (W - I^{- 1} (θ | g) V) - I^{- 1} (θ | g) t^{2} / 2 = i t (W - 𝕀 (θ | g) V) - 𝕀 (θ | g) t^{2} / 2,

thus

lim_{n} ϕ_{W_{n}} (t) = E [exp {i t (W - 𝕀 (θ | g) V)}] exp {- 𝕀 (θ | g) t^{2} / 2} = ϕ_{W - 𝕀 (θ | g) V} (t) ϕ_{Z} (t) .

Now take U = W − 𝕀(θ|g)V, the proof is complete.

Proof of Theorem 4. (i) Denote the related U-statistics as functions of h, and note the O(·) terms in the lemma are independent of h. Note U_1,n is a functional of gh, U_2,n is a functional of $(1_{d}^{'} g + {‖ g ‖}^{2}) h$ , θ and U_n are functionals of h, and U_0,n is a functional of g. As in the proof of Theorem 1(ii), we have

{P̃}_{n, m} h - P_{m} h = Ũ_{n} (h) - θ (h) = U_{n} (h) - θ (h) - U_{1, n}^{'} (g h) Ω^{- 1} U_{0, n} (g) + O (ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2}) U_{1, n} (g h) + O (ρ_{n}^{2}) U_{2, n} ((1_{d} g + {‖ g ‖}^{2}) h), (a . s .) .

Since U_0,n(g) → 0 (a.s.) and is independent of h, we only need to show, a.s.,

sup_{h \in ℋ} | U_{n} (h) - θ (h) | \to 0; sup_{h \in ℋ} | U_{1, n} (g h) | < \infty,; and sup_{h \in ℋ} | U_{2, n} ((1_{d}^{'} g + {‖ g ‖}^{2}) h) | < \infty .

Similarly, since ‖g‖²h ∈ ℋ for all h ∈ ℋ, and U_2,n(h) = U_n(h), we have sup_h∈ℋ |U_2,n((1_dg + ‖g‖²)h)| < ∞ (a.s.), if sup_h∈ℋ |U_n(h) − θ(h)| → 0 (a.s.).

Now we only need to prove sup_h∈ℋ |U_n(h) − θ(h)| → 0 (a.s.), (the class ℋ is then called P-Glivenko–Cantelli). Since the property of P-Glivenko–Cantelli is permanent for finite union of classes, we only need to prove this on ℋ₁ and ℋ₂ separately.

We first prove ℋ₁ is P-Glivenko–Cantelli. For ε > 0, let N_{[ ]}(ε, ℋ₁, L₁(P_m)) be the bracketing entropy of the class ℋ₁ with L₁(P) norm: ∀h ∈ ℋ₁, ‖h‖_{P_m} = E_{P_m}|h|. We first prove that if N_{[ ]}(ε, ℋ₁, L₁(P_m)) < ∞ for all ε > 0, then the conclusion is true. In fact, given ε > 0, since N_{[ ]}(ε, ℋ₁, L₁(P_m)) < ∞, there are finite many ε-brackets [l_i, u_i] whose union covers ℋ₁ and such that P_m(u_i − l_i) < ε for all i. Then for any h ∈ ℋ₁, there is an upper bracket u_i such that

U_{n} (h) - P_{m} (h) = (P_{n, m} - P_{m}) h = (P_{n, m} - P_{m}) u_{i} + (P_{m} - P_{n, m}) (u_{i} - h) \leq (P_{n, m} - P_{m}) u_{i} + P_{m} (u_{i} - h) \leq (P_{n, m} - P_{m}) u_{i} + ε .

Consequently,

sup_{h \in ℋ_{1}} (U_{n} (h) - P_{m} (h)) \leq max_{i} (P_{n, m} - P_{m}) u_{i} + ε .

Since by SLLN for U-statistics, (P_n,m − P_m)u_i → 0 (a.s.), thus lim_n sup_h∈ℋ₁ (U_n(h) − P_m(h)) ≤ ε (a.s.). Similarly, lim_n inf_h∈ℋ₁ (U_n(h) − P_m(h)) ≥ −ε (a.s.). Since ε > 0 is arbitrary, we get sup_h∈ℋ₁ |U_n(h) − P_m(h)| → 0 (a.s.).

Now we show N_{[ ]}(ε, ℋ₁, L₁(P_m)) < ∞ for all ε > 0. Let $I_{j}^{1} = {x \in R^{m} : ‖ x - I_{j} ‖ < 1}$ . By Corollary 2.7.4 in VW, for some constant K depending only on υ and m,

log N_{[]} (ε, ℋ_{1}, L_{1} (P_{m})) \leq K ε^{- υ} {(\sum_{j = 1}^{\infty} λ {(I_{j}^{1})}^{\frac{1}{υ + 1}} M_{j}^{\frac{υ}{υ + 1}} P_{m} {(I_{j})}^{\frac{υ}{υ + 1}})}^{υ + 1},

for every ε > 0, υ ≥ 1 and probability P. The given condition implies that max_j $λ (I_{j}^{1}) < \infty$ . Take υ = 1, the right hand side above is finite by the given condition.

Now we show sup_h∈ℋ₂ |U_n(h) − θ(h)| → 0 (a.s.). Let, for ε > 0, N(ε, ℋ₂, L₁(Q)) be the entropy of ℋ₂ without bracketing under norm L₁(Q) for some probability measure Q and N(ε, ℋ₂, ‖·‖_∞) be that under norm ‖·‖_∞. Recall H is also an envelope function on ℋ₂. Since L₁(Q) ≤ ‖·‖_∞, N(ε ‖H‖_Q, ℋ₂, L₁(Q)) ≤ N(ε ‖H‖_Q, ℋ₂, ‖·‖_∞), with ‖H‖_Q = (∫ H²dQ)^1/2. Let M be the bound on ℋ₂, and ℋ̃₂ = {(h − inf_x∈C h(x))/M : h ∈ ℋ₂}, then ℋ̃₂ is the class of convex functions h : C ↦ [0, 1] with Lipschitz constant L/M, and N(ε, ℋ₂, ‖·‖_∞) = N(εM, ℋ̃₂, ‖·‖_∞). By Corollary 2.7.10 in VW, for any ε > 0,

log N (ε M, {ℋ̃}_{2}, {‖ \cdot ‖}_{\infty}) \leq K {(1 + L / M)}^{m / 2} M^{- m / 2} ε^{- m / 2},

for any probability measure Q, and K only depends on m and C. Also, since H is an envelope function over ℋ₂ ≠ {0}, thus inf_Q ‖H‖_Q ≥ δ for some δ > 0, and the infimum is over 𝒬 of all probability measures Q on C, with ‖H‖_Q < ∞. Thus we have, for any ε > 0,

log N (ε {‖ H ‖}_{P_{m}}, ℋ_{2}, L_{1} (P_{m})) \leq sup_{Q \in 𝒬} log N (ε {‖ H ‖}_{Q}, ℋ_{2}, L_{1} (Q)) \leq log N (ε δ, ℋ_{2}, {‖ \cdot ‖}_{\infty}) = log N (ε δ M, {ℋ̃}_{2}, {‖ \cdot ‖}_{\infty}) \leq K {(1 + L / M)}^{m / 2} M^{- m / 2} {(ε δ)}^{- m / 2} < \infty,

also, ℋ₂ is P-measurable by its definition, thus ℋ₂ is P-Glivenko–Cantelli (cf. the statement in lines −5 to −3, p. 84 of VW).

(ii) The class ℋ with the stated property is called P-Donsker. First, it is apparent that for any k and h₁, …, h_k ∈ ℋ, $({𝔾̃}_{n, m} h_{1}, \dots, {𝔾̃}_{n, m} h_{k})' \overset{D}{\to} (𝔾 h_{1}, \dots, 𝔾 h_{k})'$ for the Gaussian process 𝔾 on ℋ as stated. So by Theorem 1.5.4 in VW, we only need to show that {𝔾̃_n,m} is asymptotically tight on ℋ. Using Lemma (ii) and similar argument as in the proof of (i), since $\sqrt{n} ρ_{n}^{2} = o (1)$ and $\sqrt{n} ρ_{n} n^{- 1 / 2} {(log log n)}^{1 / 2} = o (1)$ , we only need to show this for {𝔾_n,m}, and by Theorem 1.5.7 in VW, we only need to show that {𝔾_n,m} is asymptotically equicontinuous and totally bounded on ℋ. Below we will show that if

\int_{0}^{\infty} sup_{Q \in 𝒬} \sqrt{log N (ε {‖ H ‖}_{Q, 2}, ℋ, L_{2} (Q))} d ε < \infty

(A.2)

then {𝔾_n,m} is asymptotically equicontinuous and totally bounded on ℋ, where 𝒬 is the collection of all measures Q with ‖H‖_Q < ∞.

With (A.2), Theorem 2.5.2 in VW asserted the corresponding conclusion for empirical measures. Now we extend the result to U-statistics. For this, we point out that the symmetrization Lemma 2.3.1 in VW still holds for U-statistics, also Hoeffding’s inequality holds for U-statistics (Arcones and Giné [2, Proposition 2.3, p. 1501]), thus the proofs there are valid in our situation.

To check (A.2) on ℋ, and we only need to check it for ℋ₁ and ℋ₂ separately. Using Corollary 2.7.4 in VW, we have

log N_{[]} (ε, ℋ_{1}, L_{2} (P)) \leq K ε^{- υ} {(\sum_{j = 1}^{\infty} λ {(I_{j}^{1})}^{\frac{2}{υ + 2}} M_{j}^{\frac{2 υ}{υ + 2}} P^{\frac{υ}{υ + 2}} (I_{j}))}^{\frac{υ + 2}{2}},

for all ε > 0, υ ≥ m/s. Since the given condition implies max_j $λ (I_{j}^{1}) < \infty$ , in the above inequality choose υ = m/s, then $\sum_{j = 1}^{\infty} λ {(I_{j}^{1})}^{\frac{2}{υ + 2}} M_{j}^{\frac{2 υ}{υ + 2}} P^{\frac{υ}{υ + 2}} (I_{j}) < \infty$ by the given condition, and since υ < 2, we have

\int_{0}^{1} \sqrt{log N_{[]} (ε, ℋ_{1}, L_{2} (P))} d ε < \infty,

hence by the statement in p. 85 in VW, ℋ₁ satisfies (A.2). The original statement in VW is for the integral $\int_{0}^{\infty}$ . Since ℋ₁ has a square integrable envelope function H, so ∀h₁, h₂ ∈ ℋ, ‖h₁ − h₂‖_L₂(P) ≤ ‖h₁‖_L₂(P) + ‖h₂‖_L₂(P) ≤ 2‖H‖_L₂(P) < ∞, i.e., ℋ₁ itself is a ball with radius no greater than 2‖H‖_L₂(P), or N_{[ ]}(ε, ℋ₁, L₂(P)) = 1 for ε ≥ 2‖H‖_L₂(P), thus its entropy is zero for ε ≥ 2‖H‖_L₂(P), so the integration $\int_{0}^{\infty}$ is finite iff $\int_{0}^{1}$ is finite.

For ℋ₂, similarly as in the proof of (i), for some η > 0,

sup_{Q \in 𝒬} log N (ε {‖ H ‖}_{Q}, ℋ_{2}, L_{2} (Q)) \leq K {(1 + L / M)}^{m / 2} M^{- m / 2} {(ε η)}^{- m / 2} .

Since m < 4, so

\int_{0}^{1} sup_{Q \in 𝒬} \sqrt{log N (ε {‖ H ‖}_{Q}, ℋ_{2}, L_{2} (Q))} d ε \leq K^{1 / 2} {(1 + L / M)}^{m / 4} M^{- m / 4} η^{- m / 4} \int_{0}^{1} ε^{- m / 4} d ε < \infty,

thus by (2.1.7) in VW, ℋ₂ is P-Donsker.

Proof of Corollary 2. From proof of Theorem 4(i), we only need to show sup_h∈ℋ |P_n,m − P_mh| → 0 (a.s.), which is true by Corollary 3.3, or 3.5 respectively in [2].

Proof of Theorem 5. (i) As in the proofs of the previous theorems, with g replaced by G, since r_o ≔ min{rank(g₁), …, rank(g_d), rank(h)} = 1, by Lemma (ii) we have

w_{i} = \frac{1}{C_{n}^{m}} (1 - G' (X_{i} | θ) Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G (X_{j} | θ) + [1_{d + 1}^{'} G (X_{i} | θ) + {‖ G (X_{i} | θ) ‖}^{2}] O_{P} (n^{- 1})),

and by standard U-statistics theory,

\sqrt{n} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G (X_{j} | θ) \overset{D}{\to} N (0, m^{2} Λ_{1}), \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) G' (X_{i} | θ) = Λ + O_{P} (n^{- 1 / 2}) .

Also, $t ≔ Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G (X_{j} | θ) + O_{P} (n^{- 1}) = O_{P} (n^{- 1 / 2}), \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} {‖ G (X_{i} | θ) ‖}^{2} \overset{a . s .}{\to} E {‖ G (X | θ) ‖}^{2} < \infty$ , and max_i $| t' G (X_{i} | θ) | = O_{P} (n^{- 1 / 2} n^{m / α}) \overset{P}{\to} 0$ since m/α < r_o/2 = 1/2, so

- \frac{2 n}{C_{n}^{m}} R_{G} (θ) = \frac{2 n}{C_{n}^{m}} \sum_{i \in D_{n, m}} log [1 + t' G (X_{i} | θ)] = \frac{2 n}{C_{n}^{m}} \sum_{i \in D_{n, m}} (t' G (X_{i} | θ) - \frac{1}{2} t' G (X_{i} | θ) G (X_{i} | θ) t + o_{P} (n^{- 1}) {‖ G (X_{i} | θ) ‖}^{2}) = 2 n \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G' (X_{j} | θ) Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) - \frac{n}{C_{n}^{m}} \sum_{i \in D_{n, m}} (\frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G' (X_{j} | θ) Λ^{- 1}) \times G (X_{i} | θ) G' (X_{i} | θ) (Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G (X_{j} | θ)) + o_{P} (n^{- 1}) \frac{n}{C_{n}^{m}} \sum_{i \in D_{n, m}} {‖ G (X_{i} | θ) ‖}^{2} = 2 n \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G' (X_{j} | θ) Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) - n (\frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G' (X_{j} | θ) Λ^{- 1}) \times \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) G' (X_{i} | θ) (Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G (X_{j} | θ)) + o_{P} (1) = 2 n \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G' (X_{j} | θ) Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) - n \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G (X_{j} | θ) Λ^{- 1} \times (Λ + O_{P} (n^{- 1 / 2})) Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G' (X_{i} | θ) + o_{P} (1) = n \frac{1}{C_{n}^{m}} \sum_{j \in D_{n, m}} G' (X_{j} | θ) Λ^{- 1} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) - O_{P} (n^{- 1 / 2}) + o_{P} (1) .

This completes the proof since

\sqrt{n} m^{- 1} Λ_{1}^{- 1 / 2} \frac{1}{C_{n}^{m}} \sum_{i \in D_{n, m}} G (X_{i} | θ) \overset{D}{\to} N (0, I_{r + 1}) .

(ii) This is a special case of (i).

References

1.Adimari G. Empirical likelihood type confidence intervals under random censorship. Annals of the Institute of Statistical Mathematics. 1997;49:447–466. [Google Scholar]
2.Arcones MA, Giné E. Limit theorems for U-processes. Annals of Probability. 1993;21(3):1494–1542. [Google Scholar]
3.Baggerly KA. Empirical likelihood as a goodness-of-fit measures. Biometrika. 1998;85:535–547. [Google Scholar]
4.Begun JM, Hall WJ, Huang W, Weller JA. Information and asymptotic efficiency in parametric–nonparametric models. Annals of Statistics. 1983;11:432–452. [Google Scholar]
5.Berk RH. Limiting behavior of posterior distributions when the model is incorrect. Annals of Mathematical Statistics. 1966;37:51–58. [Google Scholar]
6.Bickel PJ, Klaassen CA, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, Maryland: Johns Hopkins University Press; 1993. [Google Scholar]
7.Borovskich YuV. Institute of Mathematics, Ukraine. Acad. Sci., Kiev. Russia: 1986. Theory of U-Statistics in Hilbert Space. [Google Scholar]
8.Chen SX. Empirical likelihood for nonparametric density estimation. Australian Journal of Statistics. 1997;39:47–56. [Google Scholar]
9.Chen SX, Hall P. Smoothed empirical likelihood confidence intervals for quantiles. Annals of Statistics. 1993;21:1166–1181. [Google Scholar]
10.Chen SX, Qin YS. Empirical likelihood confidence intervals for local linear smoothers. Biometrika. 2000;87:946–953. [Google Scholar]
11.Giné E. Lectures on Probability Theory and Statistics, Sanit Flour 1996, in: Lecture Notes in Mathematics. vol. 1665. Berlin: Springer; 1997. Decoupling and limit theorems for U-statistics and U-processes; pp. 1–35. [Google Scholar]
12.Giné E, Zinn J. Probability, Banach Spaces. vol. 8. Boston: Birkhäuser; 1992. Marcinkiewicz type laws of large numbers and convergence of moments for U-statistics; pp. 273–291. [Google Scholar]
13.Gregory G. Large sample theory for U-statistics and tests of fit. Annals of Statistics. 1977;5:110–123. [Google Scholar]
14.Hájek J. A characterization of limiting distributions of regular estimates. Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete. 1970;14:323–330. [Google Scholar]
15.Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325. [Google Scholar]
16.Hoeffding W. The strong law of large numbers for U-statistics. Inst. Statist. Mimeo. Ser. 1961;(302):1–10. [Google Scholar]
17.Janson S. The asymptotic distribution of degenerate U-statistics. Department of Mathematics, University of Uppsala; 1979. pp. 1–17. Preprint No. 5. [Google Scholar]
18.Jing BY, Yuan J, Zhou W. Empirical likelihood for U-statistics. Journal of the American Statistical Association. 2009;104:1224–1232. [Google Scholar]
19.Kitamura Y. Empirical likelihood methods in econometrics: theory and practice. Discussion Paper 1569, Cowles Foundations for Research in Economics. 2006 [Google Scholar]
20.Kolaczyk ED. Empirical likelihood for generalized linear models. Statistica Sinica. 1994;4:199–218. [Google Scholar]
21.Koroljuk VS, Yu V, Borovskich . Theory of U-Statistics. The Netherlands: Kluwer Academic Publishers; 1994. [Google Scholar]
22.LeCam L. Publ. Statist. vol. 3. Univ. California Press; 1960. Locally Asymptotically Normal Families of Distributions; pp. 37–98. [Google Scholar]
23.Nolan D, Pollard D. U-process: rates of convergence. Annals of Statistics. 1987;15:780–799. [Google Scholar]
24.Nolan D, Pollard D. Functional limit theorems for U-process. Annals of Statistics. 1988;16:1291–1298. [Google Scholar]
25.Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249. [Google Scholar]
26.Owen AB. Empirical likelihood confidence regions. Annals of Statistics. 1990;18:90–120. [Google Scholar]
27.Owen AB. Empirical likelihood for linear models. Annals of Statistics. 1991;19:1725–1747. [Google Scholar]
28.Qin J, Lawless JL. Empirical likelihood and general estimating equations. Annals of Statistics. 1994;22:300–325. [Google Scholar]
29.Qin GS, Tsao M. Empirical likelihood based inference for the derivative of the nonparametric regression function. Bernoulli. 2005;11:715–735. [Google Scholar]
30.Qin J, Zhang B. Marginal likelihood, conditional likelihood and empirical likelihood: connections and applications. Biometrika. 2005;92:251–270. [Google Scholar]
31.Qin GS, Zhou XH. Empirical likelihood inference for the area under ROC curve. Biometrics. 2006;62:613–622. doi: 10.1111/j.1541-0420.2005.00453.x. [DOI] [PubMed] [Google Scholar]
32.Rubin H, Vitale RA. Asymptotic distribution of symmetric statistics. Annals of Statistics. 1980;8:165–170. [Google Scholar]
33.Sen PK. Almost sure behavior of U-statistics and von Mises’ differentiable statistical functions. Annals of Statistics. 1974;2:387–395. [Google Scholar]
34.Serfling R. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980. [Google Scholar]
35.Thomas D, Grunkemeier G. Confidence interval estimation of survival probabilities of censored data. Journal of the American Statistical Association. 1975;70:865–871. [Google Scholar]
36.van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag; 1996. [Google Scholar]
37.von Mises R. On the asymptotic distributions of differentiable statistical functions. Annals of Mathematical Statistics. 1947;18:309–348. [Google Scholar]
38.Wood ATA, Do KA, Broom NM. Sequential linearization of empirical likelihood constraints with application to U-statistics. Journal of Computational and Graphical Statistics. 1996;5:365–385. [Google Scholar]
39.Zhang B. A note on kernel density estimation with auxiliary information. Communications in Statistics-Theory and Methods. 1998;27:1–11. [Google Scholar]

[R1] 1.Adimari G. Empirical likelihood type confidence intervals under random censorship. Annals of the Institute of Statistical Mathematics. 1997;49:447–466. [Google Scholar]

[R2] 2.Arcones MA, Giné E. Limit theorems for U-processes. Annals of Probability. 1993;21(3):1494–1542. [Google Scholar]

[R3] 3.Baggerly KA. Empirical likelihood as a goodness-of-fit measures. Biometrika. 1998;85:535–547. [Google Scholar]

[R4] 4.Begun JM, Hall WJ, Huang W, Weller JA. Information and asymptotic efficiency in parametric–nonparametric models. Annals of Statistics. 1983;11:432–452. [Google Scholar]

[R5] 5.Berk RH. Limiting behavior of posterior distributions when the model is incorrect. Annals of Mathematical Statistics. 1966;37:51–58. [Google Scholar]

[R6] 6.Bickel PJ, Klaassen CA, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, Maryland: Johns Hopkins University Press; 1993. [Google Scholar]

[R7] 7.Borovskich YuV. Institute of Mathematics, Ukraine. Acad. Sci., Kiev. Russia: 1986. Theory of U-Statistics in Hilbert Space. [Google Scholar]

[R8] 8.Chen SX. Empirical likelihood for nonparametric density estimation. Australian Journal of Statistics. 1997;39:47–56. [Google Scholar]

[R9] 9.Chen SX, Hall P. Smoothed empirical likelihood confidence intervals for quantiles. Annals of Statistics. 1993;21:1166–1181. [Google Scholar]

[R10] 10.Chen SX, Qin YS. Empirical likelihood confidence intervals for local linear smoothers. Biometrika. 2000;87:946–953. [Google Scholar]

[R11] 11.Giné E. Lectures on Probability Theory and Statistics, Sanit Flour 1996, in: Lecture Notes in Mathematics. vol. 1665. Berlin: Springer; 1997. Decoupling and limit theorems for U-statistics and U-processes; pp. 1–35. [Google Scholar]

[R12] 12.Giné E, Zinn J. Probability, Banach Spaces. vol. 8. Boston: Birkhäuser; 1992. Marcinkiewicz type laws of large numbers and convergence of moments for U-statistics; pp. 273–291. [Google Scholar]

[R13] 13.Gregory G. Large sample theory for U-statistics and tests of fit. Annals of Statistics. 1977;5:110–123. [Google Scholar]

[R14] 14.Hájek J. A characterization of limiting distributions of regular estimates. Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete. 1970;14:323–330. [Google Scholar]

[R15] 15.Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325. [Google Scholar]

[R16] 16.Hoeffding W. The strong law of large numbers for U-statistics. Inst. Statist. Mimeo. Ser. 1961;(302):1–10. [Google Scholar]

[R17] 17.Janson S. The asymptotic distribution of degenerate U-statistics. Department of Mathematics, University of Uppsala; 1979. pp. 1–17. Preprint No. 5. [Google Scholar]

[R18] 18.Jing BY, Yuan J, Zhou W. Empirical likelihood for U-statistics. Journal of the American Statistical Association. 2009;104:1224–1232. [Google Scholar]

[R19] 19.Kitamura Y. Empirical likelihood methods in econometrics: theory and practice. Discussion Paper 1569, Cowles Foundations for Research in Economics. 2006 [Google Scholar]

[R20] 20.Kolaczyk ED. Empirical likelihood for generalized linear models. Statistica Sinica. 1994;4:199–218. [Google Scholar]

[R21] 21.Koroljuk VS, Yu V, Borovskich . Theory of U-Statistics. The Netherlands: Kluwer Academic Publishers; 1994. [Google Scholar]

[R22] 22.LeCam L. Publ. Statist. vol. 3. Univ. California Press; 1960. Locally Asymptotically Normal Families of Distributions; pp. 37–98. [Google Scholar]

[R23] 23.Nolan D, Pollard D. U-process: rates of convergence. Annals of Statistics. 1987;15:780–799. [Google Scholar]

[R24] 24.Nolan D, Pollard D. Functional limit theorems for U-process. Annals of Statistics. 1988;16:1291–1298. [Google Scholar]

[R25] 25.Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249. [Google Scholar]

[R26] 26.Owen AB. Empirical likelihood confidence regions. Annals of Statistics. 1990;18:90–120. [Google Scholar]

[R27] 27.Owen AB. Empirical likelihood for linear models. Annals of Statistics. 1991;19:1725–1747. [Google Scholar]

[R28] 28.Qin J, Lawless JL. Empirical likelihood and general estimating equations. Annals of Statistics. 1994;22:300–325. [Google Scholar]

[R29] 29.Qin GS, Tsao M. Empirical likelihood based inference for the derivative of the nonparametric regression function. Bernoulli. 2005;11:715–735. [Google Scholar]

[R30] 30.Qin J, Zhang B. Marginal likelihood, conditional likelihood and empirical likelihood: connections and applications. Biometrika. 2005;92:251–270. [Google Scholar]

[R31] 31.Qin GS, Zhou XH. Empirical likelihood inference for the area under ROC curve. Biometrics. 2006;62:613–622. doi: 10.1111/j.1541-0420.2005.00453.x. [DOI] [PubMed] [Google Scholar]

[R32] 32.Rubin H, Vitale RA. Asymptotic distribution of symmetric statistics. Annals of Statistics. 1980;8:165–170. [Google Scholar]

[R33] 33.Sen PK. Almost sure behavior of U-statistics and von Mises’ differentiable statistical functions. Annals of Statistics. 1974;2:387–395. [Google Scholar]

[R34] 34.Serfling R. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980. [Google Scholar]

[R35] 35.Thomas D, Grunkemeier G. Confidence interval estimation of survival probabilities of censored data. Journal of the American Statistical Association. 1975;70:865–871. [Google Scholar]

[R36] 36.van der Vaart A, Wellner J. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag; 1996. [Google Scholar]

[R37] 37.von Mises R. On the asymptotic distributions of differentiable statistical functions. Annals of Mathematical Statistics. 1947;18:309–348. [Google Scholar]

[R38] 38.Wood ATA, Do KA, Broom NM. Sequential linearization of empirical likelihood constraints with application to U-statistics. Journal of Computational and Graphical Statistics. 1996;5:365–385. [Google Scholar]

[R39] 39.Zhang B. A note on kernel density estimation with auxiliary information. Communications in Statistics-Theory and Methods. 1998;27:1–11. [Google Scholar]

PERMALINK

U-statistic with side information

Ao Yuan

Wenqing He

Binhuan Wang

Gengsheng Qin

Abstract

1. Introduction

2. Incorporating side information in U-Statistics

3. The asymptotic properties of Ũ_n

3.1. Convergence rate of Ũ_n

3.2. Asymptotic distribution of Ũ_n

3.3. The optimality property of Ũ_n

3.4. The uniform SLLN and CLT of Ũ_n-processes

4. The empirical likelihood ratio for U-statistics with side information

5. Examples and simulation studies

5.1. Examples

5.2. Simulation studies

Table 1.

Table 4.

Table 2.

Table 3.

Concluding remarks

Acknowledgments

Appendix

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

U-statistic with side information

Ao Yuan

Wenqing He

Binhuan Wang

Gengsheng Qin

Abstract

1. Introduction

2. Incorporating side information in U-Statistics

3. The asymptotic properties of Ũn

3.1. Convergence rate of Ũn

3.2. Asymptotic distribution of Ũn

3.3. The optimality property of Ũn

3.4. The uniform SLLN and CLT of Ũn-processes

4. The empirical likelihood ratio for U-statistics with side information

5. Examples and simulation studies

5.1. Examples

5.2. Simulation studies

Table 1.

Table 4.

Table 2.

Table 3.

Concluding remarks

Acknowledgments

Appendix

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

3. The asymptotic properties of Ũ_n

3.1. Convergence rate of Ũ_n

3.2. Asymptotic distribution of Ũ_n

3.3. The optimality property of Ũ_n

3.4. The uniform SLLN and CLT of Ũ_n-processes