Entropy. 2023 Jun 30;25(7):1013. doi: 10.3390/e25071013

Robust Z-Estimators for Semiparametric Moment Condition Models

Aida Toma 1,2
Editor: Leandro Pardo
PMCID: PMC10377762  PMID: 37509960

Abstract

In the present paper, we introduce a class of robust Z-estimators for moment condition models. These new estimators can be seen as robust alternatives to the minimum empirical divergence estimators. By using the multidimensional Huber function, we first define robust estimators of the element that realizes the supremum in the dual form of the divergence. A linear relationship between the influence function of a minimum empirical divergence estimator and the influence function of the estimator of the element that realizes the supremum in the dual form of the divergence led to the idea of defining new Z-estimators for the parameter of the model by using robust estimators in the dual form of the divergence. The asymptotic properties of the proposed estimators are proven, including consistency and asymptotic normality. Then, the influence functions of the estimators are derived, and their robustness is demonstrated.

Keywords: moment condition models, divergences, robustness

1. Introduction

A moment condition model is a family $\mathcal{M}^1$ of probability measures, all defined on the same measurable space $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$, such that

$\int g(x,\theta)\,dQ(x)=0, \quad \text{for all } Q\in\mathcal{M}^1.$ (1)

The parameter $\theta$ belongs to $\Theta\subset\mathbb{R}^d$; the function $g:=(g_1,\dots,g_l)^\top$ is defined on $\mathbb{R}^m\times\Theta$, each of the $g_i$'s being real-valued, $l\ge d$, and the functions $g_1,\dots,g_l$ and $\mathbb{1}_X$ are supposed to be linearly independent. Denote by $M^1$ the set of all probability measures on $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$ and

$\mathcal{M}^1_\theta:=\left\{Q\in M^1 : \int g(x,\theta)\,dQ(x)=0\right\},$ (2)

such that

$\mathcal{M}^1=\bigcup_{\theta\in\Theta}\mathcal{M}^1_\theta.$ (3)

Let $X_1,\dots,X_n$ be an i.i.d. sample on the random vector $X$ with unknown probability distribution $P_0$. We consider the problem of estimating the parameter $\theta_0$ for which the constraints of the model are satisfied:

$\int g(x,\theta_0)\,dP_0(x)=0.$ (4)

We suppose that $\theta_0$ is the unique solution of Equation (4). Thus, we assume that information about $\theta_0$ and $P_0$ is available in the form of $l\ge d$ functionally independent unbiased estimating functions, and we use this information to estimate $\theta_0$.
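To fix ideas, the following minimal Python sketch (our own toy illustration, not an example from the paper) encodes $l=2$ moment conditions for a $d=1$ parameter: if $X\sim N(\theta_0,1)$, then $g(x,\theta):=(x-\theta,\,x^2-\theta^2-1)^\top$ gives unbiased estimating functions at $\theta_0$.

```python
import numpy as np

# Toy moment condition model (our own illustration): for X ~ N(theta0, 1),
# E[X - theta0] = 0 and E[X^2 - theta0^2 - 1] = 0, so l = 2 > d = 1.
def g(x, theta):
    """Return the n x l matrix of estimating functions evaluated at theta."""
    return np.column_stack([x - theta, x**2 - theta**2 - 1.0])

rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=1.0, size=500)
print(g(sample, 1.0).mean(axis=0))  # both empirical moments are close to 0
```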

Among the best-known estimation methods for moment condition models, we mention the generalized method of moments (GMM) [1], the continuous updating (CU) estimator [2], the empirical likelihood (EL) estimator [3,4], the exponential tilting (ET) estimator [5], and the generalized empirical likelihood (GEL) estimators [6]. Although the EL estimator is superior to other estimators in terms of higher-order asymptotic properties, these properties hold only under the correct specification of the moment conditions. The exponentially tilted empirical likelihood (ETEL) estimator was proposed in [7]; it has the same higher-order properties as the EL estimator under correct specification, while maintaining the usual asymptotic properties, such as consistency and asymptotic normality, under misspecification. The so-called information and entropy econometric techniques have been proposed to improve the finite sample performance of the GMM estimators and tests (see, e.g., [4,5]).

Some recent methods for the estimation and testing of moment condition models are based on using divergences. Divergences between probability measures are widely used in statistics and data science in order to perform inference in models of various kinds, parametric or semiparametric. Statistical methods based on divergence minimization extend the likelihood paradigm and often have the advantage of providing a trade-off between efficiency and robustness [8,9,10,11]. A general methodology for the estimation and testing of moment condition models was developed in [12]. This approach is based on minimizing divergences in their dual form and allows the asymptotic study of the estimators, called minimum empirical divergence estimators, and of the associated test statistics, both under the model and under misspecification of the model. The approach based on minimizing dual forms of divergences was initially used in the case of parametric models, with the results published in a series of articles [13,14,15,16]. The broad class of minimum empirical divergence estimators contains, in particular, the EL estimator, the CU estimator, and the ET estimator mentioned above. Using the influence function as the robustness measure, it has been shown that the minimum empirical divergence estimators are not robust, because the corresponding influence functions are generally not bounded [17]. On the other hand, the minimum empirical divergence estimators all have the same first-order efficiency, and moreover, the EL estimator, which belongs to this class, is superior in higher-order efficiency. Therefore, proposing robust versions of the minimum empirical divergence estimators brings a trade-off between robustness and efficiency. These aspects motivated the studies in the present paper.

Some robust estimation methods for moment condition models have been proposed in the literature, for example, in [18,19,20,21,22]. In the present paper, we introduce a class of robust Z-estimators for moment condition models. These new estimators can be seen as robust alternatives to the minimum empirical divergence estimators. By using the multidimensional Huber function, we first define robust estimators of the element that realizes the supremum in the dual form of the divergence. A linear relationship between the influence function of a minimum empirical divergence estimator and the influence function of the estimator of the element that realizes the supremum in the dual form of the divergence led to the idea of defining new Z-estimators for the parameter of the model by using robust estimators in the dual form of the divergence. The asymptotic properties of the proposed estimators are proven, including consistency and asymptotic normality. Then, the influence functions of the estimators are derived, and their robustness is demonstrated.

The paper is organized as follows. In Section 2, we briefly recall the context and the definitions of the minimum empirical divergence estimators, these being necessary for defining the new estimators. In Section 3, the new Z-estimators for moment condition models are defined, their asymptotic properties, including consistency and asymptotic normality, are proven, their influence functions are derived, and their robustness is demonstrated. The proofs of the theoretical results are deferred to Appendix A.

2. Minimum Empirical Divergence Estimators

2.1. Statistical Divergences

Let $\varphi$ be a convex function defined on $\mathbb{R}$ and $[0,\infty]$-valued, such that $\varphi(1)=0$, and let $P\in M^1$ be some probability measure. For any signed finite measure $Q$ defined on the same measurable space $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$, absolutely continuous (a.c.) with respect to $P$, the $\varphi$ divergence between $Q$ and $P$ is defined by

$D_\varphi(Q,P):=\int \varphi\left(\frac{dQ}{dP}(x)\right) dP(x).$ (5)

When $Q$ is not a.c. with respect to $P$, we set $D_\varphi(Q,P)=\infty$. This extension, for the case when $Q$ is not absolutely continuous with respect to $P$, was considered in order to have a unique definition of divergences, appropriate for both continuous and discrete probability laws.

This definition extends the one of divergences between probability measures [23], and the necessity of working with signed finite measures will be explained in Section 2.2.

Largely used in information theory, the Kullback–Leibler divergence is associated with the real convex function $\varphi(x):=x\log x-x+1$ and is defined by

$KL(Q,P):=\int \log\left(\frac{dQ}{dP}\right) dQ.$

The modified Kullback–Leibler divergence is associated with the convex function $\varphi(x):=-\log x+x-1$ and is defined through

$KL_m(Q,P):=-\int \log\left(\frac{dQ}{dP}\right) dP.$

Other divergences, largely used in inferential statistics, are the $\chi^2$ and the modified $\chi^2$ divergences, namely

$\chi^2(Q,P):=\frac{1}{2}\int \left(\frac{dQ}{dP}-1\right)^2 dP,$
$\chi^2_m(Q,P):=\frac{1}{2}\int \left(\frac{dQ}{dP}-1\right)^2 \Big/ \frac{dQ}{dP}\; dP,$

these being associated with the convex functions $\varphi(x):=\frac{1}{2}(x-1)^2$ and $\varphi(x):=\frac{1}{2}(x-1)^2/x$, respectively. The Hellinger distance and the $L^1$ distance are also $\varphi$ divergences. They are associated with the convex functions $\varphi(x):=2(\sqrt{x}-1)^2$ and $\varphi(x):=|x-1|$, respectively.

All the preceding examples, except the $L^1$ distance, belong to the class of power divergences introduced by Cressie and Read [24] and defined by the convex functions:

$\varphi_\gamma(x):=\frac{x^\gamma-\gamma x+\gamma-1}{\gamma(\gamma-1)}, \quad x\in\mathbb{R}_+^*,$ (6)

for $\gamma\in\mathbb{R}\setminus\{0,1\}$, and $\varphi_0(x):=-\log x+x-1$, $\varphi_1(x):=x\log x-x+1$. The Kullback–Leibler divergence is associated with $\varphi_1$, the modified Kullback–Leibler with $\varphi_0$, the $\chi^2$ divergence with $\varphi_2$, the modified $\chi^2$ divergence with $\varphi_{-1}$, and the Hellinger distance with $\varphi_{1/2}$. When $\varphi_\gamma$ is not defined on $(-\infty,0)$ or when $\varphi_\gamma$ is not convex, the definition of the corresponding power divergence function $Q\in M^1\mapsto D_{\varphi_\gamma}(Q,P)$ can be extended to the whole set of signed finite measures by taking the following extension of $\varphi_\gamma$:

$\varphi_\gamma : x\in\mathbb{R}\mapsto \varphi_\gamma(x)\,\mathbb{1}_{[0,\infty)}(x)+(+\infty)\,\mathbb{1}_{(-\infty,0)}(x).$
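To make these definitions concrete, here is a small Python sketch (ours, with arbitrary discrete measures chosen for illustration) that evaluates power divergences between two probability vectors:

```python
import numpy as np

def phi_gamma(x, gamma):
    """Cressie-Read convex function (6); gamma = 1 is KL, gamma = 0 is modified KL."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return -np.log(x) + x - 1.0
    if gamma == 1.0:
        return x * np.log(x) - x + 1.0
    return (x**gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

def phi_divergence(q, p, gamma):
    """D_phi(Q, P) = sum_i p_i phi(q_i / p_i) for discrete measures with p > 0."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(p * phi_gamma(q / p, gamma)))

p = np.full(4, 0.25)
q = np.array([0.4, 0.3, 0.2, 0.1])
print(phi_divergence(q, p, 1.0))   # Kullback-Leibler divergence KL(Q, P)
print(phi_divergence(q, p, -1.0))  # modified chi-squared divergence
```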

The $\varphi$ divergence between some set $\Omega$ of signed finite measures and a probability measure $P$ is defined by

$D_\varphi(\Omega,P)=\inf_{Q\in\Omega} D_\varphi(Q,P).$ (7)

Assuming that $D_\varphi(\Omega,P)$ is finite, a measure $Q^*\in\Omega$ is called a $\varphi$-projection of $P$ on $\Omega$ if

$D_\varphi(Q^*,P)\le D_\varphi(Q,P), \quad \text{for all } Q\in\Omega.$

2.2. Minimum Empirical Divergence Estimators

Let $X_1,\dots,X_n$ be an i.i.d. sample on the random vector $X$ with the probability distribution $P_0$. The “plug-in” estimator of the $\varphi$ divergence $D_\varphi(\mathcal{M}^1_\theta,P_0)$ between the set $\mathcal{M}^1_\theta$ and the probability measure $P_0$ is defined by replacing $P_0$ with the empirical measure associated with the sample. More precisely,

$\widehat{D}_\varphi(\mathcal{M}^1_\theta,P_0)=\inf_{Q\in\mathcal{M}^1_\theta} D_\varphi(Q,P_n)=\inf_{Q\in\mathcal{M}^1_\theta}\int \varphi\left(\frac{dQ}{dP_n}(x)\right) dP_n(x),$ (8)

where $P_n:=\frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ is the empirical measure associated with the sample, $\delta_x$ being the Dirac measure putting all mass at $x$. If the projection of the measure $P_n$ on $\mathcal{M}^1_\theta$ exists, it is a law a.c. with respect to $P_n$. Then, it is natural to consider

$\mathcal{M}^{(n)}_\theta=\left\{Q\in M^1 : Q \text{ a.c. with respect to } P_n \text{ and } \sum_{i=1}^n g(X_i,\theta)\,Q(X_i)=0\right\},$ (9)

and then, the plug-in estimator (8) can be written as

$\widehat{D}_\varphi(\mathcal{M}^1_\theta,P_0)=\inf_{Q\in\mathcal{M}^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n \varphi\big(nQ(X_i)\big).$ (10)

The infimum in the above expression (10) may be achieved at a point situated on the frontier of the set $\mathcal{M}^{(n)}_\theta$, a case in which the Lagrange method for characterizing the infimum and computing $\widehat{D}_\varphi(\mathcal{M}^1_\theta,P_0)$ cannot be applied. In order to avoid this difficulty, Broniatowski and Keziou [12,25] proposed to work on sets of signed finite measures and defined

$\mathcal{M}_\theta:=\left\{Q\in M : \int dQ=1 \text{ and } \int g(x,\theta)\,dQ(x)=0\right\},$ (11)

where $M$ denotes the set of all signed finite measures on the measurable space $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$, and

$\mathcal{M}:=\bigcup_{\theta\in\Theta}\mathcal{M}_\theta.$ (12)

They showed that, if the projection $Q_1^*$ of $P_n$ on $\mathcal{M}^1_\theta$ is an interior point of $\mathcal{M}^1_\theta$ and the projection $Q^*$ of $P_n$ on $\mathcal{M}_\theta$ is an interior point of $\mathcal{M}_\theta$, then the two approaches for defining minimum divergence estimators, based on signed finite measures and on probability measures, respectively, coincide. On the other hand, in the case when $Q_1^*$ is a frontier point of $\mathcal{M}^1_\theta$, the estimator of the parameter $\theta_0$ defined using the context of signed finite measures still converges to $\theta_0$. These aspects justify the substitution of $\mathcal{M}^1_\theta$ by $\mathcal{M}_\theta$.

In the following, we briefly recall the definitions of the estimators for moment condition models proposed in [12] in the context of signed finite measure sets.

Denote by $\bar{g}$ the function defined on $\mathbb{R}^m\times\Theta$ and $\mathbb{R}^{l+1}$-valued:

$\bar{g}(x,\theta):=\left(\mathbb{1}_X(x),g_1(x,\theta),\dots,g_l(x,\theta)\right)^\top.$ (13)

Given a $\varphi$ divergence, when the function $\varphi$ is strictly convex on its domain, denote by

$\varphi^*(u):=u\,{\varphi'}^{-1}(u)-\varphi\big({\varphi'}^{-1}(u)\big)$

the convex conjugate of the function $\varphi$. For a given probability measure $P\in M^1$ and a fixed $\theta\in\Theta$, define

$\Lambda_\theta(P):=\left\{t\in\mathbb{R}^{l+1} : \int \Big|\varphi^*\Big(t_0+\sum_{j=1}^l t_j g_j(x,\theta)\Big)\Big|\, dP(x)<\infty\right\}.$ (14)

We also use the notations $\Lambda_\theta$ for $\Lambda_\theta(P_0)$ and $\Lambda_\theta^{(n)}$ for $\Lambda_\theta(P_n)$.

Supposing that $P_0$ admits a projection $Q^*_\theta$ on $\mathcal{M}_\theta$ with the same support as $P_0$ and that the function $\varphi$ is strictly convex on its domain, the $\varphi$ divergence $D_\varphi(\mathcal{M}_\theta,P_0)$ admits the dual representation:

$D_\varphi(\mathcal{M}_\theta,P_0)=\sup_{t\in\Lambda_\theta}\int m(x,\theta,t)\,dP_0(x),$ (15)

where $m(x,\theta,t):=t_0-\varphi^*\big(t^\top\bar{g}(x,\theta)\big)$.

The supremum in (15) is unique and is reached at a point that we denote as $t_\theta=t_\theta(P_0)$:

$t_\theta:=\arg\sup_{t\in\Lambda_\theta}\int m(x,\theta,t)\,dP_0(x).$ (16)

Then, $D_\varphi(\mathcal{M}_\theta,P_0)$, $t_\theta$, $D_\varphi(\mathcal{M},P_0)$ and $\theta_0$ can be estimated, respectively, by

$\widehat{D}_\varphi(\mathcal{M}_\theta,P_0):=\sup_{t\in\Lambda_\theta^{(n)}}\int m(x,\theta,t)\,dP_n(x),$ (17)
$\widehat{t}_\theta:=\arg\sup_{t\in\Lambda_\theta^{(n)}}\int m(x,\theta,t)\,dP_n(x),$ (18)
$\widehat{D}_\varphi(\mathcal{M},P_0):=\inf_{\theta\in\Theta}\sup_{t\in\Lambda_\theta^{(n)}}\int m(x,\theta,t)\,dP_n(x),$ (19)
$\widehat{\theta}_\varphi:=\arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda_\theta^{(n)}}\int m(x,\theta,t)\,dP_n(x).$ (20)

The estimators defined in (20) are called minimum empirical divergence estimators. We refer to [12] for the complete study of the existence and of the asymptotic properties of the above estimators.
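To give a rough idea of how the double optimization in (19) and (20) can be carried out numerically, the sketch below (our own simplification, reusing the toy model from the Introduction) treats the modified Kullback–Leibler divergence, for which $\varphi^*(u)=-\log(1-u)$ and $\widehat{\theta}_\varphi$ is the EL estimator; the optimizer choices are illustrative, not prescribed by [12].

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def gbar(x, theta):
    """The function (13): a leading constant 1 stacked over the moment functions."""
    return np.column_stack([np.ones_like(x), x - theta, x**2 - theta**2 - 1.0])

def neg_dual(t, theta, x):
    """Minus the empirical dual criterion, with phi*(u) = -log(1 - u)."""
    u = gbar(x, theta) @ t
    if np.any(u >= 1.0):
        return 1e10                          # outside the domain of phi*
    return -(t[0] + np.mean(np.log1p(-u)))   # -(t_0 - mean of phi*(t' gbar))

def sup_t(theta, x):
    """Inner supremum: the empirical version of (17)."""
    res = minimize(neg_dual, x0=np.zeros(3), args=(theta, x), method="Nelder-Mead")
    return -res.fun

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, 300)
fit = minimize_scalar(lambda th: sup_t(th, x), bounds=(0.0, 2.0), method="bounded")
print(fit.x)  # the outer infimum (20): close to theta0 = 1
```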

The influence functions of these estimators and corresponding robustness properties were studied in [17]. According to those results, for $\theta\in\Theta$ fixed, the influence function of the estimator $\widehat{t}_\theta$ is given by

$\mathrm{IF}(x;t_\theta,P_0)=-\left[\int \partial^2_{tt} m(y,\theta,t_\theta(P_0))\,dP_0(y)\right]^{-1}\partial_t m(x,\theta,t_\theta(P_0)),$ (21)

where

$\partial_t m(x,\theta,t)=(1,0_l^\top)^\top-{\varphi'}^{-1}\big(t^\top\bar{g}(x,\theta)\big)\,\bar{g}(x,\theta),$ (22)
$\partial^2_{tt} m(x,\theta,t)=-\frac{1}{\varphi''\big({\varphi'}^{-1}(t^\top\bar{g}(x,\theta))\big)}\,\bar{g}(x,\theta)\bar{g}(x,\theta)^\top,$ (23)

with the particular case $\theta=\theta_0$:

$\mathrm{IF}(x;t_{\theta_0},P_0)=-\varphi''(1)\left[\int \bar{g}(y,\theta_0)\bar{g}(y,\theta_0)^\top dP_0(y)\right]^{-1}\big(0,g(x,\theta_0)^\top\big)^\top.$ (24)

On the other hand, the influence function of the estimator $\widehat{\theta}_\varphi$ is given by

$\mathrm{IF}(x;T_\varphi,P_0)=-\left[\int \partial_\theta g(y,\theta_0)^\top dP_0(y)\left(\int g(y,\theta_0)g(y,\theta_0)^\top dP_0(y)\right)^{-1}\int \partial_\theta g(y,\theta_0)\,dP_0(y)\right]^{-1}\cdot\int \partial_\theta g(y,\theta_0)^\top dP_0(y)\left(\int g(y,\theta_0)g(y,\theta_0)^\top dP_0(y)\right)^{-1} g(x,\theta_0).$ (25)

Since the function $x\mapsto g(x,\theta)$ is usually not bounded, for example, when we have linear constraints, the influence function $\mathrm{IF}(x;T_\varphi,P_0)$ is not bounded; therefore, the minimum empirical divergence estimators $\widehat{\theta}_\varphi$ defined in (20) are generally not robust.

Through direct calculation, it can be seen that there is a connection between the influence functions $\mathrm{IF}(x;t_{\theta_0},P_0)$ and $\mathrm{IF}(x;T_\varphi,P_0)$, namely the relation

$\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\cdot\big[\partial_\theta t(\theta_0,P_0)\,\mathrm{IF}(x;T_\varphi,P_0)+\mathrm{IF}(x;t_{\theta_0},P_0)\big]=0.$

Since $\mathrm{IF}(x;T_\varphi,P_0)$ is linearly related to $\mathrm{IF}(x;t_{\theta_0},P_0)$, using a robust estimator of $t_\theta=t_\theta(P_0)$ in the original duality Formula (15) leads to a new robust estimator of $\theta_0$. This is the idea at the basis of our proposal in this paper for constructing new robust estimators for moment condition models.

3. Robust Estimators for Moment Condition Models

3.1. Definitions of New Estimators

In this section, we define robust versions of the estimators $\widehat{t}_\theta$ from (18) and robust versions of the minimum empirical divergence estimators $\widehat{\theta}_\varphi$ from (20). First, we define robust estimators of $t_\theta$ by using a truncated version of the function $x\mapsto\partial_t m(x,\theta,t)$, and then, we insert such a robust estimator into the estimating equation corresponding to the minimum empirical divergence estimator. The truncated function is based on the multidimensional Huber function and contains a shift vector $\tau_\theta$ and a scale matrix $A_\theta$ for calibration, so that $t_\theta$, which realizes the supremum in the duality formula, is also the solution of a new equation based on the truncated function.

For simplicity, for fixed $\theta\in\Theta$, we also use the notation $m_\theta(x,t):=m(x,\theta,t)$. With this notation, $t_\theta=t_\theta(P_0)$ defined in (16) is the unique solution of the equation:

$\int \partial_t m_\theta(x,t_\theta(P_0))\,dP_0(x)=0.$ (26)

Consider the system

$\int \partial_t m_\theta(y,t)\,dP_0(y)=0,$ (27)
$\int H_c\big(A[\partial_t m_\theta(y,t)-\tau]\big)\,dP_0(y)=0,$ (28)
$\int H_c\big(A[\partial_t m_\theta(y,t)-\tau]\big)\,H_c\big(A[\partial_t m_\theta(y,t)-\tau]\big)^\top dP_0(y)=I_{l+1},$ (29)

where

$H_c(y):=\begin{cases} y\cdot\min\left(1,\dfrac{c}{\|y\|}\right) & \text{if } y\neq 0,\\[4pt] 0 & \text{if } y=0, \end{cases}$ (30)

is the multidimensional Huber function, with $c>0$, $I_{l+1}$ the identity matrix of order $l+1$, $A$ a $(l+1)\times(l+1)$ matrix, and $\tau\in\mathbb{R}^{l+1}$. For fixed $\theta$, this system admits a unique solution $(t,A,\tau)=(t_\theta(P_0),A_\theta(P_0),\tau_\theta(P_0))$ (according to [18], p. 17).

The multidimensional Huber function is useful for defining robust estimators; it transforms each point outside the hypersphere of radius $c$ into the nearest point of that hypersphere and leaves the points inside unchanged (see [26], p. 239, [27]). By applying the multidimensional Huber function to the function $y\mapsto\partial_t m_\theta(y,t)$, together with the scale matrix $A_\theta$ and the shift vector $\tau_\theta$, a modification is produced where the norm exceeds the bound $c$, while the original $t_\theta$ remains the solution of the equation based on the new truncated function. For parametric models, the multidimensional Huber function was also used in other contexts, for example, to define optimal Bs-robust estimators or optimal Bi-robust estimators (see [26], p. 244). A minimal NumPy implementation of $H_c$ is sketched below.
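The following sketch is our own direct implementation of (30):

```python
import numpy as np

def huber_multidim(y, c):
    """H_c from (30): points outside the ball of radius c are pulled back onto it."""
    y = np.asarray(y, dtype=float)
    norm = np.linalg.norm(y)
    if norm == 0.0:
        return y
    return y * min(1.0, c / norm)

print(huber_multidim([3.0, 4.0], c=1.0))  # [0.6, 0.8]: rescaled to norm c
print(huber_multidim([0.1, 0.2], c=1.0))  # unchanged: already inside the ball
```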

The above arguments can be used for each probability measure $P$ from the moment condition model $\mathcal{M}^1$. This context allows defining the truncated version of the function $y\mapsto\partial_t m_\theta(y,t)$, which we denote by $\psi_\theta(y,t)$, such that the original $t_\theta(P_0)$, the solution of Equation (26), is also the solution of the equation $\int \psi_\theta(y,t_\theta(P_0))\,dP_0(y)=0$.

For $\theta$ fixed and $P$ a probability measure, the equation $\int \partial_t m_\theta(y,t)\,dP(y)=0$ has a unique solution $t=t_\theta(P)\in\Lambda_\theta(P)$, assuring the supremum in the dual form of the divergence $D_\varphi(\mathcal{M}_\theta,P)$ (see [12]). For each $t$, we define $A_\theta(t)$ and $\tau_\theta(t)$ as the solutions of the system:

$\int H_c\big(A_\theta(t)[\partial_t m_\theta(y,t)-\tau_\theta(t)]\big)\,dP(y)=0,$ (31)
$\int H_c\big(A_\theta(t)[\partial_t m_\theta(y,t)-\tau_\theta(t)]\big)\,H_c\big(A_\theta(t)[\partial_t m_\theta(y,t)-\tau_\theta(t)]\big)^\top dP(y)=I_{l+1}.$ (32)

We define a new estimator $\widehat{t}^{\,c}_\theta$ of $t_\theta=t_\theta(P_0)$ as a Z-estimator corresponding to the $\psi$-function:

$\psi_\theta(x,t):=H_c\big(A_\theta(t)[\partial_t m_\theta(x,t)-\tau_\theta(t)]\big);$ (33)

more precisely, $\widehat{t}^{\,c}_\theta$ is defined by

$\int \psi_\theta(y,\widehat{t}^{\,c}_\theta)\,dP_n(y)=0, \quad \text{or} \quad \sum_{i=1}^n H_c\big(A_\theta(\widehat{t}^{\,c}_\theta)[\partial_t m_\theta(X_i,\widehat{t}^{\,c}_\theta)-\tau_\theta(\widehat{t}^{\,c}_\theta)]\big)=0,$ (34)

the theoretical counterpart of this estimating equation being

$\int \psi_\theta(y,t_\theta(P_0))\,dP_0(y)=0.$ (35)
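The calibration pair $(A_\theta(t),\tau_\theta(t))$ is defined only implicitly by (31) and (32). One natural computational device, which we sketch below as our own heuristic (the paper does not prescribe an algorithm), is a fixed-point iteration alternating a location step for (31) and a scatter step for (32); here `S` is the assumed $n\times(l+1)$ matrix with rows $\partial_t m_\theta(X_i,t)$.

```python
import numpy as np

def calibrate(S, c, n_iter=200, tol=1e-9):
    """Heuristic fixed point for (A, tau) solving empirical analogues of (31)-(32)."""
    n, k = S.shape
    A, tau = np.eye(k), S.mean(axis=0)
    for _ in range(n_iter):
        r = (S - tau) @ A.T                               # rows A (s_i - tau)
        w = np.minimum(1.0, c / np.maximum(np.linalg.norm(r, axis=1), 1e-12))
        tau_new = (w[:, None] * S).sum(axis=0) / w.sum()  # location step for (31)
        Z = w[:, None] * (S - tau_new)                    # rows w_i (s_i - tau)
        L = np.linalg.cholesky(Z.T @ Z / n)               # C = L L'; want A C A' = I
        A_new = np.linalg.inv(L)
        if np.allclose(A_new, A, atol=tol) and np.allclose(tau_new, tau, atol=tol):
            return A_new, tau_new
        A, tau = A_new, tau_new
    return A, tau

def psi_rows(S, A, tau, c):
    """Rows psi_theta(X_i, t) = H_c(A [score_i - tau]) from (33)."""
    r = (S - tau) @ A.T
    w = np.minimum(1.0, c / np.maximum(np.linalg.norm(r, axis=1), 1e-12))
    return w[:, None] * r
```

The estimator $\widehat{t}^{\,c}_\theta$ of (34) is then a root, over $t$, of the mean of the rows returned by `psi_rows`, with the scores and the calibration pair recomputed at each trial value of $t$.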

For a given probability measure $P$, the statistical functional $t^c_\theta(P)$ associated with the estimator $\widehat{t}^{\,c}_\theta$, whenever it exists, is defined by

$\int \psi_\theta(y,t^c_\theta(P))\,dP(y)=\int H_c\big(A_\theta(t^c_\theta(P))[\partial_t m_\theta(y,t^c_\theta(P))-\tau_\theta(t^c_\theta(P))]\big)\,dP(y)=0.$ (36)

Note that

$t^c_\theta(P_0)=t_\theta(P_0),$ (37)

by construction.

Remark 1.

We notice a similarity between the Z-estimator defined in (34) and the classical optimal Bs-robust estimator for parametric models from [26]. In the case of parametric models, the M-estimator corresponding to the $\psi$-function (33), but defined for the classical score function $\partial_t \ln f_t(x)=\frac{\partial_t f_t(x)}{f_t(x)}$ instead of the function $\partial_t m_\theta(x,t)$ (including in the system (31) and (32) defining $A_\theta(t)$ and $\tau_\theta(t)$), is the classical optimal Bs-robust estimator; here, $f_t(x)$ denotes the density corresponding to a parametric model indexed by the parameter $t$. The classical optimal Bs-robust estimator for parametric models has the optimality property of minimizing a measure of the asymptotic mean-squared error among all the Fisher-consistent estimators with a self-standardized sensitivity smaller than the positive constant $c$.

In the following, for a given divergence, using the estimators $\widehat{t}^{\,c}_\theta$ of $t_\theta(P_0)$, we construct new estimators of the parameter $\theta_0$ of the model. In Section 3.3, we prove that all the estimators $\widehat{t}^{\,c}_\theta$ are robust, and this property is transferred to the new estimators that we define for the parameter $\theta_0$.

To define new estimators for $\theta_0$, we use the dual representation (15) of the divergence $D_\varphi(\mathcal{M}_\theta,P_0)$. Since

$\theta_0=\arg\inf_{\theta\in\Theta} D_\varphi(\mathcal{M}_\theta,P_0)=\arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda_\theta}\int m_\theta(y,t)\,dP_0(y)$ (38)
$\phantom{\theta_0}=\arg\inf_{\theta\in\Theta}\int m_\theta(y,t_\theta(P_0))\,dP_0(y),$ (39)

$\theta=\theta_0$ is the solution of the equation:

$\int \frac{\partial}{\partial\theta}\big[m(y,\theta,t(\theta,P_0))\big]\,dP_0(y)=0,$ (40)

where we used the notation $t(\theta,P):=t_\theta(P)$. Equation (40) may be written as

$\int \partial_\theta m(y,\theta_0,t(\theta_0,P_0))\,dP_0(y)+\big[\partial_\theta t(\theta_0,P_0)\big]^\top\int \partial_t m(y,\theta_0,t(\theta_0,P_0))\,dP_0(y)=0.$ (41)

On the basis of the definition of $t_\theta(P_0)=t(\theta,P_0)$, for $\theta=\theta_0$, we have

$\int \partial_t m(y,\theta_0,t(\theta_0,P_0))\,dP_0(y)=0;$ (42)

therefore, we deduce that $\theta=\theta_0$ is the solution of the equation:

$\int \partial_\theta m(y,\theta,t(\theta,P_0))\,dP_0(y)=0.$ (43)

Using (37), namely $t^c(\theta,P_0)=t(\theta,P_0)$, we obtain that $\theta=\theta_0$ is in fact the solution of the equation:

$\int \partial_\theta m(y,\theta,t^c(\theta,P_0))\,dP_0(y)=0.$ (44)

Then, we define a new estimator $\widehat{\theta}^{\,c}_\varphi$ of $\theta_0$, as a plug-in estimator solution of the equation:

$\int \partial_\theta m(y,\widehat{\theta}^{\,c}_\varphi,t^c(\widehat{\theta}^{\,c}_\varphi,P_n))\,dP_n(y)=0.$ (45)
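For the scalar toy model used in the earlier sketches, the empirical Equation (45) can be solved by one-dimensional root finding. The sketch below is our own illustration for the modified KL case, where ${\varphi^*}'(u)=1/(1-u)$; it assumes a hypothetical helper `t_robust(theta, x)` returning the inner robust fit $t^c(\theta,P_n)$, and `gbar` as defined before.

```python
import numpy as np
from scipy.optimize import brentq

def theta_equation(theta, x, t_robust):
    """Empirical version of (45) for the toy model; phi*'(u) = 1 / (1 - u)."""
    t = t_robust(theta, x)              # assumed helper: t^c(theta, P_n)
    u = gbar(x, theta) @ t
    d_gbar = np.column_stack([np.zeros_like(x), -np.ones_like(x),
                              -2.0 * theta * np.ones_like(x)])
    # d/dtheta m(x, theta, t) = -phi*'(t' gbar(x, theta)) * (d_gbar @ t)
    return np.mean(-(d_gbar @ t) / (1.0 - u))

# theta_hat = brentq(theta_equation, 0.5, 1.5, args=(x, t_robust))
```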

For a probability measure $P$, the statistical functional $T^c$ corresponding to the estimator $\widehat{\theta}^{\,c}_\varphi$, whenever it exists, is defined by

$\int \partial_\theta m(y,T^c(P),t^c(T^c(P),P))\,dP(y)=0.$ (46)

The functional $T^c$ is Fisher-consistent, because

$T^c(P_0)=\theta_0.$ (47)

This equality is obtained by using (46) for $P=P_0$, the fact that $t^c(T^c(P_0),P_0)=t(T^c(P_0),P_0)$, and the definition of $t_\theta(P_0)=t(\theta,P_0)$ for $\theta=T^c(P_0)$, all these leading to

$\int \partial_\theta m(y,T^c(P_0),t(T^c(P_0),P_0))\,dP_0(y)+\big[\partial_\theta t(T^c(P_0),P_0)\big]^\top\int \partial_t m(y,T^c(P_0),t(T^c(P_0),P_0))\,dP_0(y)=0.$ (48)

Since $\theta_0$ is the unique solution of Equation (41) and, according to (48), $T^c(P_0)$ is also a solution of the same equation, we deduce (47).

From (34) and (45), we have

$\int \psi_{\widehat{\theta}^c_\varphi}(y,t^c(\widehat{\theta}^{\,c}_\varphi,P_n))\,dP_n(y)=0, \qquad \int \partial_\theta m(y,\widehat{\theta}^{\,c}_\varphi,t^c(\widehat{\theta}^{\,c}_\varphi,P_n))\,dP_n(y)=0,$

and then,

$\int \psi(y,\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})\,dP_n(y)=0, \qquad \int \partial_\theta m(y,\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})\,dP_n(y)=0,$

with $\psi(y,\theta,t):=\psi_\theta(y,t)$. The pair of estimators $(\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})$ can be viewed as a Z-estimator, the solution of the above system. Denoting

$\Psi(y,\theta,t):=\big(\psi(y,\theta,t)^\top,(\partial_\theta m(y,\theta,t))^\top\big)^\top,$ (49)

the Z-estimators $(\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})$ are the solutions of the system:

$\int \Psi(y,\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})\,dP_n(y)=0,$ (50)

and the theoretical counterpart is given by

$\int \Psi(y,\theta_0,t_{\theta_0})\,dP_0(y)=0.$ (51)

3.2. Asymptotic Properties

In this section, we establish the consistency and the asymptotic distributions of the estimators $\widehat{\theta}^{\,c}_\varphi$ and $\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}$. In order to prove the consistency of the estimators, we adopt results from the general theory of Z-estimators, as presented, for example, in [28]. Then, using the consistency of the estimators, as well as supplementary conditions, we prove that the asymptotic distributions of the estimators are multivariate normal.

Assumption 1.

  • (a) 
    There exist compact neighbourhoods $V_{\theta_0}$ of $\theta_0$ and $V_{t_{\theta_0}}$ of $t_{\theta_0}$ such that
    $\int \sup_{\theta\in V_{\theta_0},\,t\in V_{t_{\theta_0}}}\|\Psi(y,\theta,t)\|\,dP_0(y)<\infty.$
  • (b) 
    For any positive $\varepsilon$, the following condition holds:
    $\inf_{(\theta,t)\in M}\left\|\int \Psi(y,\theta,t)\,dP_0(y)\right\|>0=\left\|\int \Psi(y,\theta_0,t_{\theta_0})\,dP_0(y)\right\|,$

    where $M:=\{(\theta,t) \text{ s.t. } \|(\theta,t)-(\theta_0,t_{\theta_0})\|>\varepsilon\}$.

Proposition 1.

Under Assumption 1, $\widehat{\theta}^{\,c}_\varphi$ converges in probability to $\theta_0$ and $\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}$ converges in probability to $t_{\theta_0}$.

Assumption 2.

  • (a) 
    Both estimators $\widehat{\theta}^{\,c}_\varphi$ and $\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}$ converge in probability to $\theta_0$ and $t_{\theta_0}$, respectively.
  • (b) 
    The function $(\theta,t)\mapsto\psi(x,\theta,t)$ is $C^2$ on some neighbourhood $V(\theta_0,t_{\theta_0})$ for all $x$ ($P_0$ a.s.), and the partial derivatives of order two of the functions $\{(\theta,t)\mapsto\psi(x,\theta,t);(\theta,t)\in V(\theta_0,t_{\theta_0})\}$ are dominated by some $P_0$-integrable function $H_1(x)$.
  • (c) 
    The function $(\theta,t)\mapsto m(x,\theta,t)$ is $C^3$ on some neighbourhood $U(\theta_0,t_{\theta_0})$ for all $x$ ($P_0$ a.s.), and the partial derivatives of order three of the functions $\{(\theta,t)\mapsto m(x,\theta,t);(\theta,t)\in U(\theta_0,t_{\theta_0})\}$ are dominated by some $P_0$-integrable function $H_2(x)$.
  • (d) 
    $\int \|\partial_\theta m(y,\theta_0,t_{\theta_0})\|^2\,dP_0(y)$ is finite, and the matrix

    $S:=\begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22} \end{pmatrix},$ (52)

    with $S_{11}:=\int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)$, $S_{12}:=\int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)$, $S_{21}:=\int \partial^2_{\theta t} m(y,\theta_0,t_{\theta_0})\,dP_0(y)$ and $S_{22}:=\int \partial^2_{\theta\theta} m(y,\theta_0,t_{\theta_0})\,dP_0(y)$, exists and is invertible.

Proposition 2.

Let $P_0$ belong to the model $\mathcal{M}^1$, and suppose that Assumption 2 holds. Then, both $\sqrt{n}(\widehat{\theta}^{\,c}_\varphi-\theta_0)$ and $\sqrt{n}(\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}-t_{\theta_0})$ converge in distribution to centred multivariate normal variables with covariance matrices given, respectively, by

$\big[[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]\times\big[[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]^\top$ (53)

and

$\big[S_{11}^{-1}-S_{11}^{-1}S_{12}[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]\times\big[S_{11}^{-1}-S_{11}^{-1}S_{12}[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]^\top.$ (54)
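Given consistent estimates of the blocks of $S$, both covariance matrices can be assembled directly. A small sketch (ours; the block matrices passed in are hypothetical inputs):

```python
import numpy as np

def asymptotic_covariances(S11, S12, S21):
    """Assemble the covariances (53) and (54); S22 = 0 by (A12) in the Appendix."""
    S11_inv = np.linalg.inv(S11)
    K = np.linalg.inv(S21 @ S11_inv @ S12)   # [S21 S11^-1 S12]^-1
    V = K @ S21 @ S11_inv                    # the factor appearing in (53)
    U = S11_inv - S11_inv @ S12 @ V          # the factor appearing in (54)
    return V @ V.T, U @ U.T                  # covariances for theta-hat and t-hat
```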

The condition of Type (a) from Assumption 1 is usually imposed in order to apply the uniform law of large numbers. For many choices of the divergence (for example, those from the Cressie–Read family), the function $\Psi$ is continuous in $(\theta,t)$, and consequently, this condition is satisfied. The second condition from Assumption 1 is imposed for the uniqueness of $(\theta_0,t_{\theta_0})$ as a solution of the equation and is satisfied, for example, whenever $\Psi$ is continuous and the parameter space is compact ([28], p. 46). Furthermore, the conditions of Type (b)–(d), included in Assumption 2, are often imposed in order to apply the law of large numbers or the central limit theorem and can be verified for the functions appearing in the definitions of the estimators proposed in the present paper.

3.3. Influence Functions and Robustness

In this section, we derive the influence functions of the estimators $\widehat{t}^{\,c}_\theta$ and $\widehat{\theta}^{\,c}_\varphi$ and prove their B-robustness. The corresponding statistical functionals are defined by (36) and (46), respectively.

Recall that a map $T$, defined on a set of probability measures and parameter-space-valued, is a statistical functional corresponding to an estimator $\widehat{\theta}$ of the parameter $\theta_0$ from the model $P_0$ if $\widehat{\theta}=T(P_n)$, $P_n$ being the empirical measure corresponding to the sample. The influence function of $T$ at $P_0$ is defined by

$\mathrm{IF}(x;T,P_0):=\left.\frac{\partial T(\widetilde{P}_{\varepsilon x})}{\partial\varepsilon}\right|_{\varepsilon=0},$

where $\widetilde{P}_{\varepsilon x}:=(1-\varepsilon)P_0+\varepsilon\delta_x$, $\delta_x$ being the Dirac measure. An unbounded influence function implies an unbounded asymptotic bias of a statistic under single-point contamination of the model. Therefore, a natural robustness requirement on a statistical functional is the boundedness of its influence function. Whenever the influence function is bounded with respect to $x$, the corresponding estimator is called B-robust [26].
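Numerically, the influence function can be approximated by a finite difference along the contamination path; the sketch below is ours and assumes a hypothetical `estimator(points, weights)` interface mapping a weighted sample to a parameter estimate.

```python
import numpy as np

def empirical_influence(estimator, sample, x, eps=1e-3):
    """Finite-difference IF: differentiate T((1 - eps) P_n + eps delta_x) at 0."""
    points = np.append(sample, x)
    base = np.append(np.full(len(sample), 1.0 / len(sample)), 0.0)
    contaminated = (1.0 - eps) * base
    contaminated[-1] += eps                  # add the point mass at x
    return (estimator(points, contaminated) - estimator(points, base)) / eps
```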

Proposition 3.

For fixed $\theta$, the influence function of the functional $t^c_\theta$ is given by

$\mathrm{IF}(x;t^c_\theta,P_0)=-\left[\int \partial_t\psi_\theta(y,t_\theta(P_0))\,dP_0(y)\right]^{-1}\cdot\psi_\theta(x,t_\theta(P_0)).$ (55)

Proposition 4.

The influence function of the functional $T^c$ is given by

$\mathrm{IF}(x;T^c,P_0)=\frac{1}{\varphi''(1)}\left[\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\left(\int \bar{g}(y,\theta_0)\bar{g}(y,\theta_0)^\top dP_0(y)\right)^{-1}\int \partial_\theta\bar{g}(y,\theta_0)\,dP_0(y)\right]^{-1}\cdot\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\cdot\mathrm{IF}(x;t^c_{\theta_0},P_0).$ (56)

On the basis of Propositions 3 and 4, since $x\mapsto\psi_\theta(x,t_\theta(P_0))$ is bounded, all the estimators $\widehat{\theta}^{\,c}_\varphi$ are B-robust.

4. Conclusions

We introduced a class of robust Z-estimators for moment condition models. These new estimators can be seen as robust alternatives to the minimum empirical divergence estimators. By using truncated functions based on the multidimensional Huber function, we defined robust estimators of the element that realizes the supremum in the dual form of the divergence, as well as new robust estimators for the parameter of the model. The asymptotic properties were proven, including the consistency and the limit laws. The influence functions of all the proposed estimators are bounded; therefore, these estimators are B-robust. The truncated function that we used to define the new robust Z-estimators contains implicitly defined functions, for which analytic forms are not available. The implementation of the estimation method will be addressed in a future research study. The idea of using the multidimensional Huber function, together with a scale matrix and a shift vector, to create a bounded version of the function corresponding to the estimating equation for the parameter of interest could be considered in other contexts as well and would lead to new robust Z-estimators. As one of the Referees suggested, other bounded functions could be used to define new robust Z-estimators for moment condition models. For example, the Tukey biweight function, composed with a norm so that it applies to vector-valued functions, could also be considered. Again, the original parameter of interest should remain the solution of the estimating equation based on the new bounded function. This idea deserves to be analysed in future studies, in order to provide new robust versions of minimum empirical divergence estimators or robust Z-estimators in other contexts.

Acknowledgments

We are very grateful to the Referees for their helpful comments and suggestions.

Abbreviations

The following abbreviations are used in this manuscript:

i.i.d. independent and identically distributed
a.c. absolutely continuous
GMM generalized method of moments
CU continuous updating
EL empirical likelihood
ET exponential tilting
GEL generalized empirical likelihood
ETEL exponentially tilted empirical likelihood

Appendix A

Proof of Proposition 1.

Since $(\theta,t)\mapsto\Psi(y,\theta,t)$ is continuous, by the uniform law of large numbers, Assumption 1 (a) implies

$\sup_{\theta\in V_{\theta_0},\,t\in V_{t_{\theta_0}}}\left\|\int \Psi(y,\theta,t)\,dP_n(y)-\int \Psi(y,\theta,t)\,dP_0(y)\right\|\to 0,$ (A1)

in probability. This result, together with Assumption 1 (b), ensures the convergence in probability of the estimators $\widehat{\theta}^{\,c}_\varphi$ and $\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}$ toward $\theta_0$ and $t_{\theta_0}$, respectively. The proof is the same as the one for Theorem 5.9 from [28], p. 46. □

Proof of Proposition 2.

By the definitions of $\widehat{\theta}^{\,c}_\varphi$ and $\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}$, they both satisfy

$\int \psi(y,\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})\,dP_n(y)=0,$ (E1)
$\int \partial_\theta m(y,\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})\,dP_n(y)=0.$ (E2)

Using a Taylor expansion in (E1), there exists $(\widetilde{\theta}^c_\varphi,\widetilde{t}^c_\varphi)$ inside the segment that links $(\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})$ and $(\theta_0,t_{\theta_0})$ such that

$0=\int \psi(y,\theta_0,t_{\theta_0})\,dP_n(y)+\left(\int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_n(y),\int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_n(y)\right)\cdot a_n+\frac{1}{2}a_n^\top A_n a_n,$ (A2)

where

$a_n:=\big((\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}-t_{\theta_0})^\top,(\widehat{\theta}^{\,c}_\varphi-\theta_0)^\top\big)^\top,$ (A3)

and

$A_n:=\begin{pmatrix} \int \partial^2_{tt}\psi(y,\widetilde{\theta}^c_\varphi,\widetilde{t}^c_\varphi)\,dP_n(y) & \int \partial^2_{\theta t}\psi(y,\widetilde{\theta}^c_\varphi,\widetilde{t}^c_\varphi)\,dP_n(y)\\ \int \partial^2_{t\theta}\psi(y,\widetilde{\theta}^c_\varphi,\widetilde{t}^c_\varphi)\,dP_n(y) & \int \partial^2_{\theta\theta}\psi(y,\widetilde{\theta}^c_\varphi,\widetilde{t}^c_\varphi)\,dP_n(y) \end{pmatrix}.$ (A4)

By Assumption 2 (b), the law of large numbers implies that $A_n=O_P(1)$. Then, using Assumption 2 (a), the last term in (A2) can be written as $o_P(1)\,a_n$. On the other hand, by Assumption 2 (d), using the law of large numbers, we can write

$\left(\int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_n(y),\int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_n(y)\right)=\left(\int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_0(y),\int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)\right)+o_P(1).$

Consequently, (A2) becomes

$\int \psi(y,\theta_0,t_{\theta_0})\,dP_n(y)=-\left(\int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)+o_P(1),\int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)+o_P(1)\right)\cdot a_n.$ (A5)

In the same way, using a Taylor expansion in (E2), there exists $(\bar{\theta}^c_\varphi,\bar{t}^c_\varphi)$ inside the segment that links $(\widehat{\theta}^{\,c}_\varphi,\widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi})$ and $(\theta_0,t_{\theta_0})$ such that

$0=\int \partial_\theta m(y,\theta_0,t_{\theta_0})\,dP_n(y)+\left(\int \partial^2_{\theta t} m(y,\theta_0,t_{\theta_0})\,dP_n(y),\int \partial^2_{\theta\theta} m(y,\theta_0,t_{\theta_0})\,dP_n(y)\right)\cdot a_n+\frac{1}{2}a_n^\top B_n a_n,$ (A6)

where

$B_n:=\begin{pmatrix} \int \partial^3_{\theta tt} m(y,\bar{\theta}^c_\varphi,\bar{t}^c_\varphi)\,dP_n(y) & \int \partial^3_{\theta\theta t} m(y,\bar{\theta}^c_\varphi,\bar{t}^c_\varphi)\,dP_n(y)\\ \int \partial^3_{\theta t\theta} m(y,\bar{\theta}^c_\varphi,\bar{t}^c_\varphi)\,dP_n(y) & \int \partial^3_{\theta\theta\theta} m(y,\bar{\theta}^c_\varphi,\bar{t}^c_\varphi)\,dP_n(y) \end{pmatrix}.$ (A7)

Similarly, as in (A5), we obtain

$\int \partial_\theta m(y,\theta_0,t_{\theta_0})\,dP_n(y)=-\left(\int \partial^2_{\theta t} m(y,\theta_0,t_{\theta_0})\,dP_0(y)+o_P(1),\int \partial^2_{\theta\theta} m(y,\theta_0,t_{\theta_0})\,dP_0(y)+o_P(1)\right)\cdot a_n.$ (A8)

Using (A5) and (A8), we obtain

$\sqrt{n}\,a_n=-\sqrt{n}\begin{pmatrix} \int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_0(y) & \int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)\\ \int \partial^2_{\theta t} m(y,\theta_0,t_{\theta_0})\,dP_0(y) & \int \partial^2_{\theta\theta} m(y,\theta_0,t_{\theta_0})\,dP_0(y) \end{pmatrix}^{-1}\times\begin{pmatrix} \int \psi(y,\theta_0,t_{\theta_0})\,dP_n(y)\\ \int \partial_\theta m(y,\theta_0,t_{\theta_0})\,dP_n(y) \end{pmatrix}+o_P(1).$ (A9)

Consider $S$ the $(l+1+d)\times(l+1+d)$ matrix:

$S:=\begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22} \end{pmatrix},$ (A10)

with $S_{11}:=\int \partial_t\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)$, $S_{12}:=\int \partial_\theta\psi(y,\theta_0,t_{\theta_0})\,dP_0(y)$, $S_{21}:=\int \partial^2_{\theta t} m(y,\theta_0,t_{\theta_0})\,dP_0(y)$, and $S_{22}:=\int \partial^2_{\theta\theta} m(y,\theta_0,t_{\theta_0})\,dP_0(y)$. Through calculations, we have

$S_{21}=\left[0_d,\,-\int \partial_\theta g(y,\theta_0)^\top dP_0(y)\right],$ (A11)
$S_{22}=[0_d,\dots,0_d].$ (A12)

From (A9), taking into account that $\partial_\theta m(y,\theta_0,t_{\theta_0})=0$ for any $y$ (see (A31) below), we deduce that

$\sqrt{n}\begin{pmatrix} \widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}-t_{\theta_0}\\ \widehat{\theta}^{\,c}_\varphi-\theta_0 \end{pmatrix}=-S^{-1}\sqrt{n}\begin{pmatrix} \int \psi(y,\theta_0,t_{\theta_0})\,dP_n(y)\\ 0_d \end{pmatrix}+o_P(1).$ (A13)

On the other hand, under Assumption 2 (d), using the central limit theorem,

$\sqrt{n}\begin{pmatrix} \int \psi(y,\theta_0,t_{\theta_0})\,dP_n(y)\\ 0_d \end{pmatrix}$ (A14)

converges in distribution to a centred multivariate normal variable with covariance matrix:

$M:=\begin{pmatrix} M_{11} & M_{12}\\ M_{21} & M_{22} \end{pmatrix},$ (A15)

with $M_{11}:=\mathrm{cov}[\psi(X,\theta_0,t_{\theta_0})]$, $M_{12}$ the $(l+1)\times d$ null matrix, $M_{21}$ the $d\times(l+1)$ null matrix, and $M_{22}$ the $d\times d$ null matrix.

Since $E[\psi(X,\theta_0,t_{\theta_0})]=0$ by the construction of $\psi$, we obtain

$M_{11}=\mathrm{cov}[\psi(X,\theta_0,t_{\theta_0})]=\int \psi(y,\theta_0,t_{\theta_0})\psi(y,\theta_0,t_{\theta_0})^\top dP_0(y)=I_{l+1},$ (A16)

on the basis of (29) for $\theta=\theta_0$.

Using then (A13) and the Slutsky theorem, we obtain that

$\sqrt{n}\begin{pmatrix} \widehat{t}^{\,c}_{\widehat{\theta}^c_\varphi}-t_{\theta_0}\\ \widehat{\theta}^{\,c}_\varphi-\theta_0 \end{pmatrix}$ (A17)

converges in distribution to a centred multivariate normal variable with the covariance matrix given by

$C=S^{-1}M[S^{-1}]^\top.$ (A18)

If we denote

$C:=\begin{pmatrix} C_{11} & C_{12}\\ C_{21} & C_{22} \end{pmatrix},$ (A19)

through calculation, we obtain

$C_{11}=\big[S_{11}^{-1}-S_{11}^{-1}S_{12}[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]\times\big[S_{11}^{-1}-S_{11}^{-1}S_{12}[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]^\top,$ (A20)
$C_{12}=\big[S_{11}^{-1}-S_{11}^{-1}S_{12}[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]\times\big[[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]^\top,$ (A21)
$C_{21}=\big[[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]\times\big[S_{11}^{-1}-S_{11}^{-1}S_{12}[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]^\top,$ (A22)
$C_{22}=\big[[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]\times\big[[S_{21}S_{11}^{-1}S_{12}]^{-1}S_{21}S_{11}^{-1}\big]^\top.$ (A23)

□

Proof of Proposition 3.

For the contaminated model $\widetilde{P}_{\varepsilon x}=(1-\varepsilon)P_0+\varepsilon\delta_x$, whenever it exists, $t^c_\theta(\widetilde{P}_{\varepsilon x})$ is defined as the solution of the equation:

$\int H_c\big(A_\theta(t^c_\theta(\widetilde{P}_{\varepsilon x}))[\partial_t m_\theta(y,t^c_\theta(\widetilde{P}_{\varepsilon x}))-\tau_\theta(t^c_\theta(\widetilde{P}_{\varepsilon x}))]\big)\,d\widetilde{P}_{\varepsilon x}(y)=0.$ (A24)

It follows that

$(1-\varepsilon)\int H_c\big(A_\theta(t^c_\theta(\widetilde{P}_{\varepsilon x}))[\partial_t m_\theta(y,t^c_\theta(\widetilde{P}_{\varepsilon x}))-\tau_\theta(t^c_\theta(\widetilde{P}_{\varepsilon x}))]\big)\,dP_0(y)+\varepsilon H_c\big(A_\theta(t^c_\theta(\widetilde{P}_{\varepsilon x}))[\partial_t m_\theta(x,t^c_\theta(\widetilde{P}_{\varepsilon x}))-\tau_\theta(t^c_\theta(\widetilde{P}_{\varepsilon x}))]\big)=0.$ (A25)

Derivation with respect to $\varepsilon$ in (A25), at $\varepsilon=0$, yields

$-\int H_c\big(A_\theta(t_\theta(P_0))[\partial_t m_\theta(y,t_\theta(P_0))-\tau_\theta(t_\theta(P_0))]\big)\,dP_0(y)+\int \partial_t\big[H_c\big(A_\theta(t)[\partial_t m_\theta(y,t)-\tau_\theta(t)]\big)\big]_{t=t_\theta(P_0)}\,dP_0(y)\;\mathrm{IF}(x;t^c_\theta,P_0)+H_c\big(A_\theta(t_\theta(P_0))[\partial_t m_\theta(x,t_\theta(P_0))-\tau_\theta(t_\theta(P_0))]\big)=0.$ (A26)

Since the first integral in (A26) equals zero, we obtain

$\mathrm{IF}(x;t^c_\theta,P_0)=-\left[\int \partial_t\big[H_c\big(A_\theta(t)[\partial_t m_\theta(y,t)-\tau_\theta(t)]\big)\big]_{t=t_\theta(P_0)}\,dP_0(y)\right]^{-1}\cdot H_c\big(A_\theta(t_\theta(P_0))[\partial_t m_\theta(x,t_\theta(P_0))-\tau_\theta(t_\theta(P_0))]\big)=-\left[\int \partial_t\psi_\theta(y,t_\theta(P_0))\,dP_0(y)\right]^{-1}\cdot\psi_\theta(x,t_\theta(P_0)).$ (A27)

For each $\theta$, the influence function (55) is bounded with respect to $x$; therefore, the estimators $\widehat{t}^{\,c}_\theta$ are B-robust. □

Proof of Proposition 4.

For the contaminated model $\widetilde{P}_{\varepsilon x}=(1-\varepsilon)P_0+\varepsilon\delta_x$, $T^c(\widetilde{P}_{\varepsilon x})$ is defined as the solution of the equation:

$\int \partial_\theta m(y,T^c(\widetilde{P}_{\varepsilon x}),t^c(T^c(\widetilde{P}_{\varepsilon x}),\widetilde{P}_{\varepsilon x}))\,d\widetilde{P}_{\varepsilon x}(y)=0,$ (A28)

whenever this solution exists. Then,

$(1-\varepsilon)\int \partial_\theta m(y,T^c(\widetilde{P}_{\varepsilon x}),t^c(T^c(\widetilde{P}_{\varepsilon x}),\widetilde{P}_{\varepsilon x}))\,dP_0(y)+\varepsilon\,\partial_\theta m(x,T^c(\widetilde{P}_{\varepsilon x}),t^c(T^c(\widetilde{P}_{\varepsilon x}),\widetilde{P}_{\varepsilon x}))=0.$ (A29)

Derivation with respect to $\varepsilon$ in (A29), at $\varepsilon=0$, yields

$-\int \partial_\theta m(y,\theta_0,t^c(\theta_0,P_0))\,dP_0(y)+\int \partial^2_{\theta\theta} m[y,\theta,t]_{\theta=\theta_0,t=t_{\theta_0}(P_0)}\,dP_0(y)\;\mathrm{IF}(x;T^c,P_0)+\int \partial^2_{\theta t} m[y,\theta,t]_{\theta=\theta_0,t=t_{\theta_0}(P_0)}\,dP_0(y)\cdot\big[\partial_\theta t^c(\theta_0,P_0)\,\mathrm{IF}(x;T^c,P_0)+\mathrm{IF}(x;t^c_{\theta_0},P_0)\big]+\partial_\theta m(x,\theta_0,t_{\theta_0}(P_0))=0.$ (A30)

Some calculations show that

$\partial_\theta\big[m(x,\theta,t)\big]_{\theta=\theta_0,t=t_{\theta_0}(P_0)}=0 \quad \text{and} \quad \partial^2_{\theta\theta}\big[m(x,\theta,t)\big]_{\theta=\theta_0,t=t_{\theta_0}(P_0)}=0,$ (A31)

for any $x$; therefore, (A30) reduces to

$\int \partial^2_{\theta t} m[y,\theta,t]_{\theta=\theta_0,t=t_{\theta_0}(P_0)}\,dP_0(y)\cdot\big[\partial_\theta t^c(\theta_0,P_0)\,\mathrm{IF}(x;T^c,P_0)+\mathrm{IF}(x;t^c_{\theta_0},P_0)\big]=0.$ (A32)

On the other hand,

$\int \partial^2_{\theta t} m[y,\theta,t]_{\theta=\theta_0,t=t_{\theta_0}(P_0)}\,dP_0(y)=-{\varphi^*}'(\varphi'(1))\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)=-\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y),$

since ${\varphi^*}'(u)={\varphi'}^{-1}(u)$, so that ${\varphi^*}'(\varphi'(1))={\varphi^*}'(0)=1$.

Taking into account that $t^c(\theta,P_0)=t(\theta,P_0)$ and that $t(\theta,P_0)$ verifies

$\int \partial_t m(y,\theta,t(\theta,P_0))\,dP_0(y)=0,$ (A33)

the derivation with respect to $\theta$ yields

$\int \partial^2_{t\theta} m(y,\theta,t(\theta,P_0))\,dP_0(y)+\int \partial^2_{tt} m(y,\theta,t(\theta,P_0))\,dP_0(y)\cdot\partial_\theta t(\theta,P_0)=0,$ (A34)

which implies

$\partial_\theta t^c(\theta_0,P_0)=\partial_\theta t(\theta_0,P_0)=-\left[\int \partial^2_{tt} m(y,\theta_0,t(\theta_0,P_0))\,dP_0(y)\right]^{-1}\int \partial^2_{t\theta} m(y,\theta_0,t(\theta_0,P_0))\,dP_0(y)=-\varphi''(1)\left[\int \bar{g}(y,\theta_0)\bar{g}(y,\theta_0)^\top dP_0(y)\right]^{-1}\int \partial_\theta\bar{g}(y,\theta_0)\,dP_0(y),$

because

$\int \partial^2_{tt} m(y,\theta_0,t(\theta_0,P_0))\,dP_0(y)=-\frac{1}{\varphi''(1)}\int \bar{g}(y,\theta_0)\bar{g}(y,\theta_0)^\top dP_0(y).$ (A35)

Then, (A32) becomes

$-\varphi''(1)\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\left[\int \bar{g}(y,\theta_0)\bar{g}(y,\theta_0)^\top dP_0(y)\right]^{-1}\int \partial_\theta\bar{g}(y,\theta_0)\,dP_0(y)\;\mathrm{IF}(x;T^c,P_0)+\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\;\mathrm{IF}(x;t^c_{\theta_0},P_0)=0,$

and consequently,

$\mathrm{IF}(x;T^c,P_0)=\frac{1}{\varphi''(1)}\left[\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\left(\int \bar{g}(y,\theta_0)\bar{g}(y,\theta_0)^\top dP_0(y)\right)^{-1}\int \partial_\theta\bar{g}(y,\theta_0)\,dP_0(y)\right]^{-1}\cdot\int \partial_\theta\bar{g}(y,\theta_0)^\top dP_0(y)\cdot\mathrm{IF}(x;t^c_{\theta_0},P_0).$ (A36)

□

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Funding Statement

This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS CCCDI—UEFISCDI, Project Number PN-III-P4-ID-PCE-2020-1112, within PNCDI III.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Hansen L.P. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50:1029–1054. doi: 10.2307/1912775. [DOI] [Google Scholar]
  • 2.Hansen L., Heaton J., Yaron A. Finite-sample properties of some alternative gmm estimators. J. Bus. Econ. Stat. 1996;14:262–280. [Google Scholar]
  • 3.Qin J., Lawless J. Empirical likelihood and general estimating equations. Ann. Stat. 1994;22:300–325. doi: 10.1214/aos/1176325370. [DOI] [Google Scholar]
  • 4.Imbens G.W. One-step estimators for over-identified generalized method of moments models. Rev. Econ. Stud. 1997;64:359–383. doi: 10.2307/2971718. [DOI] [Google Scholar]
  • 5.Kitamura Y., Stutzer M. An information-theoretic alternative to generalized method of moments estimation. Econometrica. 1997;65:861–874. doi: 10.2307/2171942. [DOI] [Google Scholar]
  • 6.Newey W.K., Smith R.J. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica. 2004;72:219–255. doi: 10.1111/j.1468-0262.2004.00482.x. [DOI] [Google Scholar]
  • 7.Schennach S.M. Point estimation with exponentially tilted empirical likelihood. Ann. Stat. 2007;35:634–672. doi: 10.1214/009053606000001208. [DOI] [Google Scholar]
  • 8.Pardo L. Statistical Inference Based on Divergence Measures. Chapman & Hall; London, UK: 2006. [Google Scholar]
  • 9.Basu A., Shioya H., Park C. Statistical Inference: The Minimum Distance Approach. Chapman & Hall; London, UK: 2011. [Google Scholar]
  • 10.Pardo L., Martín N. Robust procedures for estimating and testing in the framework of divergence measures. Entropy. 2021;23:430. doi: 10.3390/e23040430. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Riani M., Atkinson A.C., Corbellini A., Perrotta D. Robust regression with density power divergence: Theory, comparisons, and data analysis. Entropy. 2020;22:399. doi: 10.3390/e22040399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Broniatowski M., Keziou A. Divergences and duality for estimation and test under moment condition models. J. Stat. Plan. Inference. 2012;142:2554–2573. doi: 10.1016/j.jspi.2012.03.013. [DOI] [Google Scholar]
  • 13.Broniatowski M., Keziou A. Parametric estimation and tests through divergences and the duality technique. J. Multivar. Anal. 2009;100:16–36. doi: 10.1016/j.jmva.2008.03.011. [DOI] [Google Scholar]
  • 14.Toma A., Broniatowski M. Dual divergence estimators and tests: Robustness results. J. Multivar. Anal. 2011;102:20–36. doi: 10.1016/j.jmva.2010.07.010. [DOI] [Google Scholar]
  • 15.Toma A., Leoni-Aubin S. Robust tests based on dual divergence estimators and saddlepoint approximations. J. Multivar. Anal. 2010;101:1143–1155. doi: 10.1016/j.jmva.2009.11.001. [DOI] [Google Scholar]
  • 16.Toma A. Model selection criteria using divergences. Entropy. 2014;16:2686–2698. doi: 10.3390/e16052686. [DOI] [Google Scholar]
  • 17.Toma A. Robustness of dual divergence estimators for models satisfying linear constraints. C. R. Math. Acad. Sci. Paris. 2013;351:311–316. doi: 10.1016/j.crma.2013.02.005. [DOI] [Google Scholar]
  • 18.Ronchetti E., Trojani F. Robust inference with GMM estimators. J. Econom. 2001;101:37–69. doi: 10.1016/S0304-4076(00)00073-7. [DOI] [Google Scholar]
  • 19.Lô S.N., Ronchetti E. Robust small sample accurate inference in moment condition models. Comput. Stat. Data Anal. 2012;56:3182–3197. doi: 10.1016/j.csda.2011.01.020. [DOI] [Google Scholar]
  • 20.Felipe A., Martín N., Miranda P., Pardo L. Testing with exponentially tilted empirical likelihood. Methodol. Comput. Appl. Probab. 2018;20:1319–1358. doi: 10.1007/s11009-018-9620-9. [DOI] [Google Scholar]
  • 21.Keziou A., Toma A. A robust version of the empirical likelihood estimator. Mathematics. 2021;9:829. doi: 10.3390/math9080829. [DOI] [Google Scholar]
  • 22.Keziou A., Toma A. Geometric Science of Information, Proceedings of the 5th International Conference, GSI 2021, Paris, France, 21–23 July 2021. Springer International Publishing; Cham, Switzerland: 2021. Robust Empirical Likelihood. [Google Scholar]
  • 23.Rüschendorf L. On the minimum discrimination information theorem. Stat. Decis. 1984:263–283. [Google Scholar]
  • 24.Cressie N., Read T.R.C. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B. 1984;46:440–464. doi: 10.1111/j.2517-6161.1984.tb01318.x. [DOI] [Google Scholar]
  • 25.Broniatowski M., Keziou A. Minimization of ϕ divergences on sets of signed measures. Stud. Sci. Math. Hung. 2006;43:403–442. [Google Scholar]
  • 26.Hampel F.R., Ronchetti E., Rousseeuw P.J., Stahel W. Robust Statistics: The Approach Based on Influence Functions. Wiley; New York, NY, USA: 1986. [Google Scholar]
  • 27.Ronchetti E.M., Huber P.J. Robust Statistics. John Wiley & Sons; Hoboken, NJ, USA: 2009. [Google Scholar]
  • 28.van der Vaart A.W. Asymptotic Statistics. Cambridge University Press; Cambridge, UK: 1998. (Cambridge Series in Statistical and Probabilistic Mathematics). [Google Scholar]
