Entropy. 2022 Jan 13;24(1):123. doi: 10.3390/e24010123

Robust Statistical Inference in Generalized Linear Models Based on Minimum Renyi’s Pseudodistance Estimators

María Jaenada 1, Leandro Pardo 1,*
Editor: Philip Broadbridge
PMCID: PMC8774563  PMID: 35052149

Abstract

Minimum Rényi's pseudodistance estimators (MRPEs) enjoy good robustness properties without a significant loss of efficiency in general statistical models and, in particular, for linear regression models (LRMs). Along these lines, Castilla et al. considered robust Wald-type test statistics in LRMs based on these MRPEs. In this paper, we extend the theory of MRPEs to Generalized Linear Models (GLMs) using independent and nonidentically distributed observations (INIDO). We derive asymptotic properties of the proposed estimators and analyze their influence function to assess their robustness properties. Additionally, we define robust Wald-type test statistics for testing linear hypotheses and theoretically study their asymptotic distribution, as well as their influence function. The performance of the proposed MRPEs and Wald-type test statistics is empirically examined for Poisson regression models through a simulation study, focusing on their robustness properties. We finally test the proposed methods on a real dataset related to the treatment of epilepsy, illustrating the superior performance of the robust MRPEs as well as the Wald-type tests.

Keywords: generalized linear model, independent and nonidentically distributed observations, minimum Rényi’s pseudodistance estimators, robust Wald-type test statistics for GLMs, influence function for GLMs, Poisson regression model

MSC: 62F35, 62J12

1. Introduction

Generalized linear models (GLMs) were first introduced by Nelder and Wedderburn [1] and later expanded upon by McCullagh and Nelder [2]. GLMs represent a natural extension of the standard linear regression model that encompasses a large variety of response variable distributions, including count, binary, and positive-valued distributions. Let $Y_1,\dots,Y_n$ be independent response variables. The classical GLM assumes that the density function of each random variable $Y_i$ belongs to the exponential family, having the form

$$f(y,\theta_i,\phi)=\exp\left\{\frac{y\theta_i-b(\theta_i)}{a(\phi)}+c(y,\phi)\right\},\qquad (1)$$

for $i=1,\dots,n$, where the functions $a(\phi)$, $b(\theta_i)$ and $c(y,\phi)$ are known. Therefore, the observations are independent but not identically distributed, depending on a location parameter $\theta_i$, $i=1,\dots,n$, and a nuisance parameter $\phi$. Further, we denote by $\mu_i$ the expectation of the random variable $Y_i$ and assume that there exists a monotone differentiable function, the so-called link function $g$, verifying

$$g(\mu_i)=x_i^{T}\beta,$$

with $\beta=(\beta_1,\dots,\beta_k)\in\mathbb{R}^k$ $(k<n)$ the regression parameter vector. The $k\times 1$ vector of explanatory variables, $x_i$, is assumed to be nonrandom, i.e., the design matrix is fixed. Correspondingly, the location parameter depends on the explanatory variables, $\theta_i=\theta(x_i^{T}\beta)$, and the density function given in (1) can be written as $f_i(y,\beta,\phi)$, emphasizing its dependence on $\beta$ and $x_i$.

The maximum likelihood estimator (MLE) and quasilikelihood estimators have been well studied for GLMs, and it is well known that they are asymptotically efficient but lack robustness in the presence of outliers, which can result in a significant estimation bias. Jaenada and Pardo [3] reviewed the different robust estimators in the statistical literature and studied the lack of robustness of the MLE as well. Among others, Stefanski et al. [4] studied optimally bounded score functions for the GLM and generalized the results obtained by Krasker and Welsch [5] for classical LRMs. Künsch et al. [6] introduced the so-called conditionally unbiased bounded-influence estimate, and Morgenthaler [7], Cantoni and Ronchetti [8], Bianco and Yohai [9], Croux and Hesbroeck [10], Bianco et al. [11], and Valdora and Yohai [12] continued the development of robust estimators for GLMs based on general M-estimators. Later, Ghosh and Basu [13] proposed robust estimators for the GLM based on the density power divergence (DPD) introduced in Basu et al. [14].

There are not many papers considering robust tests for GLMs. In this sense, Basu et al. [15] considered robust Wald-type tests based on the minimum DPD estimator, but assuming random explanatory variables for the GLM. The main purpose of this paper is to introduce new robust Wald-type tests based on the MRPE under fixed (not random) explanatory variables.

Broniatowski et al. [16] presented robust estimators for the parameters of the linear regression model (LRM) with random explanatory variables, and Castilla et al. [17] considered Wald-type test statistics, based on MRPEs, for the LRM. Toma and Leoni–Aubin [18] defined new robustness and efficiency measures based on the RP, and Toma et al. [19] considered the MRPE for general parametric models and constructed a model selection criterion for regression models. The term “Rényi pseudodistance” (RP) was adopted in Broniatowski et al. [16] because of its similarity to the Rényi divergence (Rényi [20]), although this family of divergences was considered previously in Jones et al. [21]. Fujisawa and Eguchi [22] used the RP under the name of γ-cross entropy, introduced robust estimators obtained by minimizing the empirical estimate of the γ-cross entropy (or the γ-divergence associated with the γ-cross entropy), and studied their properties. Further, Hirose and Masuda [23] considered the γ-likelihood function to obtain robust estimators. Using the γ-divergence, Kawashima and Fujisawa [24,25] presented robust estimators for sparse regression and sparse GLMs with random covariates. The robustness of all the previous estimators is based on a density power weight, which gives a small weight to outlying observations. This idea was also developed by Basu et al. [15] for the minimum DPD estimator and was considered some years earlier by Windham [26]. More concretely, Basu et al. [14] considered the density power function multiplied by the score function.

The outline of the paper is as follows: in Section 2, some results in relation to the MRPEs for GLMs, previously obtained in Jaenada and Pardo [3], are presented. Section 3 introduces and studies Wald-type tests based on the MRPE for testing linear null hypotheses for GLMs. In Section 4, the influence function of the MRPE as well as the influence functions of the Wald-type tests are derived. Finally, we empirically examine the performance of the proposed robust estimators and Wald-type test statistics for the Poisson regression model through a simulation study in Section 5, and we illustrate their applicability with real datasets for binomial and Poisson regression.

2. Asymptotic Distribution of the MRPEs for the GLMs

In this section, we revise some of the results presented in Jaenada and Pardo [3] in relation to the MRPE. Let $Y_1,\dots,Y_n$ be INIDO random variables with density functions $g_1,\dots,g_n$, respectively, with respect to some common dominating measure. The true densities $g_i$ are modeled by the density functions given in (1), belonging to the exponential family. Such densities are denoted by $f_i(y,\beta,\phi)$, highlighting their dependence on the regression vector $\beta$, the nuisance parameter $\phi$ and the observation index $i$, $i=1,\dots,n$. In the following, we assume that the explanatory variables $x_i$ are fixed, and therefore the response variables verify the INIDO setup studied in Castilla et al. [27].

For each response variable $Y_i$, the RP between the theoretical density function belonging to the exponential family, $f_i(y,\gamma)$, and the true density underlying the data, $g_i$, can be defined for $\alpha>0$ as

$$R_\alpha\left(f_i(\cdot,\gamma),g_i\right)=\frac{1}{\alpha+1}\log\int f_i(y,\gamma)^{\alpha+1}dy-\frac{1}{\alpha}\log\int f_i(y,\gamma)^{\alpha}g_i(y)\,dy+k,\qquad (2)$$

where

$$k=\frac{1}{\alpha(\alpha+1)}\log\int g_i(y)^{\alpha+1}dy$$

does not depend on $\gamma=(\beta^{T},\phi)^{T}$.

We consider $(y_1,\dots,y_n)$ a random sample of independent but nonhomogeneous observations of the response variables with fixed predictors $(x_1,\dots,x_n)$. Since only one observation of each variable $Y_i$ is available, a natural estimate of its true density $g_i$ is the degenerate distribution at the observation $y_i$. Consequently, in the following we denote by $\hat g_i$ the density function of the variable degenerate at the point $y_i$. Then, substituting the theoretical and empirical densities in (2) yields the loss

$$R_\alpha\left(f_i(\cdot,\gamma),\hat g_i\right)=\frac{1}{\alpha+1}\log\int f_i(y,\gamma)^{\alpha+1}dy-\frac{1}{\alpha}\log f_i(Y_i,\gamma)^{\alpha}+k.\qquad (3)$$

Taking the limit as $\alpha$ tends to zero, we get

$$R_0\left(f_i(\cdot,\gamma),\hat g_i\right)=\lim_{\alpha\to 0}R_\alpha\left(f_i(\cdot,\gamma),\hat g_i\right)=-\log f_i(Y_i,\gamma)+k.\qquad (4)$$

This last expression coincides with the Kullback–Leibler divergence, except for the constant $k$. More details about the Kullback–Leibler divergence can be found in Pardo [28].
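As a quick numerical illustration of the limit in (4), the following Python sketch (an illustration we add here, not part of the original analysis) evaluates the RP loss of (3), without the constant $k$, for a Poisson density and a small $\alpha$, and checks that it approaches the negative log-likelihood term; the truncation point `y_max = 200` of the Poisson support is an arbitrary choice.

```python
import numpy as np
from scipy.stats import poisson

def rp_loss(alpha, mu, y, y_max=200):
    """R_alpha(f, g_hat) without the constant k, for a Poisson density f
    and the degenerate distribution at an observed count y (Eq. (3))."""
    support = np.arange(y_max + 1)          # truncated Poisson support
    pmf = poisson.pmf(support, mu)
    term1 = np.log((pmf ** (alpha + 1)).sum()) / (alpha + 1)
    term2 = np.log(poisson.pmf(y, mu) ** alpha) / alpha
    return term1 - term2

# As alpha -> 0 the loss tends to -log f(y, mu), the MLE loss (Eq. (4))
mu, y = 3.0, 5
print(rp_loss(1e-4, mu, y), -poisson.logpmf(y, mu))
```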

For the sake of simplicity, let us denote

$$L_{\alpha i}(\gamma)=\left(\int f_i(y,\gamma)^{\alpha+1}dy\right)^{\frac{\alpha}{\alpha+1}},$$

and

$$V_i(Y_i,\gamma)=\frac{f_i(Y_i,\gamma)^{\alpha}}{L_{\alpha i}(\gamma)}.$$

The expression (3) can be rewritten as

$$R_\alpha\left(f_i(\cdot,\gamma),\hat g_i\right)=-\frac{1}{\alpha}\log\frac{f_i(Y_i,\gamma)^{\alpha}}{\left(\int f_i(y,\gamma)^{\alpha+1}dy\right)^{\frac{\alpha}{\alpha+1}}}+k=-\frac{1}{\alpha}\log V_i(Y_i,\gamma)+k.$$

Based on the previous idea, we define an objective function by averaging the RPs over all observations. Since minimizing $R_\alpha(f_i(\cdot,\gamma),\hat g_i)$ in $\gamma$ is equivalent to maximizing $\log V_i(Y_i,\gamma)$, we define the objective function

$$T_n^{\alpha}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}\frac{f_i(Y_i,\gamma)^{\alpha}}{\left(\int f_i(y,\gamma)^{\alpha+1}dy\right)^{\frac{\alpha}{\alpha+1}}}=\frac{1}{n}\sum_{i=1}^{n}\frac{f_i(Y_i,\gamma)^{\alpha}}{L_{\alpha i}(\gamma)}=\frac{1}{n}\sum_{i=1}^{n}V_i(Y_i,\gamma).\qquad (5)$$

Based on (5), we can define the MRPE of the unknown parameter $\gamma$, $\hat\gamma_\alpha$, by

$$\hat\gamma_\alpha=\arg\max_{\gamma\in\Gamma}T_n^{\alpha}(\gamma),\qquad (6)$$

with $T_n^{\alpha}(\gamma)$ defined in (5), and

$$T_n^{0}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}\log f_i(y_i,\gamma)$$

at $\alpha=0$. The MRPE coincides with the MLE at $\alpha=0$, and therefore the proposed family can be considered a natural extension of the classical MLE.
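To make the estimator in (6) concrete, here is a minimal Python sketch (not the authors' R implementation) that maximizes $T_n^{\alpha}$ for a Poisson regression model via Nelder–Mead, replacing the integral $\int f_i(y,\gamma)^{\alpha+1}dy$ by a sum over a truncated Poisson support; the truncation `y_max` and the zero starting value are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def neg_rp_objective(beta, X, y, alpha, y_max=60):
    """Negative of T_n^alpha in Eq. (5) for Poisson regression with log link."""
    mu = np.exp(X @ beta)
    support = np.arange(y_max + 1)
    pmf = poisson.pmf(support[None, :], mu[:, None])            # n x (y_max+1)
    L = (pmf ** (alpha + 1)).sum(axis=1) ** (alpha / (alpha + 1))
    V = poisson.pmf(y, mu) ** alpha / L                         # V_i(Y_i, gamma)
    return -V.mean()

def mrpe_poisson(X, y, alpha, beta0):
    """MRPE of Eq. (6), computed with Nelder-Mead as in the paper."""
    res = minimize(neg_rp_objective, beta0, args=(X, y, alpha),
                   method="Nelder-Mead")
    return res.x
```

Note that for $\alpha=0$ the objective above is constant ($V_i\equiv 1$), so the sketch is only meaningful for $\alpha>0$; in the limit one maximizes the log-likelihood instead.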

Now, since the MRPE is defined as a maximum, it must annul the first derivatives of the objective function given in (5). The estimating equations for the parameters $\beta$ and $\phi$ are given by

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial V_i(Y_i,\gamma)}{\partial\beta}=0_k,\qquad\frac{1}{n}\sum_{i=1}^{n}\frac{\partial V_i(Y_i,\gamma)}{\partial\phi}=0.\qquad (7)$$

For the first equation, we have

$$\frac{\partial V_i(Y_i,\gamma)}{\partial\beta}=\frac{1}{L_{\alpha i}(\gamma)^{2}}\left[\alpha f_i(Y_i,\gamma)^{\alpha}\frac{\partial\log f_i(Y_i,\gamma)}{\partial\beta}L_{\alpha i}(\gamma)-\alpha\left(\int f_i(y,\gamma)^{\alpha+1}dy\right)^{\frac{\alpha}{\alpha+1}-1}\int f_i(y,\gamma)^{\alpha+1}\frac{\partial\log f_i(y,\gamma)}{\partial\beta}dy\,f_i(Y_i,\gamma)^{\alpha}\right].$$

The previous partial derivatives can be simplified as

$$\frac{\partial\log f_i(Y_i,\gamma)}{\partial\beta}=\frac{Y_i-\mu_i}{\operatorname{Var}(Y_i)\,g'(\mu_i)}\,x_i=K_{1i}(Y_i,\gamma)\,x_i$$

and

$$\frac{\partial\log f_i(Y_i,\gamma)}{\partial\phi}=-\frac{\left(Y_i\theta_i-b(\theta_i)\right)a'(\phi)}{a(\phi)^{2}}+\frac{\partial c(Y_i,\phi)}{\partial\phi}=K_{2i}(Y_i,\gamma).$$

See Ghosh and Basu [13] for more details. Now using the simplified expressions, we can write the estimating equation for β as

$$\sum_{i=1}^{n}\frac{x_i}{L_{\alpha i}(\gamma)}\left[M_i(Y_i,\gamma)-N_i(Y_i,\gamma)\right]=0_k\qquad (8)$$

being

$$M_i(Y_i,\gamma)=f_i(Y_i,\gamma)^{\alpha}K_{1i}(Y_i,\gamma)$$

and

$$N_i(Y_i,\gamma)=\frac{f_i(Y_i,\gamma)^{\alpha}}{\int f_i(y,\gamma)^{\alpha+1}dy}\int f_i(y,\gamma)^{\alpha+1}K_{1i}(y,\gamma)\,dy.$$
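For the Poisson model with log link, $\operatorname{Var}(Y_i)=\mu_i$ and $g'(\mu_i)=1/\mu_i$, so $K_{1i}(y,\gamma)$ reduces to $y-\mu_i$. The following Python check (our illustration, with arbitrary example values) verifies this simplification of $\partial\log f_i/\partial\beta$ against a numerical derivative.

```python
import numpy as np
from scipy.stats import poisson

def loglik_i(beta, x, y):
    """Log-density of one Poisson observation with log link."""
    return poisson.logpmf(y, np.exp(x @ beta))

# arbitrary illustrative values
beta = np.array([0.4, -0.7])
x = np.array([1.0, 0.3])
y_obs = 2

eps = 1e-6
num_grad = np.array([
    (loglik_i(beta + eps * e, x, y_obs) - loglik_i(beta - eps * e, x, y_obs)) / (2 * eps)
    for e in np.eye(2)])

mu = np.exp(x @ beta)
analytic = (y_obs - mu) * x        # K_{1i}(y, gamma) * x_i with K_{1i} = y - mu
print(num_grad, analytic)
```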

Subsequently, for the nuisance parameter $\phi$, the corresponding derivative is

$$\frac{\partial V_i(Y_i,\gamma)}{\partial\phi}=\frac{1}{L_{\alpha i}(\gamma)^{2}}\left[\alpha f_i(Y_i,\gamma)^{\alpha}\frac{\partial\log f_i(Y_i,\gamma)}{\partial\phi}L_{\alpha i}(\gamma)-\alpha\frac{L_{\alpha i}(\gamma)}{\int f_i(y,\gamma)^{\alpha+1}dy}\int f_i(y,\gamma)^{\alpha+1}\frac{\partial\log f_i(y,\gamma)}{\partial\phi}dy\,f_i(Y_i,\gamma)^{\alpha}\right],$$

and thus, the estimating equation for ϕ is given by

$$\sum_{i=1}^{n}\frac{1}{L_{\alpha i}(\gamma)}\left[M_i^{*}(Y_i,\gamma)-N_i^{*}(Y_i,\gamma)\right]=0\qquad (9)$$

being

$$M_i^{*}(Y_i,\gamma)=f_i(Y_i,\gamma)^{\alpha}K_{2i}(Y_i,\gamma),$$

and

$$N_i^{*}(Y_i,\gamma)=\frac{f_i(Y_i,\gamma)^{\alpha}}{\int f_i(y,\gamma)^{\alpha+1}dy}\int f_i(y,\gamma)^{\alpha+1}K_{2i}(y,\gamma)\,dy.$$

Under some regularity conditions, Castilla et al. [27] established the consistency and asymptotic normality of the MRPEs under the INIDO setup. Before stating the consistency and asymptotic distribution of the MRPEs for the GLM, let us introduce some useful notation. We define

$$S_{\alpha i}=\int f_i(y,\beta,\phi)^{\alpha+1}dy,$$
$$m_{jli}(\gamma)=\frac{1}{\int f_i(y,\gamma)^{\alpha+1}dy}\int f_i(y,\gamma)^{\alpha+1}K_{ji}(y,\gamma)K_{li}(y,\gamma)\,dy,$$
$$m_{ji}(\gamma)=\frac{1}{\int f_i(y,\gamma)^{\alpha+1}dy}\int f_i(y,\beta,\phi)^{\alpha+1}K_{ji}(y,\gamma)\,dy,$$
$$l_{jli}(\gamma)=\int\frac{f_i(y,\gamma)^{2\alpha+1}}{L_{\alpha i}(\gamma)^{2}}\left(K_{ji}(y,\gamma)-m_{ji}(\gamma)\right)\left(K_{li}(y,\gamma)-m_{li}(\gamma)\right)dy,\qquad (10)$$

for all j,l=1,2 and i=1,,n.

Theorem 1.

Let $Y_1,\dots,Y_n$ be a random sample from the GLM defined in (1). The MRPE $\hat\gamma_\alpha=(\hat\beta_\alpha^{T},\hat\phi_\alpha)^{T}$ is consistent, and its asymptotic distribution is given by

$$\sqrt{n}\,\Omega_n(\gamma)^{-1/2}\Psi_n(\gamma)\left((\hat\beta_\alpha,\hat\phi_\alpha)-(\beta,\phi)\right)\xrightarrow[n\to\infty]{\mathcal{L}}N\left(0_{k+1},I_{k+1}\right),$$

where $X$ denotes the design matrix, $I_{k+1}$ is the $(k+1)$-dimensional identity matrix, and the matrices $\Psi_n$ and $\Omega_n$ are defined by

$$\Omega_n(\gamma)=\frac{1}{n}\begin{pmatrix} X^{T}D_{11}X & X^{T}D_{12}\mathbf{1}\\ \mathbf{1}^{T}D_{12}X & \mathbf{1}^{T}D_{22}\mathbf{1}\end{pmatrix},$$
$$\Psi_n(\gamma)=\frac{1}{n}\begin{pmatrix} X^{T}\left(D_{11}^{*}-D_{1}^{*T}D_{1}^{*}\right)X & X^{T}\left(D_{12}^{*}-D_{1}^{*T}D_{2}^{*}\right)\mathbf{1}\\ \mathbf{1}^{T}\left(D_{12}^{*}-D_{1}^{*T}D_{2}^{*}\right)X & \mathbf{1}^{T}\left(D_{22}^{*}-D_{2}^{*T}D_{2}^{*}\right)\mathbf{1}\end{pmatrix},$$

with

$$D_{jk}=\operatorname{diag}\left(l_{jki}(\gamma)\right)_{i=1,\dots,n},\qquad D_{jk}^{*}=\operatorname{diag}\left(m_{jki}(\gamma)\right)_{i=1,\dots,n},\qquad j,k=1,2,$$

and

$$D_{j}^{*}=\operatorname{diag}\left(m_{ji}(\gamma)\right)_{i=1,\dots,n},\qquad j=1,2.$$

Proof. 

The consistency is proved for general statistical models in Castilla et al. [27] and the asymptotic distribution of the MRPEs for GLM is derived in Jaenada and Pardo [3]. □

3. Wald Type Tests for the GLMs

In this section, we define Wald-type tests for linear null hypotheses of the form

$$H_0: M^{T}\gamma=m\quad\text{vs.}\quad H_1: M^{T}\gamma\neq m,\qquad (11)$$

where $\gamma=(\beta^{T},\phi)^{T}$, $M$ is a $(k+1)\times r$ full rank matrix, and

$$m=\left(m_1,\dots,m_r\right)^{T}\qquad (12)$$

is an $r$-dimensional vector $(r\leq k+1)$. If the nuisance parameter $\phi$ is known, as in logistic and Poisson regression, the matrix $M=L_{k\times r}$. Additionally, choosing

$$M=\begin{pmatrix} L_{k\times r}\\ O_{1\times r}\end{pmatrix}$$

gives rise to a null hypothesis defined by a linear combination of the regression coefficients, $\beta$, with $\phi$ known or unknown. Further, the simple null hypothesis is a particular case obtained by choosing $M$ as the identity matrix of rank $k$,

$$H_0:\beta=\beta_0\quad\text{vs.}\quad H_1:\beta\neq\beta_0,$$

with $m=\beta_0=\left(\beta_{10},\dots,\beta_{k0}\right)^{T}$.

In the following, we assume that there exists a matrix $A_\alpha(\gamma)$ verifying

$$\lim_{n\to\infty}\Psi_n(\gamma)\Omega_n(\gamma)^{-1}\Psi_n(\gamma)=A_\alpha(\gamma).$$

Definition 1.

Let $\hat\gamma_\alpha=(\hat\beta_\alpha^{T},\hat\phi_\alpha)^{T}$ be the MRPE of $\gamma=(\beta^{T},\phi)^{T}$ for the GLM. The Wald-type test statistics, based on the MRPE, for testing (11) are defined by

$$W_n(\hat\gamma_\alpha)=n\left(M^{T}\hat\gamma_\alpha-m\right)^{T}\left[M^{T}\Psi_n(\hat\gamma_\alpha)^{-1}\Omega_n(\hat\gamma_\alpha)\Psi_n(\hat\gamma_\alpha)^{-1}M\right]^{-1}\left(M^{T}\hat\gamma_\alpha-m\right).\qquad (13)$$
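Given the estimate, the matrices of Theorem 1, and the hypothesis pair $(M,m)$, the quadratic form (13) is a few lines of linear algebra. A hedged Python sketch (an illustrative helper, not from the paper):

```python
import numpy as np

def wald_stat(gamma_hat, M, m, Psi, Omega, n):
    """Wald-type statistic of Eq. (13)."""
    d = M.T @ gamma_hat - m
    Psi_inv = np.linalg.inv(Psi)
    S = M.T @ Psi_inv @ Omega @ Psi_inv @ M   # sandwich covariance of M^T gamma_hat
    return n * d @ np.linalg.solve(S, d)

# toy sanity check: identity matrices reduce W_n to n * ||gamma_hat - m||^2
W = wald_stat(np.array([1.0, 0.0]), np.eye(2), np.zeros(2),
              np.eye(2), np.eye(2), n=4)
print(W)  # 4.0
```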

The following theorem presents the asymptotic distribution of the Wald-type test statistic $W_n(\hat\gamma_\alpha)$.

Theorem 2.

Under the null hypothesis given in (11), the Wald-type test statistic $W_n(\hat\gamma_\alpha)$ asymptotically follows a chi-square distribution with $r$ degrees of freedom, where $r$ is the dimension of the vector $m$ in (12).

Proof. 

We know that

$$\sqrt{n}\left((\hat\beta_\alpha^{T},\hat\phi_\alpha)^{T}-(\beta^{T},\phi)^{T}\right)\xrightarrow[n\to\infty]{\mathcal{L}}N\left(0_{k+1},A_\alpha(\gamma)^{-1}\right).$$

Therefore,

$$\sqrt{n}\left(M^{T}\hat\gamma_\alpha-m\right)=\sqrt{n}\,M^{T}\left(\hat\gamma_\alpha-\gamma\right)\xrightarrow[n\to\infty]{\mathcal{L}}N\left(0_{r},M^{T}A_\alpha(\gamma)^{-1}M\right).$$

Now, the result follows by taking into account that $\hat\gamma_\alpha$ is a consistent estimator of $\gamma$. □

Based on the previous convergence, the null hypothesis in (11) is rejected if

$$W_n(\hat\gamma_\alpha)>\chi^{2}_{r,\alpha},\qquad (14)$$

where $\chi^{2}_{r,\alpha}$ is the $100(1-\alpha)$ percentile of a chi-square distribution with $r$ degrees of freedom.
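The decision rule (14) can be sketched as follows (an illustrative helper; the 0.05 significance level is our default choice, not the paper's):

```python
from scipy.stats import chi2

def wald_decision(W, r, level=0.05):
    """Reject H0 in (11) iff W exceeds the upper chi-square quantile (Eq. (14))."""
    critical = chi2.ppf(1 - level, df=r)
    return W > critical, chi2.sf(W, df=r)   # decision and p-value

reject, pval = wald_decision(10.0, r=1)
print(reject, pval)
```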

Finally, let $\gamma_1$ be a parameter point verifying $M^{T}\gamma_1\neq m$, i.e., $\gamma_1$ does not satisfy the null hypothesis. The next result establishes that the Wald-type tests given in (14) are consistent (see Fraser [29]).

Theorem 3.

Let $\gamma_1$ be a parameter point verifying $M^{T}\gamma_1\neq m$. Then the Wald-type tests given in (14) are consistent, i.e.,

$$\lim_{n\to\infty}P_{\gamma_1}\left(W_n(\hat\gamma_\alpha)>\chi^{2}_{r,\alpha}\right)=1.$$

Proof. 

See Appendix A. □

Remark 1.

The proof of the previous theorem establishes the approximate power function of the Wald-type tests defined in (13),

$$\pi_{W_n(\hat\gamma_\alpha)}(\gamma_1)\approx 1-\Phi\left(\frac{1}{\sigma(\gamma_1)}\left(\frac{\chi^{2}_{r,\alpha}}{\sqrt{n}}-\sqrt{n}\,l_{\gamma_1}(\gamma_1)\right)\right),$$

with $\Phi$ the standard normal distribution function,

where

$$\sigma^{2}(\gamma_1)=\left.\frac{\partial l_{\hat\gamma_\alpha}(\zeta)}{\partial\zeta^{T}}\right|_{\gamma=\gamma_1}A_\alpha(\gamma_1)^{-1}\left.\frac{\partial l_{\hat\gamma_\alpha}(\zeta)}{\partial\zeta}\right|_{\gamma=\gamma_1}$$

and

$$l_{\hat\gamma_\alpha}(\zeta)=\left(M^{T}\hat\gamma_\alpha-m\right)^{T}\left[M^{T}A_\alpha(\zeta)^{-1}M\right]^{-1}\left(M^{T}\hat\gamma_\alpha-m\right).$$

From the above expression, the necessary sample size $n$ for the Wald-type tests to attain a predetermined power, $\pi_0$, is given by $n=[n^{*}]+1$, with

$$n^{*}=\frac{A+B+\sqrt{A(A+2B)}}{2\,l_{\gamma_1}^{2}(\gamma_1)},$$

being

$$A=\sigma^{2}(\gamma_1)\left(\Phi^{-1}(1-\pi_0)\right)^{2},\qquad B=2\chi^{2}_{r,\alpha}\,l_{\gamma_1}(\gamma_1),$$

and $[\cdot]$ denotes the integer part.

In accordance with Maronna et al. [30], the breakdown point of an estimator $\hat\gamma_\alpha$ of a parameter $\gamma$ is the largest amount of contamination that the data may contain such that $\hat\gamma_\alpha$ still gives meaningful information about $\gamma$. The derivation of a general breakdown point is not easy, so it may deserve a separate paper, where the replacement finite-sample breakdown point introduced by Donoho and Huber [31] could be jointly considered. Although the breakdown point is an important theoretical concept in robust statistics, perhaps more useful is its finite-sample counterpart, the replacement finite-sample breakdown point. More details can be found in Section 3.2.5 of Maronna et al. [30].

4. Influence Function

In this section, we derive the IF of the MRPEs of the parameters $\gamma=(\beta^{T},\phi)^{T}$ and of the Wald-type statistics based on these MRPEs, $W_n(\hat\gamma_\alpha)$. The influence function (IF) of an estimator quantifies the impact of an infinitesimal perturbation of the true distribution of the data on the asymptotic value of the resulting parameter estimate (in terms of the corresponding statistical functional). An estimator is said to be robust if its IF is bounded. If we denote by $G=(G_1,\dots,G_n)$ the true distributions underlying the data, the functional $T_\alpha(G)$ associated with the MRPE of the parameter $\gamma$ is such that

$$\frac{1}{n}\sum_{i=1}^{n}R_\alpha\left(f_i(\cdot,T_\alpha(G)),g_i\right)=\min_{\gamma}\frac{1}{n}\sum_{i=1}^{n}R_\alpha\left(f_i(\cdot,\gamma),g_i\right).$$

The IF of an estimator is defined as the limiting standardized bias due to infinitesimal contamination. That is, given a contaminated distribution at the point $(y_t,x_t)$, $G_\varepsilon=(1-\varepsilon)G+\varepsilon\Delta_{(y_t,x_t)}$, with $\Delta_{(y_t,x_t)}$ the degenerate distribution at $(y_t,x_t)$, the IF of the estimator $\hat\gamma_\alpha$ in terms of its associated functional $T_\alpha(G)$ is computed as

$$\mathrm{IF}\left((y_t,x_t),T_\alpha,G\right)=\lim_{\varepsilon\to 0}\frac{T_\alpha(G_\varepsilon)-T_\alpha(G)}{\varepsilon}.$$

In the following, let us denote $T_\alpha(G)=\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)$, where $T_\alpha^{\beta}(G)$ and $T_\alpha^{\phi}(G)$ are the functionals associated with the parameters $\beta$ and $\phi$, respectively. They must then satisfy the estimating equations of the MRPE given by

$$\sum_{i=1}^{n}\frac{x_i}{L_{\alpha i}\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)}\left[M_i\left(y_i,\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)\right)-N_i\left(y_i,\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)\right)\right]=0_k,$$
$$\sum_{i=1}^{n}\frac{1}{L_{\alpha i}\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)}\left[M_i^{*}\left(y_i,\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)\right)-N_i^{*}\left(y_i,\left(T_\alpha^{\beta}(G),T_\alpha^{\phi}(G)\right)\right)\right]=0,\qquad (15)$$

where the quantities $L_{\alpha i}(\gamma)$, $M_i(y_i,\gamma)$, $N_i(y_i,\gamma)$, $M_i^{*}(y_i,\gamma)$ and $N_i^{*}(y_i,\gamma)$ are defined in Section 2. Now, evaluating the previous equations at the contaminated distribution $G_\varepsilon$, implicitly differentiating the estimating equations with respect to $\varepsilon$, and evaluating at $\varepsilon=0$, we obtain the expression of the IF for the GLM.

We first derive the expression of the IF of the MRPEs in the $i_0$-th direction. For this purpose, we consider the contaminated distributions

$$G_{i_0,\varepsilon}=\left(G_1,\dots,G_{i_0-1},G_{i_0,\varepsilon},G_{i_0+1},\dots,G_n\right),$$

with $G_{i_0,\varepsilon}=(1-\varepsilon)G_{i_0}+\varepsilon\Delta_{(y_{i_0},x_{i_0})}$. Here, only the $i_0$-th component of the vector of distributions is contaminated. If the true density function $g_i$ of each variable belongs to the exponential model, we have that

$$g_i(y)=\begin{cases} f_i(y,\gamma), & i\neq i_0,\\ (1-\varepsilon)f_i(y,\gamma)+\varepsilon\Delta_{(y_{i_0},x_{i_0})}(y), & i=i_0.\end{cases}$$

Accordingly, we define

$$\gamma_\varepsilon^{i_0}=T_\alpha\left(G_1,\dots,G_{i_0-1},G_{i_0,\varepsilon},G_{i_0+1},\dots,G_n\right)$$

as the MRPE when the true distribution underlying the data is $G_{i_0,\varepsilon}$. Based on Remark 5.2 in Castilla et al. [27], the IF of the MRPE in the $i_0$-th direction, with $(y_{i_0},x_{i_0})$ the contamination point, is given by

$$\mathrm{IF}\left((y_{i_0},x_{i_0}),T_\alpha,G\right)=\left.\frac{\partial\gamma_\varepsilon^{i_0}}{\partial\varepsilon}\right|_{\varepsilon=0}=\Psi_n(\gamma)^{-1}\begin{pmatrix}\left[K_{1i_0}(y_{i_0},\gamma)f_{i_0}(y_{i_0},\gamma)^{\alpha}-N_{i_0}(y_{i_0},\gamma)\right]x_{i_0}\\ K_{2i_0}(y_{i_0},\gamma)f_{i_0}(y_{i_0},\gamma)^{\alpha}-N_{i_0}^{*}(y_{i_0},\gamma)\end{pmatrix}.$$

In a similar manner, the IF in all directions (i.e., when all components of the vector of distributions are contaminated) has the expression

$$\mathrm{IF}\left((y_1,x_1),\dots,(y_n,x_n),T_\alpha,G\right)=\left.\frac{\partial\gamma_\varepsilon}{\partial\varepsilon}\right|_{\varepsilon=0}=\Psi_n(\gamma)^{-1}\sum_{i=1}^{n}\begin{pmatrix}\left[K_{1i}(y_i,\gamma)f_i(y_i,\gamma)^{\alpha}-N_i(y_i,\gamma)\right]x_i\\ K_{2i}(y_i,\gamma)f_i(y_i,\gamma)^{\alpha}-N_i^{*}(y_i,\gamma)\end{pmatrix},$$

with $(y_1,x_1),\dots,(y_n,x_n)$ the contamination points. We next derive the expression of the IF of the Wald-type tests presented in Section 3. The statistical functional associated with the Wald-type tests for the linear null hypothesis (11) at the distributions $G=(G_1,\dots,G_n)$, ignoring the constant $n$, is given by

$$W_\alpha(G)=\left(M^{T}T_\alpha(G)-m\right)^{T}\left[M^{T}A_\alpha\left(T_\alpha(G)\right)^{-1}M\right]^{-1}\left(M^{T}T_\alpha(G)-m\right).\qquad (16)$$

Again, evaluating the Wald-type test functional at the contaminated distribution $G_\varepsilon$ and implicitly differentiating the expression, we can obtain its IF. In particular, the IF of the Wald-type test statistics in the $i_0$-th direction at the contamination point $(y_{i_0},x_{i_0})$ is given by

$$\mathrm{IF}_1\left((y_{i_0},x_{i_0}),W_\alpha,G\right)=\left.\frac{\partial W_\alpha(G_{i_0,\varepsilon})}{\partial\varepsilon}\right|_{\varepsilon=0}=2\left(M^{T}T_\alpha(G)-m\right)^{T}\left[M^{T}A_\alpha\left(T_\alpha(G)\right)^{-1}M\right]^{-1}M^{T}\,\mathrm{IF}\left((y_{i_0},x_{i_0}),T_\alpha,G\right).$$

Evaluating the previous expression at the null hypothesis, $M^{T}T_\alpha(G)=m$, the IF becomes identically zero,

$$\mathrm{IF}_1\left((y_{i_0},x_{i_0}),W_\alpha,G\right)=0.$$

Therefore, it is necessary to consider the second-order IF of the proposed Wald-type tests. Differentiating $W_\alpha(G_\varepsilon)$ twice, we get

$$\mathrm{IF}_2\left((y_{i_0},x_{i_0}),W_\alpha,G\right)=\left.\frac{\partial^{2}W_\alpha(G_{i_0,\varepsilon})}{\partial\varepsilon^{2}}\right|_{\varepsilon=0}=2\,\mathrm{IF}\left((y_{i_0},x_{i_0}),T_\alpha,G\right)^{T}M\left[M^{T}A_\alpha\left(T_\alpha(G)\right)^{-1}M\right]^{-1}M^{T}\,\mathrm{IF}\left((y_{i_0},x_{i_0}),T_\alpha,G\right).$$

Finally, the second-order IF of the Wald-type tests in all directions is given by

$$\mathrm{IF}_2\left((y_1,x_1),\dots,(y_n,x_n),W_\alpha,G\right)=\left.\frac{\partial^{2}W_\alpha(G_\varepsilon)}{\partial\varepsilon^{2}}\right|_{\varepsilon=0}=2\,\mathrm{IF}\left((y_1,x_1),\dots,(y_n,x_n),T_\alpha,G\right)^{T}M\left[M^{T}A_\alpha\left(T_\alpha(G)\right)^{-1}M\right]^{-1}M^{T}\,\mathrm{IF}\left((y_1,x_1),\dots,(y_n,x_n),T_\alpha,G\right).$$

To assess the robustness of the MRPEs and Wald-type test statistics, we must discuss the boundedness of the corresponding IFs. The boundedness of the second-order IF of the Wald-type test statistics is determined by the boundedness of the IF of the MRPEs. Further, the matrix $\Psi_n(\gamma)$ is assumed to be bounded, so the robustness of the estimators depends only on the second factor of the IF. Most standard GLMs enjoy such boundedness for positive values of $\alpha$, but the influence function is unbounded at $\alpha=0$, corresponding to the MLE. As an illustrative example, Figure 1 plots the IF of the MRPEs for the Poisson regression model in one direction for $\alpha=0$ and $\alpha=0.5$. The model is fitted with only one covariate, the parameter $\phi$ is known for Poisson regression ($\phi=1$), and the true regression parameter is fixed at $\beta=1$. As shown, the IFs of the MRPEs with positive values of $\alpha$ are bounded, whereas the IF of the MLE is not, indicating its lack of robustness.

Figure 1. IF of MRPEs with $\alpha=0$ (left) and $\alpha=0.5$ (right) for the Poisson regression model.
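The boundedness shown in Figure 1 can be checked numerically: up to the constant matrix $\Psi_n(\gamma)^{-1}$ and the centering term $N_{i_0}$, the IF in one direction is driven by the factor $K_{1}(y,\gamma)f(y,\gamma)^{\alpha}=(y-\mu)f(y,\gamma)^{\alpha}$ for the Poisson model. A small Python check of this factor (our illustration; $\beta=1$ and a single covariate $x=1$ as in the figure):

```python
import numpy as np
from scipy.stats import poisson

mu = np.exp(1.0)                 # beta = 1, x = 1, log link
y = np.arange(0, 60)

def if_factor(alpha):
    """(y - mu) * f(y)^alpha: the y-dependent factor of the IF."""
    return (y - mu) * poisson.pmf(y, mu) ** alpha

print(np.abs(if_factor(0.0)).max())   # grows with the contamination point (MLE)
print(np.abs(if_factor(0.5)).max())   # bounded: the density power weight kills the tail
```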

5. Numerical Analysis: Poisson Regression Model

We illustrate the proposed robust methods for the Poisson regression model. As pointed out in Section 1, the Poisson regression model belongs to the GLM family with known nuisance parameter $\phi=1$, location parameter $\theta_i=x_i^{T}\beta$ and known functions $b(\theta_i)=\exp(x_i^{T}\beta)$ and $c(y_i)=-\log(y_i!)$. Since the nuisance parameter is known, for the sake of simplicity we write $\gamma=\beta$ in the following. In Poisson regression, the mean of the response variable is linked to the linear predictor through the natural logarithm, i.e., $\mu_i=\exp(x_i^{T}\beta)$. Thus, we can apply the proposed method to estimate the vector of regression parameters $\beta$ with the objective function given in Equation (5).

The results provided are computed in the software R. The optimization of the objective function is performed using the built-in optim() function, which applies the Nelder–Mead iterative algorithm (Nelder and Mead [32]). The Nelder–Mead optimization algorithm is robust, although relatively slow. The objective function $T_n^{\alpha}(\gamma)$ given in (5) is highly nonlinear and requires the evaluation of nontrivial quantities. Further, the computation of the Wald-type test statistics defined in (13) requires evaluating the covariance matrix of the MRPEs, which involves nontrivial integrals. Simplified expressions of the main quantities defined throughout the paper for the Poisson regression model, such as $L_{\alpha i}(\beta)$, $K_{1i}(y,\beta)$, $N_i(y,\beta)$, $m_{1i}(\beta)$, $m_{11i}(\beta)$ or $l_{11i}(\beta)$, are given in Appendix B. There is no closed-form expression for these quantities, and they need to be approximated numerically. Since the optimization is performed iteratively, computing such expressions at each step of the algorithm and for each observation may entail an increased computational burden. Nonetheless, the complexity is not significant for low-dimensional data. On the other hand, the optimum in (5) need not be uniquely defined, since the objective function may have several local optima. The choice of the initial value of the iterative algorithm is therefore crucial. Ideally, a good initial point should be consistent and robust. In our results, the MLE is used as the initial estimate for the algorithm.

We analyze the performance of the proposed methods in Poisson regression through a simulation study. We assess the behavior of the MRPE under a sparse Poisson regression model with $k=12$ covariates but only 3 significant variables. We set the 12-dimensional regression parameter to $\beta=(1.8,1,0,0,1.5,0,\dots,0)$ and generate the explanatory variables, $x_i$, from the standard uniform distribution with variance-covariance matrix having a Toeplitz structure, with $(j,l)$-th element $0.5^{|j-l|}$, $j,l=1,\dots,p$. The response variables are generated from the Poisson regression model, $Y_i\sim\mathcal{P}(\mu_i)$ with mean $\mu_i=\exp(x_i^{T}\beta)$. To evaluate the robustness of the proposed estimators, we contaminate the responses using a perturbed distribution of the form $(1-b)\mathcal{P}(\mu_i)+b\,\mathcal{P}(2\mu_i)$, where $b$ is a realization of a Bernoulli variable with parameter $\varepsilon$, the so-called contamination level. That is, the distribution of the contaminated responses lies in a small neighbourhood of the assumed model. We repeat the process $R=1000$ times for each value of $\alpha$.
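The contamination scheme above can be sketched as follows (an illustrative reimplementation with an arbitrary seed, not the authors' simulation code):

```python
import numpy as np

def contaminated_poisson(mu, eps, rng):
    """Draw from the mixture (1-b) P(mu) + b P(2*mu), b ~ Bernoulli(eps)."""
    b = rng.random(mu.shape) < eps           # contamination indicators
    return rng.poisson(np.where(b, 2.0 * mu, mu))

rng = np.random.default_rng(0)
mu = np.full(1000, 3.0)
y = contaminated_poisson(mu, eps=0.2, rng=rng)
print(y.mean())   # close to (1 - 0.2)*3 + 0.2*6 = 3.6
```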

Figure 2 presents the mean squared error (MSE) of the estimate, $\mathrm{MSE}=\|\hat\beta_\alpha-\beta\|^{2}$, (left) and the MSE of the prediction (right) against the contamination level in the data for different values of $\alpha=0,0.1,0.3,0.5$ and $0.7$. The sample size is fixed at $n=200$ and the MSE of the prediction is calculated using $n=200$ new observations following the true model. As shown, greater values of $\alpha$ correspond to more robust estimators, revealing the role of the tuning parameter in the robustness gain. Most strikingly, the MSE grows linearly for the MLE, while the proposed estimators manage to maintain a low error in all contaminated scenarios.

Figure 2. Mean squared error (MSE) of estimation (left) and prediction (right) against contamination level in the data.

Furthermore, it is to be expected that the error of the estimate decreases with larger sample sizes. In this regard, Figure 3 shows the MSE for different values of $\alpha=0,0.1,0.3,0.5$ and $0.7$ against the sample size, in the absence of contamination (left) and under 5% contamination (right). Our proposed estimators are more robust than the classical MLE in almost all contaminated scenarios, since the MSE is lower for all positive values of $\alpha$ than for $\alpha=0$ (corresponding to the MLE), except for very small sample sizes. Conversely, the MLE is, as expected, the most efficient estimator in the absence of contamination, closely followed by our proposed estimators with $\alpha=0.1,0.3$, highlighting the importance of $\alpha$ in controlling the trade-off between efficiency and robustness. In this regard, values of $\alpha$ around 0.3 perform best, taking into account the low loss of efficiency and the gain in robustness. Finally, note that small sample sizes adversely affect larger values of $\alpha$.

Figure 3. MSE of the estimate of $\beta$ in the absence of contamination (left) and under 5% contamination in the data (right), for different values of $\alpha$, against sample size for the Poisson regression model.

On the other hand, one could be interested in testing the significance of the selected variables. For this purpose, we simplify the true model and examine the performance of the proposed Wald-type test statistics under different true coefficient values. In particular, let us consider a Poisson regression model with only two covariates, generated from the uniform distribution as before, and the linear null hypothesis

$$H_0:\beta_2=0.\qquad (17)$$

That is, we are interested in assessing the significance of the second variable. The sample size is fixed at $n=200$ and the true value of the first component of the regression vector is set to $\beta_1=1$. We study the power of the tests under increasing signal of the second parameter $\beta_2$ and increasing contamination level. Here, the model is contaminated by perturbing the true distribution with $(1-b)\mathcal{P}(\mu_i)+b\,\mathcal{P}(\tilde\mu_i)$, where $\mu_i=\exp(x_i^{T}\beta)$ is the mean of the Poisson variable in the absence of contamination, $\tilde\mu_i=\exp(x_i^{T}\tilde\beta)$ is the contaminated mean, with $\tilde\beta=(1,0)$, and $b$ is a realization of a Bernoulli variable with probability of success $\varepsilon$. Table 1 presents the rejection rate of the Wald-type test statistics for different true values of $\beta_2$ under different contaminated scenarios. As expected, stronger signals produce higher power for all Wald-type tests. Moreover, the power of the Wald-type test statistics based on the MLE decreases when increasing the contamination, whereas the power of the statistics based on the MRPEs with positive values of $\alpha$ remains sufficiently high. Hence, our proposed robust estimators are able to detect the significance of the variable even in heavily contaminated scenarios.

Table 1.

Rejection rate of Wald-type test statistics based on MRPEs with different true values of β2 and contamination levels.

β2 α Contamination Level
0 5% 10% 15% 20% 25%
0.3 0 0.332 0.264 0.227 0.187 0.157 0.141
0.1 0.435 0.376 0.328 0.285 0.251 0.223
0.3 0.557 0.511 0.483 0.416 0.390 0.360
0.5 0.617 0.563 0.533 0.493 0.467 0.427
0.7 0.638 0.590 0.568 0.536 0.513 0.476
0.5 0 0.756 0.730 0.683 0.621 0.551 0.493
0.1 0.833 0.798 0.775 0.736 0.681 0.622
0.3 0.885 0.870 0.864 0.829 0.792 0.752
0.5 0.895 0.891 0.886 0.867 0.842 0.814
0.7 0.901 0.897 0.893 0.879 0.854 0.832
0.7 0 0.971 0.979 0.968 0.948 0.915 0.862
0.1 0.980 0.988 0.983 0.973 0.962 0.932
0.3 0.988 0.995 0.992 0.987 0.985 0.969
0.5 0.989 0.995 0.995 0.992 0.992 0.977
0.7 0.989 0.995 0.993 0.995 0.990 0.983

6. Real Data Applications

6.1. Example I: Poisson Regression

We now apply our proposed estimators to a real dataset concerning Crohn's disease. The data were first studied in Lô and Ronchetti [33] to assess the adverse events of a drug. The clinical study included 117 patients affected by the disease, for whom information was recorded on 7 explanatory variables: BMI (body mass index), HEIGHT, COUNTRY (one of the two countries where the patient lives), SEX, AGE, WEIGHT, and TREAT (the drug taken by the patient in factor form: placebo, Dose 1, Dose 2), in addition to the response variable AE (number of adverse events). Lô and Ronchetti [33] considered a Poisson regression model for the Crohn data and determined that only the variables Dose 1, BMI, HEIGHT, SEX, AGE, and COUNTRY may be essentially significant. Further, they flagged observations 23, 49, and 51 as highly influential on the classical analysis. Table 2 presents the estimated coefficients of the explanatory variables when fitting the Poisson regression model. The robust methods suggest higher coefficients for the variables BMI and AGE, and lower values for the coefficients of the categorical variables COUNTRY, SEX, and Dose 1.

Table 2.

Estimated coefficients for Crohn’s disease data for different values of α with original data and clean data (after removing influential observations).

Intercept BMI Height Age Country Sex Dose 1
Original Data
MLE (α= 0) 6.261 0.026 −0.037 0.012 −0.394 −0.646 −0.533
α= 0.1 5.197 0.037 −0.033 0.014 −0.489 −0.800 −0.469
α= 0.3 4.798 0.058 −0.036 0.021 −0.545 −1.284 −0.832
α= 0.5 4.391 0.067 −0.037 0.028 −0.557 −1.535 −1.036
α= 0.7 5.699 0.067 −0.047 0.036 −0.737 −1.759 −1.157

Following the discussion in Lô and Ronchetti [33], classical tests may not select the variable AGE as significant. We therefore propose testing the significance of that variable using Wald-type test statistics based on different values of $\alpha$. Table 3 shows the p-values of the corresponding tests of the null hypothesis $H_0$: AGE = 0, with the original data and after removing the outlying observations.

Table 3.

p-values of test with null hypothesis H0: AGE = 0 with original and clean data (after removing influential observations).

Original Data Clean Data
MLE (α= 0) 0.059 0.011
α= 0.1 0.018 0.004
α= 0.3 0.001 0.000
α= 0.5 0.000 0.000
α= 0.7 0.000 0.000

With the original data, the Wald-type test based on the MLE fails to detect the significance of the variable AGE at the 5% level, whereas the Wald-type test statistics with positive values of $\alpha$ indicate strong evidence against the null hypothesis. In contrast, if the influential observations are removed, all Wald-type test statistics agree on the significance of the variable. This example illustrates the robustness of the proposed statistics.

6.2. Example II: Binomial Regression

We finally illustrate the applicability of the MRPE for robust inference in the binomial regression model. We examine the damaged carrots dataset, first studied in Phelps [34] and later discussed by Cantoni and Ronchetti [8] and Ghosh and Basu [13] to illustrate robust procedures for binomial regression. The data come from a soil experiment and give the proportion of carrots showing insect damage in a trial with three blocks and eight dose levels of insecticide. They contain 24 samples, among which the 14th observation was flagged as an outlier in the y-space, although not a leverage point. The explanatory variables are the logarithmic transform of the dose (Logdose) and two dummy variables for Blocks 1 and 2.

Binomial regression is a natural extension of logistic regression to the case where the response variable $Y$ does not follow a Bernoulli distribution but a binomial distribution counting the number of successes in a series of $m$ independent Bernoulli trials. The binomial regression model belongs to the GLM family with known nuisance parameter $\phi=1$, location parameter $\theta_i=x_i^{T}\beta$ and functions $b(\theta_i)=m\log\left(1+\exp(x_i^{T}\beta)\right)$ and $c(y_i)=\log\binom{m}{y_i}$. The mean of the response variable is then linked to the linear predictor through the logit function, i.e.,

$$\log\frac{\mu_i}{m-\mu_i}=x_i^{T}\beta.$$
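The logit link above can be inverted as $\mu_i=m\exp(x_i^{T}\beta)/(1+\exp(x_i^{T}\beta))$. A small Python round-trip check of this inversion (illustrative values only):

```python
import numpy as np

def binom_mean(x, beta, m):
    """Invert logit(mu/m) = x^T beta, i.e. mu = m * expit(x^T beta)."""
    eta = x @ beta
    return m / (1.0 + np.exp(-eta))

# round trip: applying the logit link to mu recovers the linear predictor
x = np.array([1.0, 0.5])
beta = np.array([0.2, -1.0])
mu = binom_mean(x, beta, m=8)
eta_back = np.log(mu / (8 - mu))
print(eta_back, x @ beta)
```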

Table 4 presents the estimated coefficients of the regression vector for the carrots data using the MLE and the robust MRPEs, when the model is fitted with the original data and without the outlying observation. The results are computed in the same manner as in Section 5, adapting the corresponding quantities in Equation (5) to the binomial model. All integrals involved are numerically approximated, and the MLE is used as the initial estimate for the optimization algorithm. The influence of observation 14 stands out when using the MLE: the estimated coefficients are remarkably different when fitting the model with and without it. In contrast, all methods estimate similar coefficients after removing the outlying observation, coinciding with the robust estimates for moderately high values of the tuning parameter $\alpha$.

Table 4.

Estimated coefficients for the damaged carrots data for different values of α, with the original data and the clean data (after outlier removal).

Method Intercept Logdose B1 B2
Original Data
MLE (α=0) 1.480 −1.817 0.542 0.843
α= 0.1 1.729 −1.949 0.527 0.755
α= 0.3 2.017 −2.100 0.479 0.652
α= 0.5 2.090 −2.134 0.386 0.625
α= 0.7 2.150 −2.161 0.258 0.615
Clean Data
MLE (α=0) 2.141 −2.179 0.546 0.636
α= 0.1 2.126 −2.167 0.529 0.633
α= 0.3 2.105 −2.149 0.479 0.627
α= 0.5 2.108 −2.144 0.385 0.621
α= 0.7 2.154 −2.163 0.257 0.614

7. Conclusions

In this paper, we presented the MRPEs and Wald-type test statistics for GLMs. The proposed MRPEs and test statistics have appealing robustness properties when the data are contaminated by outliers or leverage points. MRPEs are consistent and asymptotically normal, and they represent an attractive alternative to the classical nonrobust methods. Additionally, robust Wald-type test statistics based on the MRPEs were developed. Through the study of the IFs and an extensive simulation study, we established their robustness from a theoretical and a practical point of view, respectively. In particular, we illustrated the superior performance of the MRPEs and the corresponding Wald-type tests for the Poisson regression model.

Acknowledgments

We are very grateful to the referees and the associate editor for their helpful comments and suggestions. This research was supported by the Spanish Grants PGC2018-095194-B-100 (L. Pardo and M. Jaenada) and FPU/018240 (M. Jaenada). M. Jaenada and L. Pardo are members of the Instituto de Matemática Interdisciplinar, Complutense University of Madrid.

Abbreviations

The following abbreviations are used in this manuscript:

DPD Density Power Divergence
IF Influence Function
GLM Generalized Linear Model
LRM Linear Regression Model
MLE Maximum Likelihood Estimator
MRPE Minimum Rényi Pseudodistance Estimator
RP Rényi Pseudodistance

Appendix A. Proof of Theorem 3

Let us define

l_\eta(\zeta) = \left(M^T\eta - m\right)^T \left(M^T A_\alpha(\zeta)^{-1} M\right)^{-1} \left(M^T\eta - m\right),

so the Wald-type test statistic is such that

n\, l_{\hat{\gamma}_\alpha}(\hat{\gamma}_\alpha) = W_n(\hat{\gamma}_\alpha).

We know that \hat{\gamma}_\alpha \xrightarrow[n\to\infty]{P} \gamma_1, and therefore l_{\hat{\gamma}_\alpha}(\gamma_1) and l_{\gamma_1}(\gamma_1) have the same asymptotic distribution. A first-order Taylor expansion of g(\zeta) = l_{\hat{\gamma}_\alpha}(\zeta) at \hat{\gamma}_\alpha around \gamma_1 gives

l_{\hat{\gamma}_\alpha}(\hat{\gamma}_\alpha) = l_{\hat{\gamma}_\alpha}(\gamma_1) + \left.\frac{\partial l_{\hat{\gamma}_\alpha}(\zeta)}{\partial \zeta^T}\right|_{\zeta=\gamma_1}\left(\hat{\gamma}_\alpha - \gamma_1\right) + o_p\left(\left\|\hat{\gamma}_\alpha - \gamma_1\right\|\right).

Based on the asymptotic distribution of γ^α we have

\sqrt{n}\, o_p\left(\left\|\hat{\gamma}_\alpha - \gamma_1\right\|\right) = o_p(1),

therefore

\sqrt{n}\left(l_{\hat{\gamma}_\alpha}(\hat{\gamma}_\alpha) - l_{\gamma_1}(\gamma_1)\right) \quad\text{and}\quad \sqrt{n}\left.\frac{\partial l_{\hat{\gamma}_\alpha}(\zeta)}{\partial \zeta^T}\right|_{\zeta=\gamma_1}\left(\hat{\gamma}_\alpha - \gamma_1\right)

have asymptotically the same distribution, i.e.,

\sqrt{n}\left(l_{\hat{\gamma}_\alpha}(\hat{\gamma}_\alpha) - l_{\gamma_1}(\gamma_1)\right) \xrightarrow[n\to\infty]{L} N\left(0,\; \left.\frac{\partial l_{\hat{\gamma}_\alpha}(\zeta)}{\partial \zeta^T}\right|_{\zeta=\gamma_1} A_\alpha(\gamma_1)^{-1} \left.\frac{\partial l_{\hat{\gamma}_\alpha}(\zeta)}{\partial \zeta}\right|_{\zeta=\gamma_1}\right).

Now, we shall denote,

\sigma^2(\gamma_1) = \left.\frac{\partial l_{\hat{\gamma}_\alpha}(\zeta)}{\partial \zeta^T}\right|_{\zeta=\gamma_1} A_\alpha(\gamma_1)^{-1} \left.\frac{\partial l_{\hat{\gamma}_\alpha}(\zeta)}{\partial \zeta}\right|_{\zeta=\gamma_1}.

Then, we have,

P_{\gamma_1}\left(W_n(\hat{\gamma}_\alpha) > \chi^2_{r,\alpha}\right) = P_{\gamma_1}\left(W_n(\hat{\gamma}_\alpha) - n\,l_{\gamma_1}(\gamma_1) > \chi^2_{r,\alpha} - n\,l_{\gamma_1}(\gamma_1)\right) = P_{\gamma_1}\left(\frac{\sqrt{n}}{\sigma(\gamma_1)}\left(l_{\hat{\gamma}_\alpha}(\hat{\gamma}_\alpha) - l_{\gamma_1}(\gamma_1)\right) > \frac{1}{\sigma(\gamma_1)}\left(\frac{\chi^2_{r,\alpha}}{\sqrt{n}} - \sqrt{n}\,l_{\gamma_1}(\gamma_1)\right)\right) \cong 1 - \Phi_{N(0,1)}\left(\frac{1}{\sigma(\gamma_1)}\left(\frac{\chi^2_{r,\alpha}}{\sqrt{n}} - \sqrt{n}\,l_{\gamma_1}(\gamma_1)\right)\right),

where \Phi_{N(0,1)}(t) represents the distribution function of a standard normal distribution evaluated at t. Finally,

\lim_{n\to\infty} P_{\gamma_1}\left(W_n(\hat{\gamma}_\alpha) > \chi^2_{r,\alpha}\right) = 1.

Appendix B. Poisson Regression Model

We derive here some explicit expressions for the particular case of Poisson regression. Following the discussion in Section 5, we write here γ = β, since the nuisance parameter is known, ϕ = 1. The Poisson distribution with parameter e^{x_i^T β} is given by

f_i(y,\beta) = \frac{1}{y!}\, e^{-e^{x_i^T\beta}}\, e^{y\, x_i^T\beta}, \quad y = 0, 1, \ldots

Differentiating its logarithm with respect to the regression vector, we get

\frac{\partial \log f_i(y,\beta)}{\partial \beta} = \left(y - e^{x_i^T\beta}\right) x_i^T,

so we can write

K_{1i}(y,\beta) = y - e^{x_i^T\beta}.

Further, we have that

N_i(y,\beta) = f_i(y,\beta)^{\alpha}\, \frac{\displaystyle\sum_{y=0}^{\infty} f_i(y,\beta)^{\alpha+1}\left(y - e^{x_i^T\beta}\right)}{\displaystyle\sum_{y=0}^{\infty} f_i(y,\beta)^{\alpha+1}},

so the estimating equations of the Poisson regression model are given by

\sum_{i=1}^{n} \frac{1}{L_{\alpha i}(\beta)} \left[ f_i(y_i,\beta)^{\alpha}\left(y_i - e^{x_i^T\beta}\right) - N_i(y_i,\beta) \right] x_i = 0_k. \quad (A1)

For α=0, we have

N_i(y_i,\beta) = 0 \quad\text{and}\quad L_{\alpha i}(\beta) = 1,

so the estimating equations are given by

\sum_{i=1}^{n}\left(y_i - e^{x_i^T\beta}\right) x_i = 0_k,

yielding the maximum likelihood estimating equations.
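The α = 0 limit can be checked numerically. The sketch below (truncating the infinite series at a large count) evaluates a per-observation estimating term of the form f_i(y_i,β)^α[(y_i − e^{x_i^Tβ}) − m1], where m1 = Σ_y f^{α+1}(y − e^{x_i^Tβ}) / Σ_y f^{α+1} is the mean-correction term; the factor 1/L_{αi}(β), which equals 1 at α = 0, is omitted. The helper names, the truncation point, and this reading of the correction term are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from math import lgamma

def poisson_pmf(ts, mu):
    """Poisson pmf evaluated in log form to avoid overflow for large counts."""
    logs = ts * np.log(mu) - mu - np.array([lgamma(t + 1.0) for t in ts])
    return np.exp(logs)

def psi_term(y, mu, alpha, y_max=200):
    """Per-observation estimating term f(y)^alpha * ((y - mu) - m1), with
    m1 = sum_t f(t)^(alpha+1) (t - mu) / sum_t f(t)^(alpha+1); the infinite
    series is truncated at y_max (an assumption for illustration)."""
    ts = np.arange(y_max + 1)
    w = poisson_pmf(ts, mu) ** (alpha + 1.0)
    m1 = np.sum(w * (ts - mu)) / np.sum(w)
    fy = poisson_pmf(np.array([y]), mu)[0]
    return fy ** alpha * ((y - mu) - m1)

# At alpha = 0 the term is exactly the ML score component y - mu:
assert np.isclose(psi_term(3, 2.5, alpha=0.0), 3 - 2.5)
# For alpha > 0, unlikely large counts are downweighted by f(y)^alpha:
assert abs(psi_term(40, 2.5, alpha=0.5)) < abs(40 - 2.5)
```

The downweighting factor f_i(y_i,β)^α is what bounds the influence of outlying counts, which is the source of the robustness discussed in the paper.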

On the other hand, the asymptotic distribution of β^α is given by

\sqrt{n}\left[\frac{1}{n} X^T D_{11} X\right]^{-1/2} \frac{1}{n}\, X^T\left(D_{11}^{*} - D_1^{*} D_1^{*T}\right) X \left(\hat{\beta}_\alpha - \beta\right) \xrightarrow[n\to\infty]{L} N\left(0_k, I_k\right),

where

D_{11} = \mathrm{diag}\left(l_{11}^{i}(\beta)\right)_{i=1,\ldots,n},

with

l_{11}^{i}(\beta) = \frac{1}{L_{\alpha i}(\beta)^{2}} \sum_{y=0}^{\infty} f_i(y,\beta)^{2\alpha+1}\left(K_{1i}(y,\beta) - m_{1i}(\beta)\right)^{2}

and

m_{1i}(\beta) = \frac{\displaystyle\sum_{y=0}^{\infty} f_i(y,\beta)^{\alpha+1}\left(y - e^{x_i^T\beta}\right)}{\displaystyle\sum_{y=0}^{\infty} f_i(y,\beta)^{\alpha+1}}.

Finally, D_{11}^{*} = \mathrm{diag}\left(m_{11i}(\beta)\right) and D_1^{*} = \mathrm{diag}\left(m_{1i}(\beta)\right), with

m_{11i}(\beta) = \frac{\displaystyle\sum_{y=0}^{\infty} f_i(y,\beta)^{\alpha+1}\left(y - e^{x_i^T\beta}\right)^{2}}{\displaystyle\sum_{y=0}^{\infty} f_i(y,\beta)^{\alpha+1}}.
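The series quantities m_{1i} and m_{11i} admit a simple sanity check: at α = 0 the weights reduce to the plain Poisson pmf, so m_{1i} = E[Y] − e^{x_i^Tβ} = 0 and m_{11i} = Var(Y) = e^{x_i^Tβ}. The sketch below verifies this with truncated series; the function names and truncation point are assumptions for illustration.

```python
import numpy as np
from math import lgamma

def poisson_pmf(ts, mu):
    """Poisson pmf in log form for numerical stability."""
    logs = ts * np.log(mu) - mu - np.array([lgamma(t + 1.0) for t in ts])
    return np.exp(logs)

def m1_m11(mu, alpha, y_max=300):
    """Truncated-series m_{1i} and m_{11i} for a Poisson mean mu,
    weighting (y - mu) and (y - mu)^2 by f(y)^(alpha+1)."""
    ts = np.arange(y_max + 1)
    w = poisson_pmf(ts, mu) ** (alpha + 1.0)
    m1 = np.sum(w * (ts - mu)) / np.sum(w)
    m11 = np.sum(w * (ts - mu) ** 2) / np.sum(w)
    return m1, m11

# At alpha = 0: m1 = E[Y] - mu = 0 and m11 = Var(Y) = mu.
m1, m11 = m1_m11(mu=4.0, alpha=0.0)
assert abs(m1) < 1e-8
assert abs(m11 - 4.0) < 1e-6
# For alpha > 0 the tilted weights concentrate near the center,
# shrinking m11 below the Poisson variance mu.
_, m11_robust = m1_m11(mu=4.0, alpha=0.5)
assert m11_robust < 4.0
```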

Author Contributions

Conceptualization, M.J. and L.P.; methodology, M.J. and L.P.; software, M.J. and L.P.; validation, M.J. and L.P.; formal analysis, M.J. and L.P.; investigation, M.J. and L.P.; resources, M.J. and L.P.; data curation, M.J. and L.P.; writing—original draft preparation, M.J. and L.P.; writing—review and editing, M.J. and L.P.; visualization, M.J. and L.P.; supervision, M.J. and L.P.; project administration, M.J. and L.P.; funding acquisition, M.J. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Spanish Grants PGC2018-095194-B-100 (L. Pardo and M. Jaenada) and FPU/018240 (M. Jaenada).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The real datasets are publicly available in the R package robustbase on CRAN, under the names CrohnD (Poisson regression example) and carrots (binomial regression example).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Nelder J.A., Wedderburn R.W.M. Generalized linear models. J. R. Stat. Soc. Ser. A. 1972;135:370–384. doi: 10.2307/2344614.
2. McCullagh P., Nelder J.A. Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman and Hall; London, UK: 1983.
3. Jaenada M., Pardo L. The minimum Renyi's Pseudodistances estimators for Generalized Linear Models. In: Data Analysis and Related Applications: Theory and Practice. Proceedings of the ASMDA. Wiley; Athens, Greece: 2021.
4. Stefanski L.A., Carroll R.J., Ruppert D. Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika. 1986;73:413–424. doi: 10.2307/2336218.
5. Krasker W.S., Welsch R.E. Efficient bounded-influence regression estimation. J. Am. Stat. Assoc. 1982;77:595–604. doi: 10.1080/01621459.1982.10477855.
6. Künsch H.R., Stefanski L.A., Carroll R.J. Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Am. Stat. Assoc. 1989;84:460–466.
7. Morgenthaler S. Least-absolute-deviations fits for generalized linear models. Biometrika. 1992;79:747–754. doi: 10.1093/biomet/79.4.747.
8. Cantoni E., Ronchetti E. Robust inference for generalized linear models. J. Am. Stat. Assoc. 2001;96:1022–1030. doi: 10.1198/016214501753209004.
9. Bianco A.M., Yohai V.J. Robust estimation in the logistic regression model. In: Robust Statistics, Data Analysis, and Computer Intensive Methods. Springer; New York, NY, USA: 1996; pp. 17–34.
10. Croux C., Haesbroeck G. Implementing the Bianco and Yohai estimator for logistic regression. Comput. Stat. Data Anal. 2003;44:273–295. doi: 10.1016/S0167-9473(03)00042-2.
11. Bianco A.M., Boente G., Rodrigues I.M. Robust tests in generalized linear models with missing responses. Comput. Stat. Data Anal. 2013;65:80–97. doi: 10.1016/j.csda.2012.05.008.
12. Valdora M., Yohai V.J. Robust estimators for generalized linear models. J. Stat. Plan. Inference. 2014;146:31–48. doi: 10.1016/j.jspi.2013.09.016.
13. Ghosh A., Basu A. Robust estimation in generalized linear models: The density power divergence approach. Test. 2016;25:269–290. doi: 10.1007/s11749-015-0445-3.
14. Basu A., Harris I.R., Hjort N.L., Jones M.C. Robust and efficient estimation by minimising a density power divergence. Biometrika. 1998;85:549–559. doi: 10.1093/biomet/85.3.549.
15. Basu A., Ghosh A., Mandal A., Martin N., Pardo L. Robust Wald-type tests in GLM with random design based on minimum density power divergence estimators. Stat. Methods Appl. 2021;3:933–1005. doi: 10.1007/s10260-020-00544-4.
16. Broniatowski M., Toma A., Vajda I. Decomposable pseudodistances and applications in statistical estimation. J. Stat. Plan. Inference. 2012;142:2574–2585. doi: 10.1016/j.jspi.2012.03.019.
17. Castilla E., Martín N., Muñoz S., Pardo L. Robust Wald-type tests based on Minimum Rényi Pseudodistance Estimators for the Multiple Regression Model. J. Stat. Comput. Simul. 2020;14:2592–2613. doi: 10.1080/00949655.2020.1787410.
18. Toma A., Leoni-Aubin S. Optimal robust M-estimators using Rényi pseudodistances. J. Multivar. Anal. 2013;115:259–273. doi: 10.1016/j.jmva.2012.10.003.
19. Toma A., Karagrigoriou A., Trentou P. Robust model selection criteria based on pseudodistances. Entropy. 2020;22:304. doi: 10.3390/e22030304.
20. Rényi A. On measures of entropy and information. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press; Berkeley, CA, USA: 1961; pp. 547–561.
21. Jones M.C., Hjort N.L., Harris I.R., Basu A. A comparison of related density-based minimum divergence estimators. Biometrika. 2001;88:865–873. doi: 10.1093/biomet/88.3.865.
22. Fujisawa H., Eguchi S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008;99:2053–2081. doi: 10.1016/j.jmva.2008.02.004.
23. Hirose K., Masuda H. Robust relative error estimation. Entropy. 2018;20:632. doi: 10.3390/e20090632.
24. Kawashima T., Fujisawa H. Robust and sparse regression via γ-divergence. Entropy. 2017;19:608. doi: 10.3390/e19110608.
25. Kawashima T., Fujisawa H. Robust and sparse regression in generalized linear model by stochastic optimization. Jpn. J. Stat. Data Sci. 2019;2:465–489. doi: 10.1007/s42081-019-00049-9.
26. Windham M.P. Robustifying model fitting. J. R. Stat. Soc. Ser. B. 1995;57:599–609. doi: 10.1111/j.2517-6161.1995.tb02050.x.
27. Castilla E., Jaenada M., Pardo L. Estimation and testing on independent not identically distributed observations based on Rényi's pseudodistances. arXiv. 2021. arXiv:2102.12282.
28. Pardo L. Statistical Inference Based on Divergence Measures. Chapman and Hall/CRC; Boca Raton, FL, USA: 2018.
29. Fraser D.A.S. Nonparametric Methods in Statistics. John Wiley & Sons; New York, NY, USA: 1957.
30. Maronna R.A., Martin R.D., Yohai V.J. Robust Statistics: Theory and Methods. John Wiley & Sons, Inc.; Hoboken, NJ, USA: 2006.
31. Donoho D.L., Huber P.J. The notion of breakdown point. In: A Festschrift for Erich L. Lehmann. CRC Press; Boca Raton, FL, USA: 1983.
32. Nelder J.A., Mead R. A simplex method for function minimization. Comput. J. 1965;7:308–313. doi: 10.1093/comjnl/7.4.308.
33. Lô S.N., Ronchetti E. Robust and accurate inference for generalized linear models. J. Multivar. Anal. 2009;100:2126–2136. doi: 10.1016/j.jmva.2009.06.012.
34. Phelps K. Use of the Complementary Log-Log Function to Describe Dose Response Relationships in Insecticide Evaluation Field Trials. In: Gilchrist R., editor. Proceedings of the International Conference on Generalized Linear Models. Lecture Notes in Statistics, No. 14. Springer; Berlin, Germany: 1982.



Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
