Entropy. 2022 Mar 13; 24(3): 399. doi: 10.3390/e24030399

Information–Theoretic Aspects of Location Parameter Estimation under Skew–Normal Settings

Javier E. Contreras-Reyes
Editor: Takuya Yamano

Abstract

In several applications, the assumption of normality is often violated in data with some level of skewness, and this skewness affects the estimation of the mean. The class of skew–normal distributions is considered, given its flexibility for modeling data through an asymmetry parameter. In this paper, we considered two estimation methods for the location parameter (μ) in the skew–normal setting, where the coefficient of variation and the skewness parameter are known: the least square estimator (LSE) and the best unbiased estimator (BUE). The properties of the BUE (which dominates the LSE) are explored using classic theorems of information theory, which provide a way to measure the uncertainty of location parameter estimates. Specifically, inequalities based on the convexity property enable obtaining lower and upper bounds for the differential entropy and the Fisher information. Some simulations illustrate the behavior of these bounds.

Keywords: skew–normal distribution, location parameter, skewness, differential entropy, Fisher information, Cramér–Rao bound, convexity

1. Introduction

A typical problem in statistical inference is estimating the parameters of a distribution from a data sample [1], especially if the data have some level of skewness, since the estimation of these parameters is affected by asymmetry. Recent research addressed data asymmetry with the class of skew–normal distributions, given their flexibility for modeling data through a skewness (asymmetry/symmetry) parameter [2]. In particular, Ref. [3] focused on estimating the location parameter (μ), assuming that the coefficient of variation and the skewness parameter are known. Specifically, they presented the least square estimator (LSE) and the best unbiased estimator (BUE) for μ. The precision of the location parameter estimation is directly influenced by skewness [4] and, hence, affects confidence intervals and sample size determination [5,6].

Given that complex parametric distributions with several parameters are often considered [2], information measures (entropies and/or divergences) play an important role: they quantify the uncertainty that a random process provides about itself and suffice to study the reproduction of a marginal process through a noiseless system. One main application is related to model selection and the detection of the number of clusters [7], or the interpretation of physical phenomena [8,9]. Entropies and/or divergences are also widely used to compare estimators [1]. For example, Ref. [10] considered the Kullback–Leibler (KL) divergence as a method to compare sample correlation matrices in an application to financial markets, assuming two multivariate normal densities. Using parameters estimated by maximum likelihood, Refs. [11,12,13] considered the KL divergence for an asymptotic test to evaluate the skewness and/or bimodality of data.

Given that precision was evaluated with confidence intervals in [5], the quantification of uncertainty for location parameter estimation under skew–normal settings motivated this study. The properties of the LSE and, especially, the BUE are explored using classic theorems and properties of information theory, which enable measuring the uncertainty of location parameter estimates through differential entropy and Fisher information [1]. The Cramér–Rao inequality [14] links Fisher information with the variance of an unbiased estimator and is used here to find a lower bound for the Fisher information. In addition, considering a stochastic representation [15] of a skew–normal random variable, the convexity property of Fisher information is also used to find an upper bound for the Fisher information.

This paper is organized as follows: some properties and inferential aspects based on information theory are presented in Section 2. In Section 3, the computation and description of information–theoretic theorems related to location parameter estimation of skew–normal distribution are presented. In Section 4, some simulations illustrate the usefulness of the results. Final remarks conclude the paper in Section 5.

2. Information-Theoretic Aspects

In this section, some main theorems and properties of information theory are described. Specifically, these properties are based on differential entropy and Fisher information.

Definition 1.

Let X be a random variable with support in \mathbb{R} and continuous probability density function (pdf) f(x;\theta), which depends on a parameter \theta. The differential entropy of X [1] is defined by

H(X) = -E[\log f(X;\theta)] = -\int_{\mathbb{R}} f(x;\theta)\log f(x;\theta)\,dx,

where the notation E[g(X)] = \int_{\mathbb{R}} g(x)f(x;\theta)\,dx was used.

Differential entropy depends only on the pdf of the random variable. In the following theorem, the scaling property of differential entropy is presented.

Theorem 1.

For any real constant a, the differential entropy of aX (Theorem 8.6.4 of [1]) is given by

H(aX) = H(X) + \log|a|.
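For readers who want a quick numerical confirmation, the following base-R sketch (not part of the original paper) approximates the differential entropy by numerical integration and checks the scaling property on a Gaussian example; the constant a is an arbitrary choice.

```r
# Numerical check of Theorem 1 (H(aX) = H(X) + log|a|) for a N(0,1) example.
# Entropy is computed by numerically integrating -f*log(f).
h_num <- function(dens, lower = -Inf, upper = Inf) {
  integrate(function(x) {
    fx <- dens(x)
    ifelse(fx > 0, -fx * log(fx), 0)
  }, lower, upper)$value
}
a   <- 2.5
hX  <- h_num(dnorm)                              # H(X), X ~ N(0,1)
haX <- h_num(function(x) dnorm(x, sd = abs(a)))  # H(aX): aX ~ N(0, a^2)
c(haX, hX + log(abs(a)))                         # both approx 0.5*log(2*pi*e*a^2)
```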

In particular, for two random variables, the following differential entropy bounds hold.

Theorem 2.

Let X and Y be two independent random variables. Suppose Z \stackrel{d}{=} X + Y, where “\stackrel{d}{=}” denotes equality in distribution; then

  • (i) 
    \frac{H(X)+H(Y)+\log 2}{2} \le H(Z) \le H(X)+H(Y).
  • (ii) 
    For any constant \rho such that 0 \le \rho \le 1,
    \rho H(X) + (1-\rho)H(Y) \le H\!\left(\sqrt{\rho}\,X + \sqrt{1-\rho}\,Y\right).

Proof. 

For part (i), consider first the general case of X_1, X_2, \ldots, X_n independent and identically distributed (i.i.d.) random variables (see Equations (5) and (6) of [16]); then

H(X_1) \le H\!\left(\frac{X_1+X_2}{\sqrt{2}}\right) \le H\!\left(\frac{1}{\sqrt{n-1}}\sum_{i=1}^{n-1}X_i\right) \le H\!\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_i\right).

Considering the latter inequality for two variables, X and Y, and the scaling property of Theorem 1, we obtain 2H(X+Y) - \log 2 \ge H(X) + H(Y), yielding the left side of the inequality. For the right side, see [17]. The inequality of part (ii) is proved in Theorem 7 of [14].    □

The inequality of Theorem 2(ii) is based on the convexity property and allows obtaining a lower bound for the differential entropy. The following theorem (Theorem 8.6.5 of [1]) provides the corresponding upper bound.

Theorem 3.

Let X be a random variable with zero mean and finite variance \sigma^{2}; then

H(X) \le \frac{1}{2}\log\left(2\pi e \sigma^{2}\right),

and equality is achieved if, and only if, X \sim N(0,\sigma^{2}).

Theorem 3, also known as the maximum entropy principle, implies that Gaussian distribution maximizes the differential entropy over all distributions with the same variance. This theorem has several implications for information theory, mainly when the differential entropy of an unknown distribution is hard to obtain. Thus, this upper bound is a good alternative. Another consequence is the relationship between estimation error and differential entropy, which includes the Cramér–Rao bound as described next. First, the Fisher information for continuous densities needs to be defined as follows.
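As an illustration (not from the paper), the following R sketch compares the numerically integrated entropy of a skew–normal variable with the Gaussian upper bound of Theorem 3; it assumes the sn package is installed, and the shape value alpha is arbitrary.

```r
# Numerical illustration of Theorem 3: a skew-normal variable's differential entropy
# never exceeds that of a Gaussian with the same variance. Uses sn::dsn.
library(sn)
alpha <- 3                                  # shape (skewness) parameter, arbitrary
delta <- alpha / sqrt(1 + alpha^2)
v     <- 1 - (2 / pi) * delta^2             # Var(X) for X ~ SN(0, 1, alpha)
f     <- function(x) dsn(x, xi = 0, omega = 1, alpha = alpha)
H_sn  <- integrate(function(x) ifelse(f(x) > 0, -f(x) * log(f(x)), 0), -Inf, Inf)$value
H_max <- 0.5 * log(2 * pi * exp(1) * v)     # Gaussian upper bound of Theorem 3
c(H_sn, H_max)                              # H_sn is below H_max
```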

Definition 2.

Let X be a random variable with support in \mathbb{R} and continuous density function f(x;\theta), which depends on a parameter \theta, so that \int_{\mathbb{R}} f(x;\theta)\,dx = 1. The Fisher information of X [1] is defined by

J(X) = E\left[\left(\frac{\partial}{\partial x}\log f(x;\theta)\right)^{2}\right] = \int_{\mathbb{R}}\left[\frac{\partial}{\partial x}f(x;\theta)\right]^{2}\frac{1}{f(x;\theta)}\,dx. (1)

The Fisher information is a measure of the minimum error in estimating a parameter \theta of a distribution. Classical definitions of Fisher information consider differentiation with respect to \theta to define J(\theta); however, for a location-parameter form f(x-\theta;\theta), differentiation with respect to x is equivalent to differentiation with respect to \theta, as in Equation (1) [1]. The following inequality links Fisher information and variance.

Theorem 4.

Let X = (X_1, X_2, \ldots, X_n) be a sample of n random variables drawn i.i.d. from f(x;\theta). The mean-squared error of an unbiased estimator T(X) of the parameter \theta is lower bounded by the reciprocal of the Fisher information (Theorem 11.10.1 of [1]):

\mathrm{Var}[T(X)] \ge \frac{1}{J(X)}, (2)

where J(X) is defined in Equation (1) and, if equality is achieved, T(X) is efficient.

Theorem 4, also known as the Cramér–Rao inequality, allows determining the best estimator of \theta and yields a lower bound for the Fisher information. The Cramér–Rao inequality was first stated for any estimator T(X) (not necessarily unbiased) of \theta in terms of the mean-squared error; in this case,

E\left[\{T(X)-\theta\}^{2}\right] \ge \frac{\left[1+\frac{\partial}{\partial\theta}\mathrm{Bias}(\theta)\right]^{2}}{J(X)} + \mathrm{Bias}(\theta)^{2},

where \mathrm{Bias}(\theta) = E[T(X)-\theta]; see Equation (11.290) of [1]. Clearly, if T(X) is an unbiased estimator of \theta, Theorem 4 is a particular case of the latter inequality. Inequality (2) is obtained by applying the Cauchy–Schwarz inequality to the variance of unbiased estimators. The following inequality, also known as the Fisher information inequality, is based on the convexity property and is useful to obtain an upper bound for the Fisher information.

Theorem 5.

For any two independent random variables X and Y, and any constant \rho such that 0 \le \rho \le 1,

J\!\left(\sqrt{\rho}\,X + \sqrt{1-\rho}\,Y\right) \le \rho J(X) + (1-\rho)J(Y).

Proof. 

See proof of Theorem 13 in [14].    □

3. Location Parameter Estimation

The skew–normal distribution is an extension of the normal one, allowing for the presence of skewness.

Definition 3.

X is called a skew–normal random variable [15], denoted as X \sim SN_1(\mu,\sigma^{2},\lambda), if it has pdf

f(x;\theta) = \frac{2}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{x-\mu}{\sigma}\right), \quad x \in \mathbb{R}, \quad \theta = (\mu,\sigma^{2},\lambda);

with location \mu \in \mathbb{R}, scale \sigma^{2} \in \mathbb{R}^{+}, and shape \lambda \in \mathbb{R} parameters. In addition, \phi(x) is the pdf of the standardized normal distribution with mean 0 and variance 1, denoted as N(0,1), and \Phi(x) is the corresponding cumulative distribution function (cdf).

Random variable X admits the following stochastic representation:

X \stackrel{d}{=} \mu + \sigma\left(\delta|U_0| + \sqrt{1-\delta^{2}}\,U\right), (3)

where \delta = \frac{\lambda}{\sqrt{1+\lambda^{2}}}, and U_0 and U \sim N(0,1) are independently distributed; see Equation (2.14) of [15].
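Representation (3) is also what the simulations of Section 4 rely on; a minimal R sketch of it is given below (illustrative parameter values only), with sn::rsn used as an independent check.

```r
# A minimal sketch (not from the paper) of the stochastic representation (3):
# X = mu + sigma*(delta*|U0| + sqrt(1 - delta^2)*U), compared against sn::rsn.
library(sn)
set.seed(1)
mu <- 1; sigma <- 0.5; lambda <- 2
delta <- lambda / sqrt(1 + lambda^2)
n  <- 1e5
U0 <- rnorm(n); U <- rnorm(n)                                     # independent N(0,1)
x_rep <- mu + sigma * (delta * abs(U0) + sqrt(1 - delta^2) * U)   # representation (3)
x_rsn <- rsn(n, xi = mu, omega = sigma, alpha = lambda)           # sn package draws
c(mean(x_rep), mean(x_rsn))   # both approx mu + sigma*sqrt(2/pi)*delta
```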

Additionally, X has a representation based on a link between differential entropy and Fisher information, due to de Bruijn’s identity. By matching the stochastic representation (3) with Equation (20) of [16], it is possible to assign Y = \mu + \sigma\delta|U_0| with fixed \delta. Then,

H(Y) = \frac{1}{2}\log(2\pi e) + \int \frac{\lambda}{(1+\lambda^{2})}\,[J(X)]^{-1}\,d\lambda,

where an approximation for the Fisher information J(X) appears in the proof of Proposition 5 below (with n = 1 observation).

Definition 4.

X is called a multivariate skew–normal random vector [18], denoted as X \sim SN_n(\mu,\Sigma,\lambda), if it has pdf

f_{n}(x;\theta) = 2\,\phi_{n}(x;\mu,\Sigma)\,\Phi\!\left[\lambda^{\top}\Sigma^{-1/2}(x-\mu)\right], \quad x \in \mathbb{R}^{n}, \quad \theta = (\mu,\Sigma,\lambda),

with location vector \mu \in \mathbb{R}^{n}, scale matrix \Sigma \in \mathbb{R}^{n\times n}, and skewness vector \lambda \in \mathbb{R}^{n} parameters. In addition, \phi_{n}(x;\mu,\Sigma) is the n-dimensional normal pdf with location parameter \mu and scale parameter \Sigma.

Let X = (X_1,\ldots,X_n)^{\top} \sim SN_n(\mu,\Sigma,\lambda), with \mu = \mathbf{1}_n\mu, \Sigma = \sigma^{2}I_n, and \lambda = \mathbf{1}_n\lambda, where \mathbf{1}_n = (1,\ldots,1)^{\top} \in \mathbb{R}^{n} and I_n denotes the n \times n identity matrix. Following [3] and Corollary 2.2 of [5], the following properties hold.

Property 1.

X_i \sim SN_1(\mu,\sigma^{2},\lambda^{*}), i = 1,\ldots,n, with \lambda^{*} = \frac{\lambda}{\sqrt{1+(n-1)\lambda^{2}}}.

Property 1 indicates that X_1,\ldots,X_n is a random sample of identically distributed, but not independent, random variables from a univariate skew–normal population with location \mu, scale \sigma^{2}, and shape \lambda^{*} parameters.

Property 2.

E[X_i] = \mu + \sigma b\delta_{*} and \mathrm{Var}[X_i] = \sigma^{2}\left(\frac{1}{n} - b^{2}\delta_{*}^{2}\right), with b = \sqrt{2/\pi} and \delta_{*} = \frac{\lambda}{\sqrt{1+n\lambda^{2}}}.

Property 3.

\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \sim SN_1\!\left(\mu,\ \frac{\sigma^{2}}{n},\ \sqrt{n}\,\lambda\right).

Property 4.

\left(\frac{n-1}{\sigma^{2}}\right)S^{2} \sim \chi^{2}_{n-1}, with S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^{2}, where \chi^{2}_{n-1} denotes the chi-square distribution with n-1 degrees of freedom, and the sample mean \bar{X} and sample variance S^{2} are independent.
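A short simulation can help visualize Property 3; the sketch below (not from the paper) draws the dependent vector with sn::rmsn under the setup above and overlays the claimed SN_1(\mu, \sigma^{2}/n, \sqrt{n}\lambda) density. Parameter values are arbitrary.

```r
# Sketch checking Property 3: the sample mean of X ~ SN_n(mu*1, sigma^2*I, lambda*1)
# should follow SN_1(mu, sigma^2/n, sqrt(n)*lambda). Uses sn::rmsn and sn::dsn.
library(sn)
set.seed(1)
n <- 10; mu <- 1; sigma <- 0.5; lambda <- 2
X    <- rmsn(5e4, xi = rep(mu, n), Omega = sigma^2 * diag(n), alpha = rep(lambda, n))
xbar <- rowMeans(X)                                   # one sample mean per replicate
grid <- seq(min(xbar), max(xbar), length.out = 200)
hist(xbar, breaks = 60, freq = FALSE, main = "Property 3 check")
lines(grid, dsn(grid, xi = mu, omega = sigma / sqrt(n), alpha = sqrt(n) * lambda))
```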

3.1. Least Square Estimator

Assuming that the coefficient of variation τ=|σ/μ| and shape parameter λ are known, Theorem 4.1 of [3] provides the least square estimator for μ and its variance, given by

T_{LSE}(X) = \omega\bar{X}, (4)
\mathrm{Var}[T_{LSE}(X)] = \omega^{2}(1-n\delta_n^{2})\frac{\sigma^{2}}{n}, \qquad \omega = \frac{n}{\,n + \tau\,\dfrac{\tau + n\delta_n}{1+\delta_n\tau}\,}, (5)

where \delta_n = b\delta_{*}, and \delta_{*} is defined in Property 2. The least square estimator for \mu was obtained by minimizing the MSE of nc\bar{X} with respect to a constant c. The MSE of T_{LSE}(X) is

\mathrm{MSE}[T_{LSE}(X)] = \left(\frac{\sigma^{2}}{n} + \mu^{2} + 2\mu\delta_n\sigma\right)\omega^{2} - 2\mu\omega(\mu+\sigma\delta_n) + \mu^{2}. (6)
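As an illustration of Equations (5) and (6), the following R sketch (based on the formulas as reconstructed here, with illustrative parameter values) computes \omega and compares the theoretical MSE with a Monte Carlo estimate obtained by drawing \bar{X} directly from Property 3.

```r
# Sketch: LSE weight omega of Equation (5) and MSE of Equation (6) versus simulation.
library(sn)
set.seed(1)
n <- 10; mu <- 1; tau <- 1; lambda <- 2
sigma   <- tau * abs(mu)
delta_s <- lambda / sqrt(1 + n * lambda^2)                             # delta* of Property 2
delta_n <- sqrt(2 / pi) * delta_s                                      # delta_n = b * delta*
omega   <- n / (n + tau * (tau + n * delta_n) / (1 + delta_n * tau))   # Equation (5)
xbar    <- rsn(1e5, xi = mu, omega = sigma / sqrt(n), alpha = sqrt(n) * lambda)  # Property 3
mse_mc  <- mean((omega * xbar - mu)^2)
mse_th  <- (sigma^2 / n + mu^2 + 2 * mu * delta_n * sigma) * omega^2 -
           2 * mu * omega * (mu + sigma * delta_n) + mu^2              # Equation (6)
c(mse_mc, mse_th)
```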

Proposition 1.

Let X=(X1,,Xn)SNn(μ1n,σ2In,1nλ), with known τ and λ. Thus,

  • (i) 
    H(TLSE(X))=12log2πeσ2nω2HN(η),HN(η)=E[log{2Φ(ηW)}],WSN1(0,1,η),η=σλ.
  • (ii) 
    H(TLSE(X))12log2πeσ2nω2(1nδn2).

Proof. 

Part (i) follows straightforwardly from Theorem 1, Property 3, (4) and Proposition 2.1 of [19] (for the univariate case). Part (ii) is straightforward from Theorem 3 and (5).    □

The differential entropy of T_{LSE}(X) corresponds to the difference between the normal differential entropy and a term called negentropy, H_N(\eta), which depends on the \sigma and \lambda parameters. Additionally, note that combining parts (i) and (ii) yields a bound for the negentropy term of part (i), namely H_N(\eta) \ge -\frac{1}{2}\log(1-n\delta_n^{2}).

As a particular case of Proposition 1, it is possible to obtain the differential entropy of the sample mean \bar{X} by choosing \omega = 1:

H(\bar{X}) = \frac{1}{2}\log\left(2\pi e\,\frac{\sigma^{2}}{n}\right) - E[\log\{2\Phi(\eta W)\}]; (7)

its respective upper bound

H(\bar{X}) \le \frac{1}{2}\log\left[2\pi e\,\frac{\sigma^{2}}{n}(1-n\delta_n^{2})\right];

and, from Equation (6), its respective MSE

\mathrm{MSE}[\bar{X}] = \frac{\sigma^{2}}{n}. (8)
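A quick numerical check of this upper bound is sketched below (not part of the paper): the entropy of \bar{X} is obtained by integrating its density from Property 3 and compared with \frac{1}{2}\log\{2\pi e(\sigma^{2}/n)(1-n\delta_n^{2})\}; it assumes the sn package and uses arbitrary parameter values.

```r
# Sketch: entropy of the sample mean (numerical integration of its SN_1 density
# from Property 3) versus the Gaussian-type upper bound above.
library(sn)
n <- 10; mu <- 1; tau <- 1; lambda <- 2
sigma   <- tau * abs(mu)
delta_n <- sqrt(2 / pi) * lambda / sqrt(1 + n * lambda^2)
f  <- function(x) dsn(x, xi = mu, omega = sigma / sqrt(n), alpha = sqrt(n) * lambda)
H_xbar  <- integrate(function(x) ifelse(f(x) > 0, -f(x) * log(f(x)), 0), -Inf, Inf)$value
H_bound <- 0.5 * log(2 * pi * exp(1) * (sigma^2 / n) * (1 - n * delta_n^2))
c(H_xbar, H_bound)            # H_xbar does not exceed H_bound
```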

3.2. Best Unbiased Estimator

Assuming that the coefficient of variation τ=|σ/μ| and shape parameter λ are known, Theorem 5.1 of [3] provides the best unbiased estimator (BUE) for μ, given by

T_{BUE}(X) = (1-\alpha)d_1(X) + \alpha d_2(X), \qquad d_1(X) = \frac{\bar{X}}{1+\delta_n\tau}, \qquad d_2(X) = c_n\sqrt{n-1}\,S, (9)
c_n = \frac{1}{\sqrt{2\tau^{2}}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}, (10)
\alpha = \frac{1}{(1+\delta_n\tau)\left[(n-1)c_n\right]^{2}}, (11)

where Γ(x) denotes the usual gamma function and S is defined in Property 4.
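The estimator can be coded directly from Equations (9)–(11); the following R sketch (using the formulas as reconstructed here and illustrative parameter values) computes T_{BUE}(X) from one simulated dependent sample.

```r
# Sketch: BUE computed from one dependent skew-normal sample via Equations (9)-(11).
library(sn)
set.seed(1)
n <- 10; mu <- 1; tau <- 1; lambda <- 2
sigma   <- tau * abs(mu)
delta_n <- sqrt(2 / pi) * lambda / sqrt(1 + n * lambda^2)                 # b * delta*
c_n     <- gamma((n - 1) / 2) / (sqrt(2 * tau^2) * gamma(n / 2))          # Equation (10)
alpha_w <- 1 / ((1 + delta_n * tau) * ((n - 1) * c_n)^2)                  # Equation (11)
x  <- c(rmsn(1, xi = rep(mu, n), Omega = sigma^2 * diag(n), alpha = rep(lambda, n)))
d1 <- mean(x) / (1 + delta_n * tau)                # Equation (9): d1(X)
d2 <- c_n * sqrt(n - 1) * sd(x)                    # Equation (9): d2(X), S = sample sd
(1 - alpha_w) * d1 + alpha_w * d2                  # T_BUE(X)
```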

Remark 1.

Equation (10) can be approximated using an asymptotic expression for the gamma function, \Gamma(x+a) \approx \sqrt{2\pi}\,x^{x+a-1/2}e^{-x}, a < \infty, as |x| \to \infty [19]. Then,

c_n \approx \frac{1}{\sqrt{n\tau^{2}}}, (12)

as n \to \infty. Since the exact form (10) can be numerically undefined for large samples (n > 200), approximation (12) is very useful for these cases. Note that, from (11) and (12), \delta_n, \alpha \to 0 as n \to \infty, which implies that the estimator T_{BUE}(X) is only influenced by d_1(X) for large samples.

From Properties 3 and 4, Ref. [3] also proved that

d_1(X) \sim SN_1\!\left(\frac{\mu}{1+\delta_n\tau},\ \frac{\sigma^{2}}{n(1+\delta_n\tau)^{2}},\ \sqrt{n}\,\lambda\right), (13)
\mathrm{Var}[d_1(X)] = \frac{(\mu\tau)^{2}(1-n\delta_n^{2})}{n(1+\delta_n\tau)^{2}}, (14)
\mathrm{Var}[d_2(X)] = 2(\mu\tau)^{2}\left[1 - \frac{1}{2(n-1)(\tau c_n)^{2}}\right]. (15)

Given that Cov(d1,d2)=0, from Equations (9), (14) and (15), we obtain

\mathrm{Var}[T_{BUE}(X)] = (1-\alpha)^{2}\mathrm{Var}[d_1(X)] + \alpha^{2}\mathrm{Var}[d_2(X)]. (16)

The following proposition provides two upper bounds of differential entropy for TBUE(X) based on Theorem 3.

Proposition 2.

Let X=(X1,,Xn)SNn(μ1n,σ2In,1nλ), with known τ and λ. Thus,

  • (i) 
    H(T_{BUE}(X)) \le \frac{1}{2}\log\left\{\frac{4\pi e}{n}\,\frac{\sigma^{2}\alpha(1-\alpha)}{(1+\delta_n\tau)^{2}}\left[1 - \frac{1}{2(n-1)(\tau c_n)^{2}}\right]\right\},
  • (ii) 
    H(T_{BUE}(X)) \le \frac{1}{2}\log\left\{2\pi e\,\mathrm{Var}[T_{BUE}(X)]\right\}.

Proof. 

From Theorem 3 and Equations (14) and (15), the differential entropies of d1(X) and d2(X) are, respectively, upper bounded by

H(d_1(X)) \le \frac{1}{2}\log\left[\frac{2\pi e\,\sigma^{2}}{n(1+\delta_n\tau)^{2}}\right], (17)
H(d_2(X)) \le \frac{1}{2}\log\left\{2\pi e \cdot 2\sigma^{2}\left[1 - \frac{1}{2(n-1)(\tau c_n)^{2}}\right]\right\}. (18)

Considering the right side of the inequality of Theorem 2(i), with Z = X + Y, X = (1-\alpha)d_1(X) and Y = \alpha d_2(X) (thus Cov(X,Y) = 0), we obtain

H(T_{BUE}(X)) \le H((1-\alpha)d_1(X)) + H(\alpha d_2(X)) = H(d_1(X)) + H(d_2(X)) + \log|(1-\alpha)\alpha|,

where Theorem 1 is applied in the last equality. Then, Equations (17) and (18) yield part (i). On the other hand, by considering Theorem 3 directly on T_{BUE}(X), Equation (16) implies part (ii).    □

The following proposition provides two lower bounds of differential entropy for TBUE(X).

Proposition 3.

Let X=(X1,,Xn)SNn(μ1n,σ2In,1nλ), with known τ (0<τ<1) and λ. Thus

  • (i) 
    H(d_1(X)) + H(d_2(X)) + \log\left(2\alpha^{2}(1-\alpha)^{2}\right) \le 2H(T_{BUE}(X)),
  • (ii) 
    (1-\alpha^{2})H(d_1(X)) + \alpha^{2}H(d_2(X)) \le H(T_{BUE}(X));

where

H(d_1(X)) = \frac{1}{2}\log\left[\frac{2\pi e\,\sigma^{2}}{n(1+\delta_n\tau)^{2}}\right] - H_N(\eta_1), \quad H_N(\eta_1) = E[\log\{2\Phi(\eta_1 W_1)\}], \quad W_1 \sim SN_1(0,1,\eta_1), \quad \eta_1 = \left|\frac{\lambda\sigma}{1+\delta_n\tau}\right|;

and

H(d_2(X)) = \log\left(\frac{|\alpha|\,\Gamma\left(\frac{n-1}{2}\right)^{n}}{2\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}\right) - (n-2)\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{4}\left(2c_n^{2}\right)^{\frac{n-1}{2}}\left[\psi\left(\frac{n-1}{2}\right) + \log\left(2c_n^{2}\right)\right] + \frac{\Gamma(n+1)}{c_n^{2}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)^{n+2}}{\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n+3}}.

Proof. 

The differential entropy of d_1(X) follows straightforwardly from evaluating (13) in Proposition 2.1 of [19] (for the univariate case). Given that the distribution of d_2(X) is not standard, Ref. [3] provided its pdf,

f_{d_2}(x;\mu,\sigma) = \frac{2\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)^{n}}\,x^{n-2}\,e^{-\frac{x^{2}}{2c_n^{2}}}, \quad x > 0. (19)

Through Equations (19) and (3.381.4) of [20], the moments of d_2 are given by

E_{d_2}[X^{m}] = \frac{2\,\Gamma(m+n-1)\,\Gamma\left(\frac{n-1}{2}\right)^{2m+n-2}}{\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{2m+n-1}}, \quad m = 0,1,\ldots; (20)

and, using Equation (4.352.1) of [20], the moment of \log x is

E_{d_2}[\log X] = \int_{0}^{\infty} f_{d_2}(x;\mu,\sigma)\log x\,dx = \frac{2\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)^{n}}\int_{0}^{\infty} x^{n-2}e^{-\frac{x^{2}}{2c_n^{2}}}\log x\,dx = \frac{\left(2c_n^{2}\right)^{\frac{n-1}{2}}\Gamma\left(\frac{n-1}{2}\right)}{4}\left[\psi\left(\frac{n-1}{2}\right) + \log\left(2c_n^{2}\right)\right], (21)

where \psi(x) = \frac{d}{dx}\log\Gamma(x) is the digamma function. Therefore, by Definition 1, the differential entropy of d_2(X) is computed as

H(d_2(X)) = -\int_{0}^{\infty} f_{d_2}(x;\mu,\sigma)\log f_{d_2}(x;\mu,\sigma)\,dx = -\log\left(\frac{2\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{|\alpha|\,\Gamma\left(\frac{n-1}{2}\right)^{n}}\right) - (n-2)\underbrace{\int_{0}^{\infty} f_{d_2}(x;\mu,\sigma)\log x\,dx}_{E_{d_2}[\log X]} + \frac{1}{2c_n^{2}}\underbrace{\int_{0}^{\infty} f_{d_2}(x;\mu,\sigma)\,x^{2}\,dx}_{E_{d_2}[X^{2}]}.

Thus, Equations (20) and (21) are evaluated in the latter expression to obtain H(d_2(X)). By assuming Z = X + Y in Theorem 2(i) (Cov(X,Y) = 0), with X = (1-\alpha)d_1(X) and Y = \alpha d_2(X), the inequality of part (i) is obtained.

By assuming Z = X + Y in Theorem 2(ii), with X = d_2(X), Y = d_1(X) (thus Cov(X,Y) = 0), and \rho = \alpha^{2}, and since d_1(X) and d_2(X) are two unbiased estimators of \mu [3], the inequality of part (ii) is obtained.    □

The following proposition provides a lower bound for Fisher information of parameter μ based on TBUE(X).

Proposition 4.

Let X=(X1,,Xn)SNn(μ1n,σ2In,1nλ), with known τ and λ. Thus,

J(μ)(1α)2σ2(1nδn2)n(1+δnτ)2+α22σ2112(n1)(τcn)21.

Proof. 

Considering that T_{BUE}(X) is an unbiased estimator of \mu, from the Cramér–Rao inequality of Theorem 4 and Equations (14)–(16), we obtain

J(\mu) \ge \frac{1}{(1-\alpha)^{2}\mathrm{Var}[d_1(X)] + \alpha^{2}\mathrm{Var}[d_2(X)]},

yielding the result.    □

The following Proposition provides an upper bound of Fisher information for parameter μ based on TBUE(X) and the convexity property.

Proposition 5.

Let X=(X1,,Xn)SNn(μ1n,σ2In,1nλ), with known τ (0<τ<1) and λ. Thus

J(\mu) \le (1-\alpha^{2})\,J(d_1(X)) + \alpha^{2}\,J(d_2(X)),

where

J(d_1(X)) \approx 1 + \frac{n(b\lambda)^{2}}{\sqrt{1+2nb^{4}\lambda^{2}}}, \qquad J(d_2(X)) = \frac{2(n-2)\left(n^{2}-2n-2\right)\Gamma\left(\frac{n}{2}-1\right)\left(2c_n^{2}\right)^{\frac{n-3}{2}}\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)^{n}} + \frac{1}{c_n^{4}}.

Proof. 

By assuming Z = X + Y in Theorem 5, with X = d_2(X), Y = d_1(X) (thus Cov(X,Y) = 0), and \rho = \alpha^{2}, and since d_1(X) and d_2(X) are two unbiased estimators of \mu [3], we obtain J(\mu) \le \alpha^{2}J(d_2(X)) + (1-\alpha^{2})J(d_1(X)). Note that the condition 0 < \tau < 1 ensures that 0 \le \alpha^{2} \le 1.

For J(d_1(X)), the steps of Section 3.2 of [9] were considered. By Equations (1) and (13), and the change of variable z = (x-\mu^{*})/\sigma^{*}, with \mu^{*} = \mu/(1+\delta_n\tau), \sigma^{*} = \sigma/[\sqrt{n}(1+\delta_n\tau)] and \lambda^{*} = \sqrt{n}\,\lambda, J(d_1(X)) can be computed as

J(d_1(X)) = \int_{\mathbb{R}}\left[\frac{\partial}{\partial x}f(x;\theta)\right]^{2}\frac{1}{f(x;\theta)}\,dx = \int_{\mathbb{R}} f(z;\lambda^{*})\left[z - \lambda^{*}\zeta(\lambda^{*}z)\right]^{2}dz = \int_{\mathbb{R}} z^{2}f(z;\lambda^{*})\,dz - 2\lambda^{*}\int_{\mathbb{R}} z\,\zeta(\lambda^{*}z)f(z;\lambda^{*})\,dz + [\lambda^{*}]^{2}\int_{\mathbb{R}}\zeta(\lambda^{*}z)^{2}f(z;\lambda^{*})\,dz = \int_{\mathbb{R}} z^{2}f(z;\lambda^{*})\,dz - 4\lambda^{*}\int_{\mathbb{R}} z\,\phi(\lambda^{*}z)\phi(z)\,dz + 2[\lambda^{*}]^{2}\int_{\mathbb{R}}\frac{\phi(z)\,\phi(\lambda^{*}z)^{2}}{\Phi(\lambda^{*}z)}\,dz, (22)

where \zeta(x) = \phi(x)/\Phi(x) is the zeta function. From Equation (22), the first and second terms are the second moment of a standardized skew–normal random variable (E[Z^{2}] = 1) and the first moment of a standardized normal random variable (E[R] = 0, R \sim N(0,1)), respectively. The third term is

\int_{\mathbb{R}}\frac{\phi(z)\,\phi(\lambda^{*}z)^{2}}{\Phi(\lambda^{*}z)}\,dz = \int_{0}^{\infty}\frac{\phi(z)\,\phi(\lambda^{*}z)^{2}}{\Phi(\lambda^{*}z)}\,dz + \int_{0}^{\infty}\frac{\phi(z)\,\phi(\lambda^{*}z)^{2}}{1-\Phi(\lambda^{*}z)}\,dz = \int_{0}^{\infty}\frac{\phi(z)\,\phi(\lambda^{*}z)^{2}}{\Phi(\lambda^{*}z)\left[1-\Phi(\lambda^{*}z)\right]}\,dz.

The following approximation of normal densities (see p. 83 of [15]),

\frac{\phi(y)}{\sqrt{\Phi(y)\left[1-\Phi(y)\right]}} \approx \pi b^{2}\,\phi(b^{2}y), \quad y \in \mathbb{R},

and some basic algebraic operations of normal densities are useful to approximate the third term of Equation (22) as

\int_{0}^{\infty}\frac{\phi(z)\,\phi(\lambda^{*}z)^{2}}{\Phi(\lambda^{*}z)\left[1-\Phi(\lambda^{*}z)\right]}\,dz \approx \pi^{2}b^{4}\int_{0}^{\infty}\phi(z)\,\phi(b^{2}\lambda^{*}z)^{2}\,dz = \frac{\pi^{2}b^{4}}{2\pi\sqrt{1+2b^{4}[\lambda^{*}]^{2}}}\int_{0}^{\infty}\phi\!\left(z;0,\{1+2b^{4}[\lambda^{*}]^{2}\}^{-1}\right)dz = \frac{\pi b^{4}}{4\sqrt{1+2b^{4}[\lambda^{*}]^{2}}}.

Given that \frac{\pi b^{4}}{2} = b^{2} and \lambda^{*} = \sqrt{n}\,\lambda, we obtain

J(d_1(X)) \approx 1 + \frac{n(b\lambda)^{2}}{\sqrt{1+2nb^{4}\lambda^{2}}}.

Using Equation (19), J(d2(X)) is computed as

J(d_2(X)) = \int_{0}^{\infty}\left[\frac{\partial}{\partial x}f_{d_2}(x;\mu,\sigma)\right]^{2}\frac{1}{f_{d_2}(x;\mu,\sigma)}\,dx = \int_{0}^{\infty} f_{d_2}(x;\mu,\sigma)\left[(n-2)x^{-1} - \frac{1}{c_n^{2}}\right]^{2}dx = (n-2)^{2}\int_{0}^{\infty} x^{-2}f_{d_2}(x;\mu,\sigma)\,dx - \frac{2(n-2)}{c_n^{2}}\int_{0}^{\infty} x^{-1}f_{d_2}(x;\mu,\sigma)\,dx + \frac{1}{c_n^{4}} = 2(n-2)^{2}\,\frac{\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)^{n}}\underbrace{\int_{0}^{\infty} x^{n-4}e^{-\frac{x^{2}}{2c_n^{2}}}\,dx}_{M_1} - \frac{4(n-2)}{c_n^{2}}\,\frac{\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)^{n}}\underbrace{\int_{0}^{\infty} x^{n-3}e^{-\frac{x^{2}}{2c_n^{2}}}\,dx}_{M_2} + \frac{1}{c_n^{4}} = \frac{2(n-2)\left(n^{2}-2n-2\right)\Gamma\left(\frac{n}{2}-1\right)\left(2c_n^{2}\right)^{\frac{n-3}{2}}\left[\tau\,\Gamma\left(\frac{n}{2}\right)\right]^{n-1}}{\Gamma\left(\frac{n-1}{2}\right)^{n}} + \frac{1}{c_n^{4}},

where Equation (3.381.4) of [20] is applied to solve integrals M1 and M2.    □
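To make the bounds of Propositions 4 and 5 concrete, the sketch below (not from the paper; it relies on the variance and density formulas as reconstructed here, so treat the numbers as indicative only) evaluates the Proposition 4 lower bound and the two ingredients of the Proposition 5 upper bound, with J(d_2) obtained by numerical integration of (19) rather than the closed form.

```r
# Sketch of the Fisher information bounds (illustrative parameter values).
n <- 10; mu <- 1; tau <- 0.5; lambda <- 2
b       <- sqrt(2 / pi)
sigma   <- tau * abs(mu)
delta_n <- b * lambda / sqrt(1 + n * lambda^2)
c_n     <- gamma((n - 1) / 2) / (sqrt(2 * tau^2) * gamma(n / 2))          # Equation (10)
alpha_w <- 1 / ((1 + delta_n * tau) * ((n - 1) * c_n)^2)                  # Equation (11)
v1 <- sigma^2 * (1 - n * delta_n^2) / (n * (1 + delta_n * tau)^2)         # Equation (14)
v2 <- 2 * sigma^2 * (1 - 1 / (2 * (n - 1) * (tau * c_n)^2))               # Equation (15)
J_lower <- 1 / ((1 - alpha_w)^2 * v1 + alpha_w^2 * v2)                    # Proposition 4
J_d1 <- 1 + n * (b * lambda)^2 / sqrt(1 + 2 * n * b^4 * lambda^2)         # Prop. 5 approx.
f19 <- function(x) 2 * (tau * gamma(n / 2))^(n - 1) / gamma((n - 1) / 2)^n *
  x^(n - 2) * exp(-x^2 / (2 * c_n^2))                                     # density (19)
fprime <- function(x) ((n - 2) / x - x / c_n^2) * f19(x)                  # derivative of (19)
J_d2 <- integrate(function(x) fprime(x)^2 / f19(x), 0, Inf)$value         # numerical J(d2)
c(lower_bound = J_lower, J_d1 = J_d1, J_d2 = J_d2)
```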

Remark 2.

Considering the same argument as in Remark 1, it can be noted that the inequalities of Propositions 4 and 5 are only affected by d_1(X); i.e., for large samples, we obtain

\frac{1}{\mathrm{Var}[d_1(X)]} \le J(\mu) \le J(d_1(X)).

4. Simulations

All location parameter estimators, variances, Fisher information and differential entropies were calculated with the R software [21]. Samples of skew–normal random variables were drawn using the stochastic representation (3) and the rsn function of the sn package. All R codes used in this paper are available upon request from the corresponding author.

In general, τ takes a value between 0 and 1. If τ is close to 0, the sample has low variability, and if it is close to 1, the sample has high variability and the mean loses reliability. For example, if τ>0.3, the mean is less representative of the sample. Sometimes, if μ is close to zero, τ takes high values (high variability) and could exceed unity. Therefore, for illustrative purposes, a coefficient of variation set of τ=0.1,…,1 is considered in all simulations. Additionally considered are positive asymmetry parameters λ=0.1,…,5, sample sizes n=10 and 250, and theoretical location parameters μ=0.1, 0.5 and 1. For the computation of the information measures, σ is replaced by τ|μ| and the location parameter μ is evaluated at its respective estimators.

The MSE of T_LSE(X) is given in Equation (6), and the MSE of T_BUE(X) is the variance of this (unbiased) estimator (see Equation (16)). Without loss of generality, τ=1 is considered in Figure 1 because the same pattern is repeated for values of τ between 0 and 1. Comparing the MSE of both estimators, Figure 1 shows, in all cases, that the differences between MSEs tend to increase for large values of μ, and that the MSEs stabilize around a specific value when the sample size increases. Moreover, the MSEs of the unbiased estimator are smaller than those obtained by the LSE, i.e., the BUE dominates the LSE (Equation (11.263) of [1]). Therefore, the analysis focuses on the BUE in what follows.

Figure 1.


Mean square errors (MSE) for TLSE(X) [blue dots] and TBUE(X) [red dots] considering τ=1 and several skewness λ and location μ parameters in the simulations.

The behavior of the differential entropy bounds given in Propositions 2 and 3 is illustrated in Figure 2 as 3D plots. Without loss of generality, τ ∈ (0,1] is considered in Figure 2 because the same pattern is repeated for values of τ>1, i.e., the entropies keep increasing. The upper bound corresponds to the minimum value between the bounds given in Proposition 2(i) and (ii), which is the one given in (ii). Thus, the upper bound of H(T_BUE(X)) is determined by the variance (or MSE) of the estimator. In contrast, the lower bounds correspond to the maximum value between the bounds given in Proposition 3(i) and (ii) [17].

Figure 2.


Differential entropy bounds for TBUE(X) considering n=10 and 250, μ=0.1, 0.5 and 1; and several skewness λ and coefficient of variation τ parameters in the simulations.

Large sample sizes (n=250) imply that α ≈ 0 and the lower bounds only depend on H(d₁(X)). For small sample sizes (n=10), α could be an intermediate value of the interval (0,1); thus, the lower bounds depend on H(d₁(X)) and H(d₂(X)). For n=10, the surfaces are rough, given the randomness of the bounds produced by the small sample. When λ → 0 (symmetry condition), the bounds decay to negative values. This is analogous to considering the skew–normal density as a non-stationary process [15] when λ is near zero, so that the Hurst exponent decreases abruptly [8]. On the other hand, for n=250, the surfaces are smooth and the bounds increase slightly for large λ. In all cases, information increases when τ tends to 1 because this produces more variability in the samples.

For practical purposes, the average between the bounds can be considered to provide an approximation of the differential entropy [7], in a similar way to the average lengths of confidence intervals [3]. Given that all lower bounds of the differential entropy depend on the entropy of d₁(X), which depends on the variance and the sample size, they could take negative values and tend to zero when τ tends to 1. Therefore, the difference between the lower and upper bounds could increase and yield an inadequate approximation if the lower bound is negative. For the latter reason, the Fisher information, which takes only positive values, is studied next.

The Fisher information bounds given in Propositions 4 and 5 are illustrated in Figure 3 as 3D and 2D plots, respectively. As in the differential entropy case, and without loss of generality, τ ∈ (0,1] is considered in Figure 3 because the same pattern is repeated for values of τ>1, i.e., the bounds keep decreasing. Following the Cramér–Rao theorem, the lower bound corresponds to the reciprocal of the variance of the BUE. In contrast, the upper bound corresponds to a combination of the Fisher information of d₁(X) and d₂(X).

Figure 3.


Fisher information lower bounds for TBUE(X) considering n=250, μ=1, 2.5 and 5; and several skewness λ and coefficient of variation τ parameters in the simulations. The fourth panel shows the upper bounds for TBUE(X) considering n=100,,1000 and several skewness parameters λ.

As in the differential entropy case, large sample sizes (n=250) imply that α ≈ 0 and the lower bounds only depend on J(d₁(X)), as mentioned in Remark 2. For small sample sizes (n=10), α could be an intermediate value of the interval (0,1); thus, the lower bounds depend on J(d₁(X)) and J(d₂(X)). When τ → 0 (low variability condition), the lower bounds take the highest values. This reciprocal relationship is determined by the Cramér–Rao theorem: more variability, less Fisher information. In addition, the 2D plot shows that the smallest upper bounds of J(T_BUE(X)) are produced when λ → 0 [9]. Given that the upper bounds do not depend on τ and μ, because the skew–normal densities are standardized, these measures are illustrated with respect to n and λ. In addition, when λ and n increase simultaneously, the upper bounds of the Fisher information take the largest values.

5. Concluding Remarks

In this paper, some properties of the best unbiased estimator proposed by [3] were presented, using classic theorems of information theory, which provide a way to measure the uncertainty of location parameter estimations. Given that BUE dominates LSE, this paper focused on this estimator. Inequalities based on differential entropy and Fisher information allowed obtaining lower and upper bounds for these measures. Some simulations illustrated the behavior of differential entropy and Fisher information bounds.

Classical theorems of information theory were considered to obtain additional properties of unbiased location parameter estimators. However, these theorems could be applied to other estimators, such as Bayesian [22] (as long as the prior density is known), shrinkage [23], or bootstrap-based [24] ones. The assumption that the sample comes from a multivariate skew–normal distribution is strong and not always applicable in the real world, so the properties revised here could be extended to more complex densities, for example, those that accommodate bimodality and heavy tails in the data [7,11,13,19]. On the other hand, given that Fisher information bounds under skew–normal settings were considered in this study, further work could focus on developing time-dependent Fisher information for the skew–normal density [25], which could be applied to real data in survival analysis.

Acknowledgments

The author thanks the editor and three anonymous referees for their helpful comments and suggestions.

Funding

This research was funded by FONDECYT (Chile) grant number 11190116.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares that there is no conflict of interest in the publication of this paper.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Cover T.M., Thomas J.A. Elements of Information Theory. 2nd ed. Wiley & Sons, Inc.; New York, NY, USA: 2006.
  • 2. Adcock C., Azzalini A. A selective overview of skew-elliptical and related distributions and of their applications. Symmetry. 2020;12:118. doi: 10.3390/sym12010118.
  • 3. Wang Z.Y., Wang C., Wang T.H. Estimation of location parameter in the skew normal setting with known coefficient of variation and skewness. Int. J. Intel. Technol. Appl. Stat. 2016;9:191–208.
  • 4. Trafimow D., Wang T., Wang C. From a sampling precision perspective, skewness is a friend and not an enemy! Educ. Psychol. Meas. 2019;79:129–150. doi: 10.1177/0013164418764801.
  • 5. Wang C., Wang T., Trafimow D., Myüz H.A. Necessary sample sizes for specified closeness and confidence of matched data under the skew normal setting. Comm. Stat. Simul. Comput. 2019. doi: 10.1080/03610918.2019.1661473. In press.
  • 6. Wang C., Wang T., Trafimow D., Talordphop K. Estimating the location parameter under skew normal settings: Is violating the independence assumption good or bad? Soft Comput. 2021;25:7795–7802. doi: 10.1007/s00500-021-05679-4.
  • 7. Abid S.H., Quaez U.J., Contreras-Reyes J.E. An information-theoretic approach for multivariate skew-t distributions and applications. Mathematics. 2021;9:146. doi: 10.3390/math9020146.
  • 8. Contreras-Reyes J.E. Analyzing fish condition factor index through skew-gaussian information theory quantifiers. Fluct. Noise Lett. 2016;15:1650013. doi: 10.1142/S0219477516500139.
  • 9. Contreras-Reyes J.E. Fisher information and uncertainty principle for skew-gaussian random variables. Fluct. Noise Lett. 2021;20:21500395. doi: 10.1142/S0219477521500395.
  • 10. Tumminello M., Lillo F., Mantegna R.N. Correlation, hierarchies, and networks in financial markets. J. Econ. Behav. Organ. 2010;75:40–58. doi: 10.1016/j.jebo.2010.01.004.
  • 11. Contreras-Reyes J.E. An asymptotic test for bimodality using the Kullback–Leibler divergence. Symmetry. 2020;12:1013. doi: 10.3390/sym12061013.
  • 12. Contreras-Reyes J.E., Kahrari F., Cortés D.D. On the modified skew-normal-Cauchy distribution: Properties, inference and applications. Comm. Stat. Theor. Meth. 2021;50:3615–3631. doi: 10.1080/03610926.2019.1708942.
  • 13. Contreras-Reyes J.E., Maleki M., Cortés D.D. Skew-Reflected-Gompertz information quantifiers with application to sea surface temperature records. Mathematics. 2019;7:403. doi: 10.3390/math7050403.
  • 14. Dembo A., Cover T.M., Thomas J.A. Information theoretic inequalities. IEEE Trans. Infor. Theor. 1991;37:1501–1518. doi: 10.1109/18.104312.
  • 15. Azzalini A. The Skew-Normal and Related Families. Volume 3. Cambridge University Press; Cambridge, UK: 2013.
  • 16. Madiman M., Barron A. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Infor. Theor. 2007;53:2317–2329. doi: 10.1109/TIT.2007.899484.
  • 17. Xie Y. Sum of Two Independent Random Variables. ECE587, Information Theory. 2012. Available online: https://www2.isye.gatech.edu/~yxie77/ece587/SumRV.pdf (accessed on 15 January 2022).
  • 18. Azzalini A., Dalla-Valle A. The multivariate skew-normal distribution. Biometrika. 1996;83:715–726. doi: 10.1093/biomet/83.4.715.
  • 19. Contreras-Reyes J.E. Asymptotic form of the Kullback–Leibler divergence for multivariate asymmetric heavy-tailed distributions. Phys. A Stat. Mech. Its Appl. 2014;395:200–208. doi: 10.1016/j.physa.2013.10.035.
  • 20. Gradshteyn I.S., Ryzhik I.M. Table of Integrals, Series, and Products. 7th ed. Academic Press/Elsevier; London, UK: 2007.
  • 21. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2021. Available online: http://www.R-project.org (accessed on 15 January 2022).
  • 22. Bayes C.L., Branco M.D. Bayesian inference for the skewness parameter of the scalar skew-normal distribution. Braz. J. Prob. Stat. 2007;21:141–163.
  • 23. Kubokawa T., Strawderman W.E., Yuasa R. Shrinkage estimation of location parameters in a multivariate skew-normal distribution. Comm. Stat. Theor. Meth. 2020;49:2008–2024. doi: 10.1080/03610926.2019.1568481.
  • 24. Ye R., Fang B., Wang Z., Luo K., Ge W. Bootstrap inference on the Behrens–Fisher-type problem for the skew-normal population under dependent samples. Comm. Stat. Theor. Meth. 2021. doi: 10.1080/03610926.2021.1980045. In press.
  • 25. Kharazmi O., Asadi M. On the time-dependent Fisher information of a density function. Braz. J. Prob. Stat. 2018;32:795–814. doi: 10.1214/17-BJPS366.


