Abstract
In this paper, we focus on extended informational measures based on a convex function ϕ: ϕ-entropies, extended Fisher information, and generalized moments. Both the generalization of the Fisher information and the moments rely on the definition of an escort distribution linked to the (entropic) functional ϕ. We revisit the usual maximum entropy principle, and more precisely its inverse problem, starting from the distribution and constraints, which leads to the introduction of state-dependent ϕ-entropies. Then, we examine interrelations between the extended informational measures and generalize relationships such as the Cramér–Rao inequality and the de Bruijn identity in this broader context. In this particular framework, the maximum entropy distributions play a central role. Of course, all the results derived in the paper include the usual ones as special cases.
Keywords: ϕ-entropy, state-dependent ϕ-entropy, (inverse) maximum ϕ-entropy problem, ϕ-escort distributions, ϕ-Fisher information, ϕ-moments, generalized Cramér–Rao inequality, ϕ-heat equation, generalized de Bruijn
1. Introduction
Since the pioneering works of von Neumann [1], Shannon [2], Boltzmann, Maxwell, Planck, and Gibbs [3,4,5,6,7,8,9], many investigations have been devoted to the generalization of the so-called Shannon entropy and its associated measures [10,11,12,13,14,15,16,17,18,19,20,21,22]. While the Shannon measures are compelling, especially in the communication domain for compression purposes, many generalizations proposed later on have also shown promising interpretations and applications (for instance, the Panter–Dite formula in quantization, where the Rényi or Havrda–Charvát entropy emerges [23,24,25], or encoding that penalizes long codewords, where the Rényi entropy appears [26,27]). The great majority of the extended entropies found in the literature belong to a very general class of entropic measures called (h,ϕ)-entropies [13,19,20,28,29,30]. Such a general class (or more precisely the subclass of ϕ-entropies) can be traced back to the work of Burbea and Rao [28]. These entropies offer not only a general framework to study properties shared by special entropies, but also many potential applications, as described for instance in [30]. Note that although a large amount of work deals with divergences, entropies occur as special cases when one takes a uniform reference measure.
In the framework of these generalized entropies, the so-called maximum entropy principle takes a special place. This principle, advocated by Jaynes, states that the statistical distribution that describes a system in equilibrium maximizes the entropy while satisfying the system’s physical constraints (e.g., the center of mass and energy) [31,32,33,34,35]. In other words, it is the least informative law given the constraints of the system. In the Bayesian approach, dealing with the stochastic modeling of a parameter, such a principle (or a minimum divergence principle) is often used to choose a prior distribution for the parameter [22,36,37,38,39]. It also finds its counterpart in communication, clustering, and pattern recognition problems, among many others [32,33,40,41,42,43]. In statistics, some goodness-of-fit tests are based on entropic criteria derived from the same idea of a constrained maximum entropy law [44,45,46,47,48,49]. The principle behind such entropic tests lies in the Bregman divergence, which measures a kind of distance between probability distributions, i.e., between the empirical distribution given by the data and the distribution we assume for the data (the reference). It appears that if the empirical distribution and the reference share the same moments, and if the latter is of maximum entropy with these moments as constraints, the divergence reduces to a difference of entropies. In a large number of works using the maximum entropy principle, the entropy used is the Shannon entropy. However, if for some reason a generalized entropy is considered, the approach used in the Shannon case does not fundamentally change [50,51,52,53].
One can consider the inverse problem, which consists in finding the moment constraints leading to the observed distribution as a maximum entropy distribution [50]. Kesavan and Kapur also envisaged a second inverse problem, where both the distribution and the moments are given. The question is then to determine the entropy so that the distribution is its maximizer. As a matter of fact, when dealing with the Shannon entropy, whatever the constraints considered, the maximum entropy distribution falls in the exponential family [33,34,52,54]. Recall that the exponential family is the set of parametric densities (with respect to a measure independent of the parameter θ) of the form $f(x) = C(\theta)\, h(x) \exp\big(R(\theta)^t S(x)\big)$, where S is the sufficient statistic [39,55,56,57,58,59,60]. When $R(\theta) = \theta$, the family is said to be natural and $1/C(\theta)$ is the partition function, the log-partition function $-\log C(\theta)$ being the cumulant generating function. Now, resolving the maximum entropy problem given later on by Equation (6) in the context of the Shannon entropy, it appears indeed that the maximum entropy distribution falls in the natural exponential family, where the sufficient statistic is given by the moment constraints. Considering more general entropies makes it possible to escape from this limitation. Moreover, while the Shannon entropy (or the Gibbs entropy in physics) is well adapted to the study of systems at equilibrium (or in the thermodynamic limit), extended entropies allow a finer description of systems out of equilibrium [17,61,62,63,64,65], which shows their importance. While the problem was considered mainly in the discrete setting by Kesavan and Kapur in [50], we will recall it in the general framework of the ϕ-entropies of probability densities with respect to any reference measure, and make a further step by considering an extended class of these entropies.
Resolving the inverse problem can find applications in goodness-of-fit tests for instance, allowing to design entropies adapted to such tests, in the same line as that of the approaches mentioned above [44,45,46,47,48,49].
While the entropy is a widely used tool for quantifying information (or uncertainty) attached to a random variable or to a probability distribution, other quantities are used as well, such as the moments of the variable (giving information, for instance, on center of mass, dispersion, skewness, or impulsive character), or the Fisher information. In particular, the Fisher information appears in the context of estimation [66,67], in Bayesian inference through the Jeffreys prior [39,68], but also for complex physical systems descriptions [67,69,70,71,72,73].
Although coming from different worlds (information theory and communication, estimation, statistics, and physics), these informational quantities are linked by well-known relations such as the Cramér–Rao inequality, the de Bruijn identity, and the Stam inequality [34,74,75,76]. These relationships have been proved very useful in various areas, for instance, in communications [34,74,75], in estimation [66], or in physics [77,78], among others. When generalized entropies are considered, it is natural to question the other informational measures’ generalization and the associated identities or inequalities. This question gave birth to a large amount of work and is still an active field of research [28,79,80,81,82,83,84,85,86,87,88,89,90]. For instance, the Cramér–Rao inequality is very important as it gives the ultimate precision in terms of mean square error of an estimator of a parameter (i.e., the minimal error we can achieve). However, there is no reason for choosing a quadratic error in general. This choice is often made as it allows to simplify algebra or to derive estimators quite easily (e.g., of minimum mean square error). One may wish to choose other error criteria (mean of another norm of the error) and/or to stress parts of the distribution of the data in the mathematical average. It is thus of high interest to be able to derive Cramér–Rao inequalities in a context as broad as possible.
In this paper, we show that it is possible to build a whole framework, which associates a target maximum entropy distribution to generalized entropies, generalized moments, and generalized Fisher information. In this setting, we derive generalized inequalities and identities relating these quantities, which are all linked in some sense to the maximum entropy distribution.
The paper is organized as follows. In Section 2, we recall the definition of the generalized ϕ-entropy. Then, we come back to the maximum entropy problem in this general setting. Following the sketch of [50], we present a sufficient condition linking the entropic functional and the maximizing distribution, which allows us to solve both the direct and the inverse problems. When the sufficient conditions linking the entropic function and the distribution cannot be satisfied, the problem can be solved by introducing state-dependent generalized entropies, which is the purpose of Section 3. In Section 4, we introduce informational quantities associated with the generalized entropies of the previous sections, such as a generalized escort distribution, generalized moments, and generalized Fisher information. These generalized informational quantities make it possible to extend the usual informational relations such as the Cramér–Rao inequality, relations that are precisely saturated (or valid) for the generalized maximum entropy distribution. Finally, in Section 5, we show that the extended quantities lead to an extended de Bruijn identity, provided the distribution follows a nonlinear heat equation. Some examples of ϕ-entropies solving the inverse maximum entropy problem are provided in a short series of appendices, showing, in particular, that the usual quantities are recovered as particular cases (Gaussian distribution, Shannon entropy, Fisher information, and variance).
In the following, we will define a series of generalized information quantities relative to a probability density defined with respect to a given reference measure (e.g., the Lebesgue measure when dealing with continuous random variables, discrete measure for discrete-state random variables, etc.). Therefore, rigorously, all these quantities depend on the particular choice of this reference measure. However, for simplicity, we will omit to mention this dependence in the notations along the paper.
2. ϕ-Entropies—Direct and Inverse Maximum Entropy Problems
The direct problem, i.e., finding the probability distribution of maximum entropy given moment constraints, is a common problem and finds application, for instance, in the Bayesian framework, when searching for a prior probability distribution that is as uninformative as possible given some moments [22,36,37,38,39]. It also finds many other applications, as mentioned in the introduction.
Let us first recall the definition of the generalized ϕ-entropies introduced by Csiszár in terms of divergence, and by Burbea and Rao in terms of entropy:
Definition 1
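(ϕ-entropy [28]). Let $\phi: \mathcal{Y} \subseteq \mathbb{R}_+ \to \mathbb{R}$ be a convex function defined on a convex set $\mathcal{Y}$. Then, if f is a probability distribution defined with respect to a general measure μ on a set $\mathcal{X} \subseteq \mathbb{R}^d$ such that $f(\mathcal{X}) \subseteq \mathcal{Y}$, when this quantity exists,
$$H_\phi[f] = -\int_{\mathcal{X}} \phi(f(x)) \, d\mu(x) \qquad (1)$$
is the ϕ-entropy of f.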
The (h,ϕ)-entropy is defined by $H_{(h,\phi)}[f] = h\big(H_\phi[f]\big)$, where h is a nondecreasing function. The definition is extended by allowing ϕ to be concave, together with h nonincreasing [13,19,20,29,30]. If, additionally, h is concave, then the entropy functional $H_{(h,\phi)}[f]$ is concave.
As we are interested in the maximum entropy problem, and because h is monotone, we can restrict our study to the ϕ-entropies. Additionally, we will assume that ϕ is strictly convex and differentiable.
A related quantity is the Bregman divergence associated with the convex function ϕ:
Definition 2
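(Bregman divergence and functional Bregman divergence [22,91]). With the same assumptions as in Definition 1, the Bregman divergence associated with ϕ defined on a convex set $\mathcal{Y}$ is given by the function defined on $\mathcal{Y} \times \mathcal{Y}$,
$$D_\phi(y_1, y_2) = \phi(y_1) - \phi(y_2) - \phi'(y_2)\,(y_1 - y_2). \qquad (2)$$
Applied to two functions $f_i: \mathcal{X} \to \mathcal{Y}$, $i = 1, 2$, the functional Bregman divergence writes
$$\mathcal{D}_\phi(f_1, f_2) = \int_{\mathcal{X}} D_\phi\big(f_1(x), f_2(x)\big) \, d\mu(x). \qquad (3)$$
A direct consequence of the strict convexity of ϕ is the non-negativity of the (functional) Bregman divergence: $D_\phi(y_1, y_2) \ge 0$ and $\mathcal{D}_\phi(f_1, f_2) \ge 0$, with equality if and only if $y_1 = y_2$ and $f_1 = f_2$ almost everywhere, respectively.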
Because it is non-negative and vanishes only when the distributions are (almost everywhere) equal, this divergence defines a kind of distance (not a true one, since it is not symmetric) where $f_2$ serves as a reference.
More generally, the Bregman divergence is defined for multivariate convex functions, where the derivative is replaced by the gradient operator [91]. Extensions to convex functions of functions also exist, where the derivative is taken in the sense of Gâteaux [92]. Such general extensions are not useful for our purposes; thus, we restrict ourselves to the above definition where $\mathcal{Y} \subseteq \mathbb{R}_+$.
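As a numerical illustration of Definition 2, the sketch below assumes the usual scalar form of the Bregman divergence, $D_\phi(y_1,y_2) = \phi(y_1) - \phi(y_2) - \phi'(y_2)(y_1 - y_2)$, integrated over the Lebesgue measure, with the Shannon-type choice $\phi(y) = y \log y$. The helper names (`bregman`, `functional_bregman`, `gaussian`) and the midpoint-rule integration are illustrative choices, not part of the paper. With this ϕ the functional divergence between two densities should coincide with the Kullback–Leibler divergence, which gives a closed form to compare against.

```python
import math

def phi(y):
    """Shannon-type entropic functional phi(y) = y log y (with 0 log 0 = 0)."""
    return y * math.log(y) if y > 0 else 0.0

def dphi(y):
    """Derivative phi'(y) = log y + 1, valid for y > 0."""
    return math.log(y) + 1.0

def bregman(y1, y2):
    """Pointwise Bregman divergence D_phi(y1, y2)."""
    return phi(y1) - phi(y2) - dphi(y2) * (y1 - y2)

def functional_bregman(f1, f2, a, b, n=20000):
    """Midpoint-rule approximation of the functional divergence on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += bregman(f1(x), f2(x))
    return total * h

def gaussian(sigma):
    """Centered Gaussian density with standard deviation sigma."""
    return lambda x: math.exp(-x * x / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

f1, f2 = gaussian(1.0), gaussian(2.0)
d12 = functional_bregman(f1, f2, -30.0, 30.0)   # f2 is the reference
d21 = functional_bregman(f2, f1, -30.0, 30.0)   # f1 is the reference
d11 = functional_bregman(f1, f1, -30.0, 30.0)

# Closed-form Kullback-Leibler divergence KL(N(0,1) || N(0,4))
kl12 = math.log(2.0) + 1.0 / 8.0 - 0.5
```

Comparing `d12` and `d21` makes the lack of symmetry visible, while `d11` illustrates the equality case.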
2.1. Maximum Entropy Principle: The Direct Problem
Let us first recall the maximum entropy problem, which consists in searching for the distribution maximizing the ϕ-entropy (1) subject to constraints on some moments $\mathbb{E}[T_i(X)]$ with $T_i: \mathcal{X} \to \mathbb{R}$, $i = 1, \ldots, n$. This direct problem writes
$$f^\star = \operatorname*{argmax}_{f \in C_m} \left( -\int_{\mathcal{X}} \phi(f(x)) \, d\mu(x) \right) \qquad (4)$$
with
$$C_m = \left\{ f \ge 0 \, : \, \int_{\mathcal{X}} T_i(x)\, f(x) \, d\mu(x) = m_i, \quad i = 0, \ldots, n \right\} \qquad (5)$$
where $T_0(x) = 1$ and $m_0 = 1$ (normalization constraint), and $m = (m_0, \ldots, m_n)$. We are faced with a strictly concave optimization problem (the functional to maximize is concave w.r.t. f and the constraints are linear w.r.t. f, so that the functional restricted to a linear subspace is still concave). Therefore, the solution exists and is unique. One way to solve the problem is to use the classical Lagrange multipliers technique and to solve the Euler–Lagrange equation of the variational problem, but this approach requires mild conditions [50,51,53,93,94,95]. In the following proposition, we recall a sufficient condition relating f and ϕ so that f is the problem’s solution. This result is proven without the use of the Lagrange technique.
Proposition 1
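(Maximal ϕ-entropy solution [50]). Suppose that there exists a probability distribution $f \in C_m$ satisfying
$$\phi'\big(f(x)\big) = \sum_{i=0}^{n} \lambda_i \, T_i(x) \qquad (6)$$
for some $(\lambda_0, \ldots, \lambda_n) \in \mathbb{R}^{n+1}$. Then, f is the unique solution of the maximal entropy problem (4).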
Proof.
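Suppose that distribution f satisfies Equation (6) and consider any distribution $g \in C_m$. The functional Bregman divergence between f and g writes
$$\mathcal{D}_\phi(g, f) = \int_{\mathcal{X}} \phi(g) \, d\mu - \int_{\mathcal{X}} \phi(f) \, d\mu - \int_{\mathcal{X}} \phi'(f)\,(g - f) \, d\mu = H_\phi[f] - H_\phi[g] - \sum_{i=0}^{n} \lambda_i \int_{\mathcal{X}} T_i(x)\,\big(g(x) - f(x)\big) \, d\mu(x) = H_\phi[f] - H_\phi[g],$$
where we used the fact that g and f are both probability distributions with the same moments $m_i$. By non-negativity of the Bregman functional divergence, we finally get that
$$H_\phi[f] \ge H_\phi[g]$$
for all distributions g with the same moments as f, with equality if and only if $g = f$ almost everywhere. In other words, this shows that if f satisfies Equation (6), then it is the desired solution. □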
Therefore, given an entropic functional ϕ and moment constraints $T_i$, Equation (6) leads to the maximum entropy distribution $f^\star$. This distribution is parameterized by the $\lambda_i$s or, equivalently, by the moments $m_i$s.
Note that the converse is not necessarily true, i.e., the maximum entropy distribution does not necessarily satisfy Equation (6) (i.e., Equation (6) does not necessarily have a solution), as shown, for instance, in [53]. However, the converse is true (i.e., Equation (6) has a solution) when $\mathcal{X}$ is compact [95], or for any $\mathcal{X}$ under the local boundedness condition of [96].
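The key step in the proof of Proposition 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the Shannon-type functional $\phi(y) = y \log y$, the Lebesgue measure, a unit-variance Gaussian as the candidate maximizer f, and a unit-variance Laplace density as a competitor g sharing the same normalization and second-order moment. The helper `integrate` and the integration bounds are illustrative choices. The functional Bregman divergence of g from f should then equal the entropy gap $H_\phi[f] - H_\phi[g]$.

```python
import math

def integrate(fun, a, b, n=40000):
    """Midpoint-rule quadrature of fun over [a, b]."""
    h = (b - a) / n
    return sum(fun(a + (k + 0.5) * h) for k in range(n)) * h

def phi(y):
    return y * math.log(y)        # Shannon-type entropic functional

def dphi(y):
    return math.log(y) + 1.0      # its derivative

def f(x):
    """Gaussian density with unit variance (candidate maximizer)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

b = 1.0 / math.sqrt(2.0)          # Laplace scale giving variance 2 b^2 = 1

def g(x):
    """Laplace density with unit variance (competitor with the same moments)."""
    return math.exp(-abs(x) / b) / (2.0 * b)

A, B = -25.0, 25.0                # wide enough for both tails to be negligible
H_f = -integrate(lambda x: phi(f(x)), A, B)
H_g = -integrate(lambda x: phi(g(x)), A, B)
D_gf = integrate(
    lambda x: phi(g(x)) - phi(f(x)) - dphi(f(x)) * (g(x) - f(x)), A, B
)
```

Here `H_f` should be close to $\tfrac12 \log(2\pi e)$, and `D_gf` should match `H_f - H_g` up to quadrature error.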
2.2. Maximum Entropy Principle: The Inverse Problems
As stated in the introduction, two inverse problems can be considered starting from a given distribution f. These problems were considered by Kesavan and Kapur in [50] in the discrete framework.
The first inverse problem consists in searching for the adequate moments so that a desired distribution f is the maximum entropy distribution of a given ϕ-entropy. This amounts to finding functions $T_i$ and coefficients $\lambda_i$ satisfying Equation (6). This is not always an easy task, and is not even always possible. For instance, it is well known that given moment constraints, the maximum Shannon entropy distribution falls in the exponential family [33,34,52,54]. Therefore, if f does not belong to this family, the problem has no solution.
The second inverse problem consists in designing the entropy itself, given a target distribution f and given the $T_i$s. In other words, given a distribution f, Equation (6) may make it possible to determine the entropic functional ϕ so that f is its maximizer. As mentioned in the introduction, solving this inverse problem can find applications, for instance, in goodness-of-fit tests. In such tests, we would like to determine whether data fit a given distribution, say f. A natural criterion of fit between an empirical distribution and distribution f can be a Bregman divergence, where distribution f serves as a reference. As shown in the proof of Proposition 1, when both distributions (empirical, reference) share the same moments and when the reference f is of maximum entropy subject to these moments, the divergence turns out to be a difference of entropies, and approaches in the line of [44,45,46,47,48,49] can be applied. Distribution f and some moments being given, the problem is thus to determine the adequate entropy so that f is of maximum entropy. This is precisely the inverse problem we deal with now.
As for the direct problem, in the second inverse problem, the solution is parameterized by the $\lambda_i$s. Here, the required properties of ϕ will shape the domain the $\lambda_i$s live in. In particular, ϕ must satisfy:
- the domain of definition of $\phi'$ must include $f(\mathcal{X})$; this will be satisfied by construction;
- from the strict convexity property of ϕ, $\phi'$ must be strictly increasing.
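Therefore, because $\phi'$ must be strictly increasing, it is clear that solving Equation (6) requires the following two conditions:
(C1) $f(x)$ and $\sum_{i=0}^{n} \lambda_i T_i(x)$ must have the same variations, i.e., $\sum_{i=0}^{n} \lambda_i T_i$ is increasing (resp. decreasing, constant) where f is increasing (resp. decreasing, constant);
(C2) $f(x)$ and $\sum_{i=0}^{n} \lambda_i T_i(x)$ must have the same level sets, $f(x_1) = f(x_2) \Leftrightarrow \sum_{i=0}^{n} \lambda_i T_i(x_1) = \sum_{i=0}^{n} \lambda_i T_i(x_2)$.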
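For instance, in the univariate case, for one moment constraint,
- for $\mathcal{X} = \mathbb{R}_+$ and $T_1(x) = x$, $\lambda_1$ must be negative and f must be decreasing,
- for $\mathcal{X} = \mathbb{R}$ and $T_1(x) = x^2$ or $T_1(x) = |x|$, $\lambda_1$ must be negative and f must be even and unimodal.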
Under conditions (C1) and (C2), the solutions of Equation (6) are given by
$$\phi'(y) = \sum_{i=0}^{n} \lambda_i \, T_i\big(f^{-1}(y)\big) \qquad (7)$$
where $f^{-1}$ can be multivalued. However, even if $f^{-1}$ is multivalued, because of condition (C2), $\phi'$ is defined unambiguously.
Equation (7) thus provides an effective way to solve the inverse problem. However, there exist situations where no set of $\lambda_i$s satisfies conditions (C1)–(C2) (e.g., $T_1(x) = x^2$ with f not even). In such a case, we look for a solution for ϕ in a larger class, i.e., by extending the definition of the ϕ-entropy. This will be the purpose of Section 3. Before focusing on this, let us illustrate the previous result on some examples.
2.3. Second Inverse Maximum Entropy Problem: Some Examples
To illustrate the previous subsection, let us briefly analyze three examples: the famous Gaussian distribution (Example 1), the q-Gaussian distribution, also intensively studied (Example 2), and the arcsine distribution (Example 3). The Gaussian, q-Gaussian, and arcsine distributions will serve as a guideline throughout the paper. The details of the calculations, together with a deeper study related to the sequel of the paper, are presented in the appendix. Other examples are also given in this appendix. In all three examples, except in the next section, we consider the second-order moment constraint $T_1(x) = x^2$.
Example 1.
Let us consider the well-known Gaussian distribution $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$, defined over $\mathcal{X} = \mathbb{R}$, and let us search for the ϕ-entropy so that the Gaussian is its maximizer subject to the constraint $T_1(x) = x^2$. To satisfy condition (C1) we must have $\lambda_1 < 0$, whereas condition (C2) is always satisfied. Rapid calculations, detailed in Appendix A.1, and a reparameterization of the $\lambda_i$s, give the entropic functional
$$\phi(y) = \alpha \, y \log y + \beta \, y + \gamma, \qquad \alpha > 0.$$
This is nothing but the Shannon entropy, up to the scaling factor α and a shift (to avoid the divergence of the entropy when $\mathcal{X}$ is unbounded, one will take $\gamma = 0$). One thus recovers the long-established fact that the Gaussian is the maximum Shannon entropy distribution under the second-order moment constraint.
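The inverse construction of Example 1 can be traced numerically. The sketch below assumes that Equation (7) reads $\phi'(y) = \sum_i \lambda_i T_i(f^{-1}(y))$ with $T_0 = 1$ and $T_1(x) = x^2$; the values of `sigma`, `lam0`, and `lam1` are hypothetical and only chosen so that $\lambda_1 < 0$, as condition (C1) requires. It checks that the resulting $\phi'$ is affine in $\log y$, i.e., that ϕ is of the Shannon type $\alpha\, y \log y + \beta\, y + \gamma$.

```python
import math

sigma, lam0, lam1 = 1.5, 0.3, -0.7     # hypothetical values, lam1 < 0 for (C1)

def f(x):
    """Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def f_inv_sq(y):
    """Square of the two-valued inverse of f: both branches +x and -x share x^2."""
    return -2 * sigma**2 * math.log(math.sqrt(2 * math.pi) * sigma * y)

def dphi(y):
    """phi'(y) obtained from the assumed Equation (7) with T_0 = 1, T_1(x) = x^2."""
    return lam0 + lam1 * f_inv_sq(y)

# Expected closed form: phi'(y) = alpha * log(y) + const
alpha = -2 * sigma**2 * lam1
const = lam0 + alpha * math.log(math.sqrt(2 * math.pi) * sigma)

ys = [0.01, 0.05, 0.1, 0.2]            # values inside f(R) = (0, f(0)]
xs = [-3.0, -1.0, 0.0, 0.5, 2.0]
affine_gap = max(abs(dphi(y) - (alpha * math.log(y) + const)) for y in ys)
eq6_gap = max(abs(dphi(f(x)) - (lam0 + lam1 * x * x)) for x in xs)
```

Because both branches of the inverse give the same $x^2$, $\phi'$ is single-valued here, which is how condition (C2) shows up in practice; `alpha` is positive, so $\phi'$ is increasing.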
Example 2.
Let us consider the q-Gaussian distribution, also known as Tsallis distribution or Student distribution [97,98], $f(x) = A_q \left(1 - (q-1)\frac{x^2}{\sigma^2}\right)_+^{\frac{1}{q-1}}$, where $q > 0$, $q \neq 1$, $(\cdot)_+ = \max(\cdot, 0)$ and $A_q$ is the normalization coefficient, defined over $\mathcal{X} = \mathbb{R}$ when $q < 1$ or over $\mathcal{X} = \left[-\frac{\sigma}{\sqrt{q-1}}, \frac{\sigma}{\sqrt{q-1}}\right]$ when $q > 1$, and let us search for the ϕ-entropy so that the q-Gaussian is its maximizer under the constraint $T_1(x) = x^2$. Here, again, condition (C1) is satisfied if and only if $\lambda_1 < 0$, whereas condition (C2) is always satisfied. Rapid calculations detailed in Appendix A.2 lead, after a reparameterization of the $\lambda_i$s, to the entropic functional
$$\phi(y) = \alpha \, \frac{y^q - y}{q - 1} + \beta \, y + \gamma, \qquad \alpha > 0,$$
where q is thus an additional parameter of the family. This entropy is nothing but the Havrda–Charvát or Daróczy or Tsallis entropy [12,14,17,97], up to the scaling factor α and a shift (here also, to avoid the divergence of the entropy when $\mathcal{X}$ is unbounded, one will take $\gamma = 0$). This entropy is also closely related to the Rényi entropy [10] via a one-to-one logarithmic mapping. One recovers the also well-known fact that the q-Gaussian is the maximum Havrda–Charvát–Rényi–Tsallis entropy distribution under the second-order moment constraint [97]. In the limit case $q \to 1$, the distribution tends to the Gaussian, whereas the Havrda–Charvát–Rényi–Tsallis entropy tends to the Shannon entropy.
Example 3.
Consider the arcsine distribution $f(x) = \frac{1}{\pi\sqrt{s^2 - x^2}}$, where $s > 0$, defined over $\mathcal{X} = (-s, s)$, and let us determine the entropic functionals ϕ so that f is the maximum ϕ-entropy distribution subject to the constraint $T_1(x) = x^2$. Condition (C2) is always satisfied and now, to fulfill condition (C1), we must impose $\lambda_1 > 0$. Some algebra detailed in Appendix A.4.1 leads, after a reparameterization of the $\lambda_i$s, to the entropic functional
$$\phi(y) = \frac{\alpha}{y} + \beta \, y + \gamma, \qquad \alpha > 0$$
(again, to avoid the divergence of the entropy, one can adjust parameter γ). This entropy is unusual and, due to its form, is potentially finite only for densities defined over a bounded support and that diverge at its boundary (integrable divergence).
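The arcsine case can be checked in the same spirit. The sketch below is an illustration under assumptions that are not stated explicitly in the text above: it takes the arcsine density as $f(x) = 1/(\pi\sqrt{s^2 - x^2})$ on $(-s, s)$, the functional as $\phi(y) = \alpha/y + \beta y$ (with $\gamma = 0$), and the sufficient condition as $\phi'(f(x)) = \lambda_0 + \lambda_1 x^2$. The values of `s`, `alpha`, and `beta` are hypothetical. It verifies the sufficient condition pointwise with $\lambda_1 = \alpha\pi^2 > 0$, and that the ϕ-entropy of f is finite.

```python
import math

s, alpha, beta = 2.0, 0.5, 1.0           # hypothetical values, alpha > 0 for convexity

def f(x):
    """Arcsine density on (-s, s) (assumed parameterization)."""
    return 1.0 / (math.pi * math.sqrt(s * s - x * x))

def dphi(y):
    """Derivative of phi(y) = alpha / y + beta * y."""
    return -alpha / (y * y) + beta

def d2phi(y):
    """Second derivative, positive for y > 0 when alpha > 0 (strict convexity)."""
    return 2.0 * alpha / y**3

lam1 = alpha * math.pi**2                # positive, matching the sign imposed by (C1)
lam0 = beta - lam1 * s * s

xs = [-1.9, -1.0, 0.0, 0.5, 1.5]
residuals = [dphi(f(x)) - (lam0 + lam1 * x * x) for x in xs]

# phi-entropy -int phi(f) dx: the alpha/f part by midpoint rule,
# the beta*f part analytically since f integrates to 1.
n = 200000
h = 2 * s / n
int_inv_f = sum(1.0 / f(-s + (k + 0.5) * h) for k in range(n)) * h
H = -(alpha * int_inv_f + beta)
H_closed = -(alpha * math.pi**2 * s * s / 2.0 + beta)
```

The integrand $1/f$ vanishes at the boundary of the support, which is why the entropy stays finite for this density even though f itself diverges there.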
3. State-Dependent Entropic Functionals and Minimization Revisited
In order to follow asymmetries of the distribution f and address the limitation raised by conditions (C1) and (C2), we propose to allow the entropic functional to also depend on the state variable x. Indeed, imagine, for instance, that, for two values $x_1 \neq x_2$, the probability distribution is such that $f(x_1) = f(x_2)$, but, at the same time, $\sum_i \lambda_i T_i(x_1) \neq \sum_i \lambda_i T_i(x_2)$ (for any set of $\lambda_i$s). In such a situation, one cannot find a function ϕ so as to satisfy condition (C2). Choosing a functional depending both on $f(x)$ and x can allow one to have $\frac{\partial\phi}{\partial y}\big(x_1, f(x_1)\big) \neq \frac{\partial\phi}{\partial y}\big(x_2, f(x_2)\big)$, so that we expect it could compensate for the fact that, with a usual entropic functional, condition (C2) cannot be satisfied. In the same vein, by imposing a particular form for ϕ, we also expect to be able to treat the case where condition (C1) cannot be satisfied with a usual entropic functional. Let us first define the extended state-dependent ϕ-entropy, before demonstrating that such an extension indeed allows us to reach our goal.
Definition 3
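(State-dependent ϕ-entropy). Let $\phi: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be such that for any $x \in \mathcal{X}$, function $\phi(x, \cdot)$ is a convex function on the closed convex set $\mathcal{Y} \subseteq \mathbb{R}_+$. Then, if f is a probability distribution defined with respect to a general measure μ on set $\mathcal{X}$ and such that $f(\mathcal{X}) \subseteq \mathcal{Y}$,
$$H_\phi[f] = -\int_{\mathcal{X}} \phi\big(x, f(x)\big) \, d\mu(x) \qquad (8)$$
will be called state-dependent ϕ-entropy of f. As $\phi(x, \cdot)$ is convex, the entropy functional $H_\phi[f]$ is concave. A particular case arises when, for a given partition $(\mathcal{X}_1, \ldots, \mathcal{X}_k)$ of $\mathcal{X}$, functional ϕ writes
$$\phi(x, y) = \sum_{l=1}^{k} \phi_l(y) \, \mathbb{1}_{\mathcal{X}_l}(x) \qquad (9)$$
where $\mathbb{1}_A$ denotes the indicator of set A. This functional can be viewed as a “$(\mathcal{X}_1, \ldots, \mathcal{X}_k)$-extension” over $\mathcal{X} \times \mathcal{Y}$ of a multiform function defined on $\mathcal{Y}$, with k branches $\phi_l$, and the associated ϕ-entropy will be called $(\mathcal{X}_1, \ldots, \mathcal{X}_k)$-multiform ϕ-entropy.
As in the previous section, we restrict our study to functionals $\phi(x, y)$ strictly convex and differentiable with respect to y.
Following the lines of Section 2, a generalized Bregman divergence can be associated to ϕ under the form $D_\phi(x, y_1, y_2) = \phi(x, y_1) - \phi(x, y_2) - \frac{\partial\phi}{\partial y}(x, y_2)\,(y_1 - y_2)$, and a generalized functional Bregman divergence under the form $\mathcal{D}_\phi(f_1, f_2) = \int_{\mathcal{X}} D_\phi\big(x, f_1(x), f_2(x)\big) \, d\mu(x)$.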
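With these extended quantities, the direct problem becomes
$$f^\star = \operatorname*{argmax}_{f \in C_m} \left( -\int_{\mathcal{X}} \phi\big(x, f(x)\big) \, d\mu(x) \right) \qquad (10)$$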
Although the entropic functional is now state-dependent, the approach adopted before can be applied here, leading to
Proposition 2
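(Maximum state-dependent ϕ-entropy solution). Suppose that there exists a probability distribution f satisfying
$$\frac{\partial\phi}{\partial y}\big(x, f(x)\big) = \sum_{i=0}^{n} \lambda_i \, T_i(x) \qquad (11)$$
for some $(\lambda_0, \ldots, \lambda_n) \in \mathbb{R}^{n+1}$; then f is the unique solution of the extended maximum entropy problem (10).
If ϕ is chosen in the $(\mathcal{X}_1, \ldots, \mathcal{X}_k)$-multiform ϕ-entropy class, this sufficient condition writes
$$\sum_{l=1}^{k} \phi_l'\big(f(x)\big) \, \mathbb{1}_{\mathcal{X}_l}(x) = \sum_{i=0}^{n} \lambda_i \, T_i(x) \qquad (12)$$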
Proof.
The proof follows the steps of Proposition 1, using the generalized functional Bregman divergence instead of the usual one. □
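Solving Equation (11) is not possible in full generality. However, the sufficient condition (12) can be rewritten as
$$\sum_{l=1}^{k} \left( \phi_l'\big(f(x)\big) - \sum_{i=0}^{n} \lambda_i \, T_i(x) \right) \mathbb{1}_{\mathcal{X}_l}(x) = 0 \qquad (13)$$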
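Therefore, if there exists (at least) a set of $\lambda_i$s such that condition (C1) is satisfied (but not necessarily (C2)), one can always
- design a partition $(\mathcal{X}_1, \ldots, \mathcal{X}_k)$ so that (C2) is satisfied in each $\mathcal{X}_l$ (at least, such that f is either strictly monotonic, or constant, on $\mathcal{X}_l$) and
- determine $\phi_l$ as in Equation (7) in each $\mathcal{X}_l$, that is
$$\phi_l'(y) = \sum_{i=0}^{n} \lambda_i \, T_i\big(f_l^{-1}(y)\big) \qquad (14)$$
where $f_l^{-1}$ is the (possibly multivalued) inverse of f on $\mathcal{X}_l$. Note that when $\mathcal{X}_l$ is such that f is monotonic on it, $f_l^{-1}$ is guaranteed to be univalued.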
In short, in the case where only condition (C1) is satisfied, one can obtain an extended entropic functional of the $(\mathcal{X}_1, \ldots, \mathcal{X}_k)$-multiform class, so that Equation (13) provides an effective way to solve the inverse problem in the state-dependent entropic functional context. This is given by Equation (14).
Note, however, that it still may happen that no set of $\lambda_i$s satisfies (C1). In this harder context, the problem remains solvable when the moments are defined as partial moments like $\mathbb{E}\big[T_{l,i}(X)\,\mathbb{1}_{\mathcal{X}_l}(X)\big] = m_{l,i}$, $l = 1, \ldots, k$ and $i = 1, \ldots, n_l$, and when there exists on each $\mathcal{X}_l$ a set of $\lambda_{l,i}$s such that (C1) and (C2) hold. The solution still writes as in Equation (14), but where now n, the $\lambda_i$s and the $T_i$s are replaced by $n_l$, the $\lambda_{l,i}$s and the $T_{l,i}$s, respectively,
$$\phi_l'(y) = \sum_{i=0}^{n_l} \lambda_{l,i} \, T_{l,i}\big(f_l^{-1}(y)\big) \qquad (15)$$
Let us now come back to the arcsine example $f(x) = \frac{1}{\pi\sqrt{s^2 - x^2}}$, defined over $\mathcal{X} = (-s, s)$ (Example 3 of the previous section), where we now constrain the first-order moment or partial first-order moments.
Example 4.
Let us now consider this arcsine distribution, constrained uniformly by . Clearly, neither condition (C1) nor condition (C2) can be satisfied. Note that the arcsine distribution is a one-to-one function on each set and that partitions . Therefore, considering multiform entropic functionals with this partition allows to overcome the issue on condition (C2), but that on condition (C1) remains. If we ignore this issue and apply Equation (14), after a reparameterization of the s, we obtain with where s is thus an additional parameter of the family. It appears that whereas these functionals are defined for , one can extend them continuously and with a continuous derivative for any imposing , which finally leads to the family
However, these functionals are no longer convex (see Appendix A.4.2 for more details).
Example 5.
If now we impose the partial constraint , and search for the ϕ-entropy so that is the maximizer subject to these constraints, condition (C1) can be now satisfied on each by imposing the given Equation (15) to be positive. We then obtain the associated multiform entropic functional, after a reparameterization of the s, as with with and where s is thus an additional parameter of the family. In this case, the entropic functionals can be considered for any by imposing , and one can check that the obtained functions are of class . This finally leads to the family
In addition, remarkably, the entropic functional can be made univalued by choosing and . In fact, such a choice is equivalent to considering the constraint which respects the symmetries of the distribution and allows to recover a classical ϕ-entropy (see Appendix A.4.2 for more details).
At a first glance, the solutions of Examples 4 and 5 seem to be identical. In fact, they drastically differ. Indeed, let us emphasize that the problem has one constraint in the first case, but two in the second case. The consequence is that four parameters parameterize the first solution and , while five parameters and parameterize the second solution. This difference is not insignificant: the first case cannot be viewed as a special case of the second one, because must be positive, which cannot be possible with only parameter as rule the . For the first example, the solution does not lead to a convex function, because this would contradict the required condition (C1) on the parts . Coming back to the direct problem, the “-like-entropy” defined with is no more concave (indeed, it is no more an entropy in the sense of Definition 1). As such, the maximum -entropy problem is no more concave: one cannot guarantee the uniqueness and even the existence of a maximum so that there is no guarantee that the arcsine distribution would be a maximizer. Indeed, Equation (6) coming from the Euler-Lagrange equation (see paragraph previous to Proposition 1), one can just conclude that the arcsine is a critical point (either extremal, or inflection point) of the identified -like-entropy.
In Section 2 and Section 3, we established general entropies with a given maximizer. In what follows, we will complete the information theoretical setting by introducing generalized escort distributions, generalized moments, and generalized Fisher information associated with the same entropic functional. We will then explore some of their relationships. Indeed, as mentioned in the introduction, the Cramér–Rao inequality is very important as it gives the ultimate precision in terms of mean square error of an estimator of a parameter. As we would like to escape from the usual quadratic loss (which often has a mathematical motivation but not a physical one, and which may not even exist) and/or to stress parts of the distribution of the data so as to penalize, for instance, large errors depending on the tails of the distribution, it is of high interest to be able to derive Cramér–Rao inequalities in a broader framework, which can find natural applications in the estimation domain.
4. ϕ-Escort Distribution, ϕ-Moments, ϕ-Fisher Information, Generalized Cramér–Rao Inequalities
In this section, we begin by introducing the above-mentioned informational quantities. We will then show that generalizations of the celebrated Cramér–Rao inequalities hold and link the generalized moments and Fisher information. Furthermore, the lower bounds of the inequalities are attained precisely by maximal ϕ-entropy distributions. To derive such generalizations of this inequality, we thus need to precisely define the above-mentioned generalizations of the moments and of the Fisher information that will lower-bound the moment (e.g., of any estimator of a parameter). The proposed generalizations are based on the notion of escort distribution, which we first need to introduce.
Escort distributions have been introduced as an operational tool in the context of multifractals [99,100], with interesting connections with standard thermodynamics [101] and with source coding [26,27]. In our context, we also define (generalized) escort distributions associated with a particular convex function ϕ, and show how they arise naturally. It is then possible to define generalized moments with respect to these escort distributions. Such distributions were previously introduced in connection with Rényi entropies and took the form $f^q / \int_{\mathcal{X}} f^q \, d\mu$, as we will see later on. When $q > 1$, the effect is to stress the head of the distribution, i.e., to penalize more the errors where the data fall in the head of the distribution. Conversely, when $q < 1$, the tails of the distribution are stressed. As we will see later on in the proof of the generalized Cramér–Rao inequality, any escort distribution could be chosen. However, as for the usual nonparametric Cramér–Rao inequality, one may wish the inequality to be saturated for the maximum entropy distribution, which fixes the form of the escort distribution as follows.
Definition 4
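(ϕ-escort). Let $\phi: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be such that for any $x \in \mathcal{X}$, function $\phi(x, \cdot)$ is a strictly convex, twice differentiable function defined on the closed convex set $\mathcal{Y} \subseteq \mathbb{R}_+$. Then, if f is a probability distribution defined with respect to a general measure μ on a set $\mathcal{X}$ such that $f(\mathcal{X}) \subseteq \mathcal{Y}$, and such that
$$C_\phi[f] = \int_{\mathcal{X}} \frac{d\mu(x)}{\frac{\partial^2\phi}{\partial y^2}\big(x, f(x)\big)} < +\infty, \qquad (16)$$
we define by
$$E_{\phi,f}(x) = \frac{1}{C_\phi[f] \, \frac{\partial^2\phi}{\partial y^2}\big(x, f(x)\big)} \qquad (17)$$
the ϕ-escort density with respect to measure μ, associated to density f.
Note that from the strict convexity of ϕ with respect to its second argument, this probability density is well defined and is strictly positive. We can note that, with the above definition, the ϕ-escort distribution will tend to stress the parts of the distribution where ϕ has a small “curvature.” Moreover, coming back to the previous examples, one can see the following.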
Example 1 (cont.).
In the context of the Shannon entropy, entropy for which the Gaussian is the maximal entropy law for the second order moment constraint, , the ϕ-escort density associated to f restricts to density f itself.
Example 2 (cont.).
In the Rényi–Tsallis context, entropy for which the q-Gaussian is the maximal entropy law for the second-order moment constraint , and which recovers the escort distributions used in the Rényi–Tsallis context up to a duality transformation [101].
Example 3 (cont.).
For the entropy that is maximal for the arcsine distribution under the second-order moment constraint, , and which is nothing more than an escort distribution used in the Rényi–Tsallis context. Indeed, although the arcsine distribution does not fall in the q-Gaussian family, its form is very similar to a q-Gaussian distribution (with ) where the “scaling” parameter would not be related to the exponent q. It is thus not surprising to recover an escort distribution associated with this family.
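The three examples above can be explored numerically. The following is a minimal sketch only, under the loudly stated assumption that the ϕ-escort of Definition 4 is proportional to 1/ϕ''(f) in the state-independent case (a form consistent with the Shannon example, where the escort reduces to f itself). The helper name `escort`, the grid, and the choice ϕ(u) = (u^q − u)/(q − 1) for the power case are illustrative and not taken from the text.

```python
import math

def escort(f_vals, phi_second, dx):
    """Escort density on a uniform grid.

    ASSUMPTION: the escort is taken proportional to 1 / phi''(f(x)),
    normalized so that it integrates to one (Riemann sum).
    """
    weights = [1.0 / phi_second(v) for v in f_vals]
    norm = sum(weights) * dx
    return [w / norm for w in weights]

# Standard Gaussian sampled on a uniform grid
dx = 0.01
xs = [-8.0 + i * dx for i in range(1601)]
f = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in xs]

# Shannon case: phi(u) = u log u, hence phi''(u) = 1/u and the escort is f
shannon_escort = escort(f, lambda u: 1.0 / u, dx)

# Power case (illustrative): phi(u) = (u**q - u) / (q - 1), phi''(u) = q u**(q-2),
# so that the escort is proportional to f**(2 - q)
q = 1.5
power_escort = escort(f, lambda u: q * u ** (q - 2.0), dx)

# Second-order moment of the power escort (Gaussian f: variance 1 / (2 - q))
power_escort_m2 = sum(x * x * e for x, e in zip(xs, power_escort)) * dx
```

With these assumptions, the Shannon escort coincides with f up to discretization error, and the power escort of a Gaussian is again a Gaussian with a rescaled variance.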
Definition 5
(-moments). Under the assumptions of Definition 4, with equipped with a norm , we define the -moment of a random variable X associated to distribution f by
(18) if this quantity exists.
This definition goes further than the usual definition of variance as a measure of dispersion, by generalizing the exponent and the norm, and by taking the mean with respect to an escort distribution. Thanks to the escort distribution, one can stress special parts of the distribution (heads, tails, parts where ϕ has a small curvature, that is, with a small informational content in a sense). Here, again, any escort distribution could have been chosen, but, as pointed out previously, that of the definition allows one to saturate, for the maximum entropy distribution, the Cramér–Rao inequality we will derive in a while. Note that, in the particular case of the Euclidean norm and , the second-order moment statistics are indeed contained in the second-order moment matrix given by the mathematical mean of . In such a context, the definition above coincides with the trace of this second-order moment matrix and represents the total power of X.
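As an illustration of the description above, the following sketch evaluates, in the univariate case, the mean of |x| raised to a chosen exponent with respect to an escort density sampled on a grid. The function name `generalized_moment` and the symbol `alpha` for the exponent are hypothetical. The example uses the Shannon case, where the escort is the density itself, so that the exponent 2 gives the usual second-order moment.

```python
import math

def generalized_moment(xs, escort_vals, alpha, dx):
    """Riemann-sum approximation of the escort average of |x|**alpha (1-D)."""
    return sum(abs(x) ** alpha * e for x, e in zip(xs, escort_vals)) * dx

# Centered Gaussian with standard deviation sigma, sampled on a uniform grid
sigma = 1.5
dx = 0.01
xs = [-15.0 + i * dx for i in range(3001)]
f = [math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
     for x in xs]

# Shannon case: the escort is f itself
m2 = generalized_moment(xs, f, 2.0, dx)   # usual second-order moment, sigma**2
m1 = generalized_moment(xs, f, 1.0, dx)   # first absolute moment
```

Any other escort (for instance one stressing the tails) can be passed in place of `f` to see how the weighting changes the resulting moment.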
This said, for our three examples, we have the following.
Example 1 (cont.).
In the context of the Shannon entropy, the -moments are the usual moments of .
Example 2 (cont.).
In the Rényi–Tsallis context the generalized moments introduced in [61,102] are recovered.
Example 3 (cont.).
For , one also naturally finds generalized moments with the same form as those introduced in [61,102] (see the items related to the escort distributions).
The importance of the Fisher information is well known in estimation theory: the estimation error of a parameter is lower-bounded by the inverse of the Fisher information associated with the underlying distribution [34,66]. The Fisher information is also used as a method of inference and understanding in statistical physics and biology, as promoted by Frieden [67], and has been generalized in the Rényi–Tsallis context in a series of papers [81,84,86,87,88,89,103,104]. In the following, we generalize these definitions a step further in our ϕ-entropy context.
Definition 6
(Nonparametric -Fisher information). With the same assumption as in Definition 4, denoting by the dual norm (the norm induced in the dual space that gives here [105,106]), for any differentiable density f, we define the quantity
(19) if this quantity exists, as the nonparametric -Fisher information of f.
Note that the Fisher information can be viewed as local, as it is sensitive to the variations of a distribution rather than to the distribution itself. As for the generalized moments, through the power , moments of the gradient of f other than the second one can be considered, so that more or less weight can be put on the variations of the distribution. Moreover, as for the case of generalized moments, any escort distribution could have been chosen, but, again, this choice is dictated by our wish to saturate the Cramér–Rao inequality for the maximum entropy distribution.
Note also that when ϕ is state-independent, , as for the usual Fisher information, this quantity is shift-invariant, i.e., for one has . This property is unfortunately lost in the state-dependent context. Furthermore, whereas the Fisher information has scaling properties , this is lost for , except when is a power (which corresponds either to the Shannon or Rényi–Tsallis entropy).
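The shift invariance and the scaling behavior mentioned above can be checked numerically for the usual (Shannon) Fisher information. The sketch below uses the classical univariate expression, the integral of f'(x)²/f(x), for which a density g(x) = a f(a x) is known to satisfy I[g] = a² I[f]. The function names, the logistic test density, and the numerical parameters are illustrative choices, not taken from the text.

```python
import math

def fisher_information(density, lo, hi, n=20000, h=1e-5):
    """Classical nonparametric Fisher information, integral of f'(x)**2 / f(x),
    using a midpoint rule and central finite differences for f'."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        fx = density(x)
        if fx > 0.0:
            dfx = (density(x + h) - density(x - h)) / (2.0 * h)
            total += dfx * dfx / fx * dx
    return total

def logistic(x):
    """Standard logistic density, written as sech(x/2)**2 / 4."""
    return 0.25 / math.cosh(0.5 * x) ** 2

a = 2.0
base = fisher_information(logistic, -40.0, 40.0)
shifted = fisher_information(lambda x: logistic(x - 3.0), -40.0, 40.0)
scaled = fisher_information(lambda x: a * logistic(a * x), -40.0, 40.0)
```

Here `shifted` agrees with `base`, while `scaled` agrees with `a**2 * base`; the same experiment with a state-dependent weighting would be expected to break the first property.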
Definition 7
(Parametric -Fisher information). Let us consider the same assumptions as in Definition 4, and a density f parameterized by where the set Θ is equipped with a norm and with the corresponding dual norm denoted . Assume that f is differentiable with respect to θ. We define
(20) as the parametric -Fisher information of f.
Note that, as for the usual Fisher information, when the norms on and on are the same, the nonparametric and parametric information coincide when is a location parameter.
Note that in the classical setting, the information on X in the sense of Fisher is given by the so-called Fisher information matrix, which is the mathematical mean of . Taking the trace of the Fisher information matrix, one obtains what is often called Fisher information (without the term “matrix”), which is nothing but the expectation of [58,67,107]. This is in line with the above definitions. Extending these definitions to obtain a matrix would have been possible by averaging over the ϕ-escort distribution the element-wise power of matrix , but the trace of this matrix would no longer coincide with the above definition. Moreover, it is not obvious that it would allow a generalization of the matrix form of the Cramér–Rao inequality we will see in the following. Such a matrix-valued extended Fisher information is left as a perspective.
For our three examples, we have the following.
Example 1 (cont.).
In the Shannon entropy context, when the norm is the Euclidean norm and , the nonparametric and parametric -Fisher information give the usual nonparametric and parametric Fisher information, respectively.
Example 2 (cont.).
Similarly, in the Rényi–Tsallis context, the generalizations proposed in [87,88,89] are recovered.
Example 3 (cont.).
For , one also naturally finds the generalizations proposed in [87,88,89] (see the items related to the escort distributions).
We now have the quantities that allow us to generalize the Cramér–Rao inequalities as follows.
Proposition 3
(Nonparametric -Cramér–Rao inequality). Assume that a differentiable probability density function with respect to a measure μ, defined on a domain , admits an -moment and an -Fisher information with and its Hölder conjugate, , and that vanishes at the boundary of . Then, the density f satisfies the extended Cramér–Rao inequality
(21) When ϕ is state-independent, , the equality occurs when f is the maximal ϕ-entropy distribution subject to the moment constraint .
Proof.
The approach follows [89]. Starting from the differentiable probability density f (with derivative denoted ), and as vanishes at the boundary of X, the divergence theorem gives
Now, for the first term, we use the facts that and that f is a density to obtain
for any function g non-zero on . Now, noting that , we obtain from [89] (Lemma 2)
The proof ends by choosing the ϕ-escort density associated with the density f. Note now that, again from [89] (Lemma 2), the equality is obtained when
where is a negative constant. Consider now the case where ϕ is state-independent. Thus, , which gives
This last equation has precisely the form of Equation (6) of Proposition 1. □
A close look at the proof shows that, both in the generalized moments and in the generalized Fisher information, any escort distribution g can be chosen (being identical for both quantities), including the probability distribution itself. The saturation will be achieved for the distribution f satisfying , but the ϕ-escort distribution of Definition 4 is the only choice that allows one to recover the maximal ϕ-entropy distribution as the saturating distribution; of course with the same ϕ as in the escort distribution, and with the moment constraint similar to that of the inequality but averaged over the distribution itself.
An obvious consequence of the proposition is that the probability density that minimizes the -Fisher information subject to the moment constraint coincides with the maximal ϕ-entropy distribution subject to the same moment constraint.
In the problem of estimation, the purpose is to determine a function in order to estimate an unknown parameter . In such a context, the Cramér–Rao inequality allows one to lower-bound the variance of the estimator thanks to the parametric Fisher information. The idea is thus to extend this result to bound the mean error of any order using our generalized Fisher information.
Proposition 4
(Parametric -Cramér–Rao inequality). Let f be a probability density function with respect to a general measure μ defined over a set , where f is parameterized by a parameter , and satisfies the conditions of Definition 7. Assume that both μ and do not depend on θ, that f is a jointly measurable function of x and θ which is integrable with respect to x and absolutely continuous with respect to θ, and that the derivatives of f with respect to each component of θ are locally integrable. Then, for any estimator of θ that does not depend on θ, we have
(22) where
(23) is the bias of the estimator and α and are Hölder conjugates. When ϕ is state-independent, , the equality occurs when f is the maximal ϕ-entropy distribution subject to the moment constraint .
Proof.
The proof follows again that of [89], and starts by evaluating the divergence of the bias. The regularity conditions in the statement of the proposition allow us to interchange integration with respect to x and differentiation with respect to , so that
Note then that and that being independent of , one has . Thus, f being a probability density, the equality becomes
for any density g non-zero on . The proof ends with the very same steps as in the proof of Proposition 3, using [89] (Lemma 2). □
In the classical setting, in the multivariate context (), the Cramér–Rao inequality takes a matrix form, stating that the difference between the second-order moment matrix of the estimation error of an estimator and the inverse Fisher information matrix is positive semidefinite [34,58,66,67,108,109]. Several scalar forms can be derived, for instance by taking the determinant, the trace, and/or by means of trace [58,66,67,108] or determinant/trace inequalities [110]. Typically, by means of the trace, the scalar equivalents of the above results are recovered. Conversely, extending our result to a matrix context is not immediate and is left as a perspective.
For our three examples, Propositions 3 and 4 lead to what follows.
Example 1 (cont.).
The usual parametric and nonparametric Cramér–Rao inequalities are recovered in the usual Shannon context , using the Euclidean norm and . The bound in the nonparametric context is saturated for the maximal entropy law, namely, the Gaussian.
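The saturation property in the Shannon case can be illustrated numerically. The sketch below relies on the classical univariate statement that, for a centered density, the product of the second-order moment and of the nonparametric Fisher information is at least 1, with equality for the Gaussian. The helper names, the two test densities, and the integration ranges are illustrative choices.

```python
import math

def integrate(func, lo, hi, n=20000):
    """Midpoint-rule quadrature."""
    dx = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * dx) for i in range(n)) * dx

def second_moment(density, lo, hi):
    return integrate(lambda x: x * x * density(x), lo, hi)

def fisher_information(density, lo, hi, h=1e-5):
    """Integral of f'(x)**2 / f(x), with f' from central finite differences."""
    def integrand(x):
        fx = density(x)
        if fx <= 0.0:
            return 0.0
        dfx = (density(x + h) - density(x - h)) / (2.0 * h)
        return dfx * dfx / fx
    return integrate(integrand, lo, hi)

def gaussian(x, sigma=0.7):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def logistic(x):
    return 0.25 / math.cosh(0.5 * x) ** 2

# Product (second-order moment) x (Fisher information) for two centered laws
cr_gauss = second_moment(gaussian, -12.0, 12.0) * fisher_information(gaussian, -12.0, 12.0)
cr_logistic = second_moment(logistic, -60.0, 60.0) * fisher_information(logistic, -60.0, 60.0)
```

The Gaussian product sits on the bound, whereas the logistic product lies strictly above it, as expected for a law that is not the maximal entropy one under the second-order moment constraint.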
Example 2 (cont.).
In the Rényi–Tsallis context, the generalizations proposed in [87,88,89] are recovered and, again, when , the bound is saturated in the nonparametric context for the q-Gaussian, maximal entropy law under the second order moment constraint.
Example 3 (cont.).
For , again, one finds inequalities with the same form as those of the generalizations proposed in [87,88,89] (see the items related to the escort distributions).
Beyond the mathematical aspect of these relations, they may be of great interest for assessing an estimator when the usual variance/mean square error does not exist. Moreover, the escort distribution is also a way to emphasize some part of a distribution. For instance, in the Rényi–Tsallis context, one can see that in either the tails or the head of the distribution are emphasized. Playing with q is a way to penalize either the tails or the head of the distribution in the estimation process.
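The effect of q on the emphasis put on the head or on the tails can be visualized with a short numerical sketch. It assumes the classical Rényi–Tsallis escort, proportional to the q-th power of the density, which may differ from the ϕ-escort of Definition 4 by the duality transformation mentioned earlier. The function names, the Gaussian test density, and the tail threshold are illustrative.

```python
import math

def power_escort(f_vals, q, dx):
    """Classical escort f**q / integral(f**q) on a uniform grid (ASSUMED form)."""
    powered = [v ** q for v in f_vals]
    norm = sum(powered) * dx
    return [p / norm for p in powered]

def tail_mass(xs, dens, threshold, dx):
    """Mass carried by the region |x| > threshold (Riemann sum)."""
    return sum(d for x, d in zip(xs, dens) if abs(x) > threshold) * dx

dx = 0.01
xs = [-12.0 + i * dx for i in range(2401)]
f = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in xs]

tail_f = tail_mass(xs, f, 2.0, dx)                                 # original density
tail_q_large = tail_mass(xs, power_escort(f, 2.0, dx), 2.0, dx)    # q > 1
tail_q_small = tail_mass(xs, power_escort(f, 0.5, dx), 2.0, dx)    # q < 1
```

With this form of escort, an exponent larger than one concentrates the mass around the mode, while an exponent smaller than one transfers mass to the tails.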
5. ϕ-Heat Equation and Extended de Bruijn Identity
An important relation connecting the Shannon entropy H, coming from the “information world”, with the Fisher information I, living in the “estimation world”, is given by the de Bruijn identity, which is closely linked to the Gaussian distribution. Considering a noisy random variable where N is a zero-mean d-dimensional standard Gaussian random vector and X a d-dimensional random vector independent of N, and of support independent of the parameter , then
where stands for the probability distribution of . This identity is a critical ingredient in proving the entropy power and Stam inequalities [34]. The de Bruijn identity has applications in communication, for characterizing a channel subject to noise [34,76,111,112], or in mismatch estimation [113]. It is involved in the entropy power inequality, which itself is involved in an informational proof of the central limit theorem [114,115,116]. Extending the de Bruijn identity is thus of great interest as, for instance, it may allow one to characterize more general communication channels along the same lines as in [117] or for non-additive noise, or give rise to generalized central limit theorems [115,116].
The starting point to establish the de Bruijn identity is the heat equation satisfied by the probability distribution , , where stands for the Laplacian operator [118].
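As a sanity check of this starting point, the sketch below verifies by finite differences that a Gaussian density whose variance grows linearly with θ satisfies the standard heat equation, written here with the usual factor 1/2 in front of the second derivative. It assumes the common parametrization in which the noise added to X is √θ N, with X itself Gaussian of variance `s2`; these names and values are illustrative.

```python
import math

def f_theta(x, theta, s2=0.5):
    """Density of X + sqrt(theta) * N for X ~ N(0, s2) and N ~ N(0, 1)."""
    var = s2 + theta
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def heat_residual(x, theta, h=1e-3):
    """Finite-difference estimate of  df/dtheta - (1/2) d2f/dx2."""
    d_theta = (f_theta(x, theta + h) - f_theta(x, theta - h)) / (2.0 * h)
    d_xx = (f_theta(x + h, theta) - 2.0 * f_theta(x, theta)
            + f_theta(x - h, theta)) / (h * h)
    return d_theta - 0.5 * d_xx

# Largest residual over a grid of points, at theta = 1
max_residual = max(abs(heat_residual(-3.0 + 0.1 * i, 1.0)) for i in range(61))
```

The residual is at the level of the finite-difference error, which is what one expects when the equation holds exactly.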
Let us consider probability distributions f parameterized by a parameter , satisfying what we will call generalized ϕ-heat equation,
(24)
for some , possibly dependent on but not on x, and where is a convex twice differentiable function defined over a set .
When is scalar, this equation is an instance of what are known as quasilinear parabolic equations [119] (§ 8.8) and arises in various physical problems.
Proposition 5
(Extended de Bruijn identity). Let f be a probability distribution with respect to a measure μ. Suppose that f is parameterized by a parameter , and is defined over a set . Assume that both and μ do not depend on θ, and that f satisfies the nonlinear ϕ-heat equation, Equation (24), for a twice differentiable convex function ϕ. Assume that is absolutely integrable and locally integrable with respect to θ, and that the function vanishes at the boundary of . Then, the distribution f satisfies the extended de Bruijn identity, relating the ϕ-entropy of f and its nonparametric -Fisher information as follows,
(25) where is the normalization constant given in Equation (16).
Proof.
From the definition of the ϕ-entropy, the smoothness assumptions allow us to use the Leibniz rule and differentiate under the integral sign,
where the second line comes from the ϕ-heat equation and where the third line comes from the product rule.
Now, from the divergence theorem, the first term of the right-hand side reduces to the integral of on the boundary of , which vanishes by the assumption of the proposition, while the second term of the right-hand side gives the right-hand side of (25) from and the -Fisher information given by Equations (16) and (17) and Definition 6. □
As for the Cramér–Rao inequality, in the classical settings there exist matrix variants of the de Bruijn identity, the scalar form being a special one [115,117].
Coming back to the special examples we presented all along the paper:
Example 1 (cont.).
In the Shannon entropy context, for and , the standard heat equation and the usual de Bruijn identity are recovered.
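The usual de Bruijn identity recovered in this case can be checked numerically. The sketch below takes X as a symmetric two-component Gaussian mixture, adds an independent Gaussian noise of variance θ, and compares a finite-difference estimate of the derivative of the Shannon entropy with respect to θ with one half of the Fisher information. The mixture parameters, helper names, and step sizes are illustrative assumptions.

```python
import math

def mixture(x, theta, a=1.5, s2=0.3):
    """Density of X + sqrt(theta) * N, with X ~ 0.5 N(-a, s2) + 0.5 N(a, s2)."""
    var = s2 + theta
    norm = 0.5 / math.sqrt(2.0 * math.pi * var)
    return norm * (math.exp(-0.5 * (x - a) ** 2 / var)
                   + math.exp(-0.5 * (x + a) ** 2 / var))

def mixture_dx(x, theta, a=1.5, s2=0.3):
    """Derivative of the mixture density with respect to x."""
    var = s2 + theta
    norm = 0.5 / math.sqrt(2.0 * math.pi * var)
    return norm * (-(x - a) / var * math.exp(-0.5 * (x - a) ** 2 / var)
                   - (x + a) / var * math.exp(-0.5 * (x + a) ** 2 / var))

def integrate(func, lo=-20.0, hi=20.0, n=8000):
    """Midpoint-rule quadrature."""
    dx = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * dx) for i in range(n)) * dx

def entropy(theta):
    def integrand(x):
        fx = mixture(x, theta)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return integrate(integrand)

def fisher(theta):
    def integrand(x):
        fx = mixture(x, theta)
        return mixture_dx(x, theta) ** 2 / fx if fx > 0.0 else 0.0
    return integrate(integrand)

theta, h = 1.0, 1e-4
entropy_rate = (entropy(theta + h) - entropy(theta - h)) / (2.0 * h)
half_fisher = 0.5 * fisher(theta)
```

The two quantities `entropy_rate` and `half_fisher` agree up to the discretization error, for a law that is not itself Gaussian.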
Example 2 (cont.).
The case where was intensively studied in [90] and the results of that work are naturally recovered. In particular, the generalized ϕ-heat equation appears in anomalous diffusion in porous media [90,119,120,121,122].
Example 3 (cont.).
For , once again one finds the same form for the generalized heat equation as in [90,120,121], and therefore the same form of the generalized de Bruijn identity as in [90] (see the items related to the escort distributions).
6. Concluding Remarks
In this paper, we extended as far as possible the identities and inequalities that link the classical informational quantities (Shannon entropy, Fisher information, moments, etc.) in the framework of the ϕ-entropies. Our first result concerns the inverse maximum entropy problem, starting with a probability distribution and constraints and searching for the entropy of which the distribution is the maximizer. Although such a study has already been tackled, it is extended here to a much more general context. We used general reference measures, not necessarily discrete or Lebesgue. We also considered the case where the distribution and constraints do not share the same symmetries, which leads to state-dependent entropic functionals. Our second result is the generalization of the Cramér–Rao inequality in the same setting: to this end, a generalized Fisher information and generalized moments are introduced, both based on a convex function ϕ (and a so-called ϕ-escort distribution). The Cramér–Rao inequality is saturated precisely for the maximum ϕ-entropy distribution with the same moment constraints, linking all information quantities together. Finally, our third result is the statement of a generalized de Bruijn identity, linking the ϕ-entropy rate and the -Fisher information of a distribution satisfying an extended heat equation, called the ϕ-heat equation.
As a direct perspective, the extensions of the generalized moments and Fisher information to matrix forms, and the matrix forms of the generalized Cramér–Rao inequalities and de Bruijn identities, are still open problems. Several ways to define matrix moments and Fisher information may be considered, such as in a term-wise manner as evoked in this paper. However, deriving matrix forms of the inequalities and identities does not seem trivial, and neither does obtaining the scalar form, for instance, through the trace operator. Moreover, as the de Bruijn identity can be closely related to the generalized Price’s theorem [123,124,125], studying the connections between the extended de Bruijn identity and this theorem, or generalizing it following the work of [125], is also of great interest.
Furthermore, two important inequalities are still lacking. The first one is the entropy power inequality (EPI), which states that the entropy power (exponential of twice the entropy) of the sum of two continuous independent random variables is higher than the sum of the individual entropy powers (in fact, there exist other equivalent versions, which can be found, e.g., in [34,75,107,126,127,128]). The second one is the Stam inequality, which lower-bounds the product of the entropy power and the Fisher information. For the former, despite many efforts, the literature on extended versions only considers special cases. For instance, some extensions in the classical setting exist for discrete variables but are somewhat limited [129,130,131]. In the continuous framework, the EPI was also extended to the class of the Rényi entropy (log of a -entropy with ) [132]. Note that variants of the EPI also exist in the context where one of the variables is Gaussian. This is equivalent to the convexity property of with N the entropy power and Y a Gaussian noise independent of X [133,134,135,136,137], a property also extended in the context of the Rényi entropy [132,138,139,140]. An important property that plays a key role in the inequality is the fact that the Rényi entropy is invariant to an affine transform of unit determinant and monotonic under convolution, a property which seems lost in the very general setting considered here. This fact leaves little room to extend the EPI in our general setting. Concerning the Stam inequality, at first glance, the fact that the proof is based on the EPI seems to rule out any hope of extending it to the ϕ-entropy framework. However, it was remarkably extended to the Rényi entropy, based on the Gagliardo–Nirenberg inequality [84,86,87,141]. Nevertheless, a key property is that both the entropy power and the extended Fisher information have scaling properties that are lost in the general setting of the ϕ-entropies.
A possible way to overcome the (apparent) limits just evoked could be to mimic alternative proofs such as those based on optimal transport [142]. This approach precisely avoids any use of Young or Sobolev-like inequalities. As far as we can see, there is thus a little room for extensions in the setting of the paper. Both the extension of the EPI and that of the Stam inequality are left as perspectives.
Another perspective lies in the estimation of the generalized moments from data (or from estimates). Such a possibility would confer an operational role to our Cramér–Rao inequality, i.e., by computing the estimator’s generalized moments and comparing them to the bound. A difficulty resides in the presence of the ϕ-escort distribution, which forbids empirical or Monte Carlo approaches: the escort distribution needs to be estimated. This problem seems close to the estimation of entropies from data, and plug-in approaches used in such problems can thus be considered, like kernel approaches [143,144,145], nearest neighbor approaches [145,146], or minimal spanning tree approaches [42]. Of course, this perspective goes far beyond the scope of this paper.
Acknowledgments
The authors wish to warmly thank the three reviewers who gave a careful reading of this manuscript. Their very valuable remarks and suggestions led to the improvement of the manuscript and opened various perspectives.
Appendix A. Inverse Maximum Entropy Problem and Associated Inequalities: Some Examples
In this appendix, we derive in detail several examples of the maximum entropy inverse problem. In each case, we provide the quantities and inequalities associated with the entropic functional , as derived in the text. In the sequel, for the sake of simplicity, we restrict our examples to the univariate context .
Appendix A.1. Normal Distribution and Second-Order Moment
For a normal distribution and second-order moment constraint
we begin by computing the inverse of , which yields . Note that is multivalued, but is univalued. Injecting the expression of into Equation (7) we obtain
where the requirement is necessary to satisfy condition (C1), condition (C2) being already satisfied because and share the same symmetries. This gives, after a reparameterization of the s,
The judicious choice leads to function
which gives nothing but the Shannon entropy, as expected,
where is now the support of f (overall, the obtained family of entropy is the Shannon one up to a scaling and a shift).
Now, leads to the escort distribution of Definition 4 as so that, as expected, the moments of Definition 5 are the usual moments of order . When and the usual Euclidean norm is considered, the -Fisher information of Definitions 6 and 7 is the usual Fisher information and the usual Cramér–Rao inequalities are recovered from Propositions 3 and 4 for . Finally, for , with the usual Euclidean norm, the ϕ-heat equation, Equation (24), turns out to be the heat equation, satisfied by the Gaussian, so that the usual de Bruijn identity is naturally recovered from Proposition 5.
Appendix A.2. q-Gaussian Distribution and Second-Order Moment
For the q-Gaussian distribution, also known as Tsallis distribution, Student-t, and Student-r [97,98], and a second-order moment constraint, we have
where , and is a normalization coefficient. The support of is when and when .
The inverse of gives . Note that, again, is multivalued, but is univalued. Injecting the expression of into Equation (7) we get
where the requirement is necessary to satisfy condition (C1), condition (C2) being satisfied since and share the same symmetries. This gives, after a reparameterization of the s,
Note that the inverse of is defined over but, without contradiction, the domain of definition of the entropic functional can be extended to .
Then, a judicious choice of parameters is that yields
and an associated entropy is then
where is now the support of f. This entropy is nothing but the Havrda–Charvát–Tsallis entropy [12,14,17,97] (overall, we obtain this entropy up to a scaling and a shift).
Then, , so that, from Definition 4, and then from Definitions 5–7, respectively, we obtain and as, respectively, the q-moment of order and the -Fisher information defined previously in [84,85,86,87,88,89] (with the symmetric q index given here by ). The extended Cramér–Rao inequality proved in [84,88,89] is then recovered from Propositions 3 and 4, and the generalized de Bruijn identity of [90] is also recovered from Equation (24) and Proposition 5.
Note that when :, tends to the Gaussian distribution. It appears that tends to the Shannon entropy, to the usual Fisher information and to the usual moments (both considering the Euclidean norm): all the settings related to the Gaussian distribution are naturally recovered.
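The convergence toward the Shannon setting can be illustrated numerically on the entropy itself. The sketch below uses the common normalization (1 − ∫ f^q)/(q − 1) of the Havrda–Charvát–Tsallis entropy, which may differ from the family obtained above by a scaling and a shift, and evaluates it for a standard Gaussian when q approaches 1. The function names and the grid are illustrative.

```python
import math

def tsallis_entropy(f_vals, q, dx):
    """Havrda-Charvat-Tsallis entropy (1 - integral of f**q) / (q - 1),
    approximated on a uniform grid (ASSUMED normalization)."""
    return (1.0 - sum(v ** q for v in f_vals) * dx) / (q - 1.0)

def shannon_entropy(f_vals, dx):
    """Shannon differential entropy, minus the integral of f log f."""
    return -sum(v * math.log(v) for v in f_vals if v > 0.0) * dx

dx = 0.005
xs = [-10.0 + i * dx for i in range(4001)]
f = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in xs]

h_shannon = shannon_entropy(f, dx)
# Distance to the Shannon entropy for q getting closer and closer to 1
gaps = [abs(tsallis_entropy(f, q, dx) - h_shannon) for q in (1.5, 1.1, 1.01, 1.001)]
```

The successive entries of `gaps` shrink as q approaches 1, in agreement with the limit stated above.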
Appendix A.3. q-Exponential Distribution and First-Order Moment
The same entropy functional can readily be obtained for the so-called q-exponential and a first-order moment constraint:
where is a normalization coefficient. It suffices to follow the very same steps as above, leading again to the Havrda–Charvát–Tsallis entropy, the q-moments of order and the -Fisher information defined previously in [84,85,86,87,88,89] (with the symmetric q index given here by ) as for the q-Gaussian distribution and to the extended Cramér–Rao inequality proved in [88,89] as well.
Now when :, tends to the exponential distribution, known to be of maximum Shannon entropy on under the first order moment constraint [34]. Again tends to the Shannon entropy, to the usual Fisher information and to the usual moments (both considering the Euclidean norm): all the settings related to the exponential distribution are naturally recovered.
Appendix A.4. The Arcsine Distribution
The arcsine distribution is a special case of the beta distribution with shaping parameter and appears in various applications; see, e.g., [98]. We consider here the centered and scaled version of this distribution, which reads
where . The inverse distributions on and are
Let us now consider again either a second order moment as the constraint, or (partial) first-order moment(s).
Appendix A.4.1. Second-Order Moment
When the second-order moment is constrained, condition (C2) is satisfied, so that, injecting the expression of into Equation (7) one immediately obtains
where the requirement is necessary to satisfy condition (C1). After a reparameterization of the s, the family of entropy functionals is then
Although the inverse of the arcsine distribution does not exist for , the entropy functional can be defined over .
Note that this entropy can be viewed as Havrda–Charvát–Tsallis entropy for , so that all the generalizations (escort, moments, Cramér–Rao inequality, de Bruijn identity) set out in Appendix A.2 are recovered in the limit .
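The link with a power-type entropic functional can be checked numerically. The sketch below assumes the standard centered arcsine density 1/(π√(s² − x²)) on (−s, s) and the functional ϕ(u) = 1/u, both of which are our reading rather than expressions quoted from the text. It verifies that ϕ'(f(x)) is an affine function of x², which is the stationarity condition expected for a maximum entropy law under a second-order moment constraint. The names `lam0` and `lam1` are hypothetical labels for the affine coefficients.

```python
import math

def arcsine_density(x, s=2.0):
    """Centered arcsine density on (-s, s) (ASSUMED standard form)."""
    return 1.0 / (math.pi * math.sqrt(s * s - x * x))

def phi_prime(u):
    """Derivative of the ASSUMED entropic functional phi(u) = 1 / u."""
    return -1.0 / (u * u)

s = 2.0
# Fit the affine relation phi'(f(x)) = lam0 + lam1 * x**2 on two points ...
x_a, x_b = 0.0, 1.0
lam1 = ((phi_prime(arcsine_density(x_b, s)) - phi_prime(arcsine_density(x_a, s)))
        / (x_b ** 2 - x_a ** 2))
lam0 = phi_prime(arcsine_density(x_a, s)) - lam1 * x_a ** 2

# ... and measure the deviation from it over the rest of the support
grid = [-1.9 + 0.1 * i for i in range(39)]
max_dev = max(abs(phi_prime(arcsine_density(x, s)) - (lam0 + lam1 * x * x))
              for x in grid)
```

The deviation is at rounding level, so that, under these assumptions, a functional in 1/u (up to an affine term) is compatible with the arcsine law and a second-order moment constraint.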
Appendix A.4.2. (Partial) First-Order Moment(s)
As the distribution does not have the same variations as , condition (C1) cannot be satisfied. Therefore, we either consider the arcsine distribution as a critical point (extremal, inflection point) of a non-concave “entropy”, or as a maximum entropy distribution when constraints are of the type
Now, dealing, respectively, with the partial-moment constraints and with the uniform constraint , we obtain from Equations (14) and (15), respectively,
where the sign is absorbed in the factors in the first case. Dealing with the partial moments, one must impose
to satisfy condition (C1). Conversely, condition (C1) cannot be satisfied for the second case (one would have to impose on ). After a reparameterization of the s, one obtains the branches of the entropic functional under the form with and with , and the branches for the non-convex case with .
In this case, s appears as an additional parameter of this family of ϕ-entropies.
In both cases, the entropic functionals are defined for because of the domain where is invertible. However, in the first case, one can extend the domain to , ensuring both the continuity of the entropic functional and its derivative at (and thus everywhere), by making the derivative of the entropic functional vanish at , which imposes . This is also possible for the functionals setting condition . This leads, respectively, to
and the branches for the non-convex case
Remarkably, in the first case, a univalued entropic functional can be obtained by imposing both . Looking more closely at this choice, one observes that it corresponds to the one obtained for the moment constraint , which has the same symmetries as .
The uniform function is represented in Figure A1 for .
Figure A1.
Univalued entropic functional derived from the arcsine distribution with partial constraints .
Appendix A.5. The Logistic Distribution
In this case, one can write the distribution under the form
This distribution, which resembles the normal distribution but has heavier tails, has been used in various applications; see, e.g., [98]. One can then check that over each interval
the inverse distribution reads
Let us now focus on a second-order constraint, that respects the symmetry of the distribution, and on first-order constraint(s) that do(es) not respect the symmetry.
Appendix A.5.1. Second Order Moment Constraint
In this case, injecting the expression of into Equation (7), we immediately obtain
where is required to satisfy condition (C1). After a reparameterization, we thus obtain the family of entropy functionals with with .
Here, again, s is an additional parameter for this family of ϕ-entropies.
The entropy functional is defined for because of the domain where is invertible. To evaluate the ϕ-entropy for a given distribution f, one can play with parameter s so as to restrict, if possible, to be on . However, one can also extend the functional to while remaining of class by making the derivative vanish at . This imposes and leads to the entropy functional
depicted in Figure A2a for .
Appendix A.5.2. (Partial) First-Order Moment(s) Constraint(s)
As and do not share the same symmetries, one cannot interpret the logistic distribution as a maximum entropy distribution constrained by the first-order moment. However, constraining the partial means over and using multiform entropies allows such an interpretation, while the alternative is to relax the concavity property of the entropy, but again, one would only be able to ensure that the distribution from which it comes is a critical point. To be more precise, one chooses
We thus obtain from Equations (14) and (15) respectively, over each set , the branches
where the sign is absorbed on for the first case. Dealing with the partial moments, to satisfy condition (C1) one must impose
Conversely, condition (C1) cannot be satisfied for the second case (one would have to impose on ). After a reparameterization of the s, one obtains the branches of the entropic functional under the form with where and the branches for the non-convex case with .
Once again, s appears as an additional parameter for these families of entropies.
In both cases, even if the inverse of restricts u to be lower than 1, one can either play with parameter s to allow the computation of the ϕ-entropy of any distribution f, or extend the entropic functionals to by making the derivative vanish at . This imposes and thus the entropic functional,
and the branches for the non-convex case
Remarkably, in the first case, a univalued entropic functional can be obtained by imposing both . Here also, such a choice is equivalent to considering the constraint , which allows one to respect the symmetries of the distribution and to recover a classical ϕ-entropy.
The uniform function is represented in Figure A2b for .
Figure A2.
Entropy functional derived from the logistic distribution: (a) with and (b) with .
Appendix A.6. The Gamma Distribution and (Partial) P-Order Moment(s)
As a very special case, let us finally consider the gamma distribution expressed as
Parameter is known as the shape parameter of the law, while is a scaling parameter. This distribution appears in various applications, as described, for instance, in [147].
Let us focus on the case for which the distribution is non-monotonic and unimodal, where the mode is located at , and where .
Here, again, it cannot be a maximizer of a ϕ-entropy subject to a moment of order constraint as and do not share the same symmetries. Therefore, we shall again consider partial moments as constraints,
or interpret as a critical point of a ϕ-like entropy with a constraint on the moment
Inverting leads to the equation
As expected, this equation has two solutions. These solutions can be expressed thanks to the multivalued Lambert-W function W defined by , i.e., W is the inverse function of [148] (§ 1), leading to the inverse functions
where k denotes the branch of the Lambert-W function. gives the principal branch and is related here to the entropy part on , while gives the secondary branch, related to .
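The role of the two real branches can be sketched numerically. The snippet below is a minimal illustration, not the expressions of the paper: it assumes the standard shape/rate parameterization f(x) = β^α x^(α−1) e^(−βx)/Γ(α) with α > 1 (so the mode is (α−1)/β), and the helper names `lambert_w`, `gamma_pdf`, and `gamma_pdf_inverse` are hypothetical. The Lambert-W evaluation uses Halley iterations in pure Python rather than a library routine.

```python
import math

def lambert_w(z, branch=0, tol=1e-13, max_iter=100):
    """Real Lambert W: return w such that w * exp(w) = z.

    branch=0  -> principal branch (w >= -1), defined for z >= -1/e
    branch=-1 -> secondary branch (w <= -1), defined for -1/e <= z < 0
    """
    if branch not in (0, -1):
        raise ValueError("only the real branches 0 and -1 are handled")
    if z < -1.0 / math.e or (branch == -1 and z >= 0.0):
        raise ValueError("z is outside the real domain of this branch")
    # p measures the distance to the branch point z = -1/e, where w = -1
    p = math.sqrt(max(0.0, 2.0 * (math.e * z + 1.0)))
    if p == 0.0:
        return -1.0
    # Starting points: series near the branch point, logarithms elsewhere
    if branch == 0:
        w = -1.0 + p - p * p / 3.0 if z < 0.0 else math.log1p(z)
    elif z < -0.25:
        w = -1.0 - p - p * p / 3.0
    else:
        w = math.log(-z) - math.log(-math.log(-z))
    # Halley iterations on g(w) = w * exp(w) - z
    for _ in range(max_iter):
        ew = math.exp(w)
        g = w * ew - z
        step = g / (ew * (w + 1.0) - (w + 2.0) * g / (2.0 * w + 2.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta (assumed parameterization)."""
    return math.exp(alpha * math.log(beta) + (alpha - 1.0) * math.log(x)
                    - beta * x - math.lgamma(alpha))

def gamma_pdf_inverse(y, alpha, beta, branch):
    """Solve gamma_pdf(x) = y for alpha > 1: branch 0 gives x <= mode, branch -1 gives x >= mode."""
    c = (y * math.gamma(alpha) / beta ** alpha) ** (1.0 / (alpha - 1.0))
    return -(alpha - 1.0) / beta * lambert_w(-beta * c / (alpha - 1.0), branch)

alpha, beta, y = 3.0, 2.0, 0.3
mode = (alpha - 1.0) / beta
x_left = gamma_pdf_inverse(y, alpha, beta, branch=0)
x_right = gamma_pdf_inverse(y, alpha, beta, branch=-1)
print(x_left, mode, x_right)
```

Under these assumptions, the principal branch returns the preimage located below the mode and the secondary branch the one above it, which is the two-solution structure described in the text.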
Applying (15) to obtain the branches of the functionals of the multiform entropy, one has thus to integrate
where, to ensure the convexity of the ,
The same approach allows one to design , with a unique instead of the s and without restriction on .
First, let us reparameterize the s so as to absorb the factor into so that one can write formally
Obtaining a closed-form expression for the integral term is not an easy task. However, relation [148] (Equation (3.2)) suggests that one way to carry out the integration is to seek the result in the form of a series
Therefore, to obtain a recursion on the , we proceed as follows: (i) we differentiate both sides, (ii) we use the relation given above applied to , (iii) we then multiply both sides of the obtained equality by , and finally (iv) we equate the coefficients of the terms in . The can thus be evaluated explicitly, and we recognize in the series the confluent hypergeometric (or Kummer’s) function [149] (Equation (13.1.2)) or [150] (Equation (9.210-1)) (up to a factor and an additive constant), so that
One can check that these functions are indeed the ones sought. To this end, (i) one differentiates the previous expression, (ii) one notes that from [148] (Equation (3.2)) we have , and (iii) one finally uses the relation [149] (13.4.11) together with [149] (13.1.2).
Again, are additional parameters for this family of entropies.
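As a small numerical companion to the series argument, the sketch below evaluates Kummer's function M(a, b, z) from its power series, using the ratio of consecutive coefficients as a recursion. It is only an illustration: the function name `kummer_m` and the test values are assumptions, and the specific parameters entering the entropic functionals of the paper are not reproduced here.

```python
import math

def kummer_m(a, b, z, tol=1e-16, max_terms=500):
    """Confluent hypergeometric function M(a, b, z) summed from its power series."""
    term = 1.0    # n = 0 term of the series
    total = 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms: (a + n) z / ((b + n)(n + 1))
        term *= (a + n) * z / ((b + n) * (n + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

z = 0.7
# Two classical special cases, used here as sanity checks
print(kummer_m(1.5, 1.5, z), math.exp(z))                  # M(a, a, z) = exp(z)
print(kummer_m(1.0, 2.0, z), (math.exp(z) - 1.0) / z)      # M(1, 2, z) = (exp(z) - 1)/z
```

The direct series is adequate for moderate arguments; for large |z| an asymptotic expansion would be preferable.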
Then, from the domain of definition of the inverse of , u is restricted to , which can be compensated for by adjusting parameter r (recall that ). Conversely, noting that , to extend the entropic functionals to functions on , one would have to impose so as to make the derivatives vanish at . This is impossible because from , one cannot impose . Moreover, even a convex extension relaxing the condition is impossible since we would have to impose to ensure the increase of both the s on .
We can however choose the coefficients so as to impose special conditions at the boundary(ies) of the domain of definition. As an example, we may wish the to vanish at (e.g., to ensure the convergence of the integral of , unbounded). To this end, one can evaluate the values of the at the boundaries of the domain.
From [148] (Equation (3.1)), we have and from [149] (Equation (13.1.2)) , so that
Then, (see [148] (Figure 1 or Equation (4.18))) so that, (i) from the asymptotic expansion [149] (Equation (13.1.4)) of the confluent hypergeometric function for a large argument, and (ii) using for , we obtain
Finally, from we immediately have
and
Interestingly, at , the gamma distribution reduces to the exponential law, which is well known to be a maximum Shannon entropy distribution [34] subject to the first-order moment constraint. From the results above, one can notice that when one has
Therefore, in accordance
The constraints degenerate to a single uniform constraint ;
In this limit, conditions (C1) and (C2) are both satisfied.
The entropic functional becomes state-independent (uniform), where only the branch remains.
One can determine the limit entropic functional using [151] (Th. 3.2) that states for any ,
We apply this theorem to the positive real t given by
(see domain where u lives), which thus gives, from ,
As a consequence, the left-hand side tends uniformly to 0 when and one can see that also goes uniformly to 0 as , which yields
In conclusion, from the continuity of w.r.t. both its parameters and its variable, we have
but the domain can be expanded to .
Finally, for , using [149] (13.6.14) which states that , we obtain after simple algebra
which is nothing but the Shannon entropic functional, as expected.
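The classical fact recalled in this limit, namely that the exponential law maximizes the Shannon entropy under a first-order moment constraint, can be cross-checked numerically within the gamma family. The sketch below is a hedged illustration: it assumes the shape/rate parameterization, fixes the mean to 1, and estimates the differential entropy with a plain midpoint rule; the function names are hypothetical.

```python
import math

def gamma_log_pdf(x, alpha, beta):
    """Log-density of the gamma law with shape alpha and rate beta (x > 0)."""
    return (alpha * math.log(beta) + (alpha - 1.0) * math.log(x)
            - beta * x - math.lgamma(alpha))

def shannon_entropy_gamma(alpha, mean=1.0, n=100_000, span=60.0):
    """Midpoint-rule estimate of -int f ln f for a gamma law with the given mean."""
    beta = alpha / mean       # rate chosen so that E[X] = mean
    dx = span * mean / n      # integration over [0, span * mean]
    h = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        log_f = gamma_log_pdf(x, alpha, beta)
        h -= math.exp(log_f) * log_f * dx
    return h

# Shape 1 is the exponential law; larger shapes keep the same mean
entropies = {alpha: shannon_entropy_gamma(alpha) for alpha in (1.0, 2.0, 3.0)}
for alpha, h in entropies.items():
    print(f"shape {alpha}: entropy ~ {h:.4f}")
```

With a unit mean, the exponential case should return a value close to 1 (its closed-form entropy, 1 + ln of the mean), and the other shapes should give smaller values.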
In passing, because is bounded on the considered domain, one has immediately
but remember that, at the limit, this entropic branch disappears from the multiform entropy (i.e., the entropy becomes uniform).
The behavior of the multivalued function is represented in Figure A3 for , respectively, and with the choices . In (a), so as to emphasize the behavior of the nonlinear term, we represent . In (b), is depicted, which, with the chosen parameters, tends to (Shannon entropic functional) when , together with this limit.
Figure A3.
Multiform entropy functional derived from the gamma distribution with the partial moment constraints (), for . (a): (); (b): with , , and Shannon entropic functional (thin line).
Author Contributions
The authors contributed equally to this work. Both authors have read and agreed to the published version of the manuscript.
Funding
This research was partially funded by a grant from the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Von Neumann J. Thermodynamik quantenmechanischer Gesamtheiten. Nachr. Ges. Wiss. Gött. 1927;1:273–291.
- 2. Shannon C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948;27:623–656. doi: 10.1002/j.1538-7305.1948.tb00917.x.
- 3. Boltzmann L. Lectures on Gas Theory (Translated by S. G. Brush) Dover; Leipzig, Germany: 1964.
- 4. Boltzmann L. Vorlesungen Über Gastheorie—I. Verlag von Johann Ambrosius Barth; Leipzig, Germany: 1896.
- 5. Boltzmann L. Vorlesungen Über Gastheorie—II. Verlag von Johann Ambrosius Barth; Leipzig, Germany: 1898.
- 6. Planck M. Eight Lectures on Theoretical Physics. Columbia University Press; New York, NY, USA: 2015.
- 7. Maxwell J.C. The Scientific Papers of James Clerk Maxwell. Volume 2 Dover; New York, NY, USA: 1952.
- 8. Jaynes E.T. Gibbs vs Boltzmann Entropies. Am. J. Phys. 1965;33:391–398. doi: 10.1119/1.1971557.
- 9. Müller I., Müller W.H. Fundamentals of Thermodynamics and Applications. With Historical Annotations and Many Citations from Avogadro to Zermelo. Springer; Berlin/Heidelberg, Germany: 2009.
- 10. Rényi A. On measures of entropy and information. Proc. Berkeley Symp. Math. Stat. Probab. 1961;1:547–561.
- 11. Varma R.S. Generalization of Rényi’s Entropy of Order α. J. Math. Sci. 1966;1:34–48.
- 12. Havrda J., Charvát F. Quantification Method of Classification Processes: Concept of Structural α-Entropy. Kybernetika. 1967;3:30–35.
- 13. Csiszár I. Information-Type Measures of Difference of Probability Distributions and Indirect Observations. Stud. Sci. Math. Hung. 1967;2:299–318.
- 14. Daróczy Z. Generalized Information Functions. Inf. Control. 1970;16:36–51. doi: 10.1016/S0019-9958(70)80040-7.
- 15. Aczél J., Daróczy Z. On Measures of Information and Their Characterizations. Academic Press; New York, NY, USA: 1975.
- 16. Daróczy Z., Járai A. On the measurable solution of a functional equation arising in information theory. Acta Math. Acad. Sci. Hung. 1979;34:105–116. doi: 10.1007/BF01902599.
- 17. Tsallis C. Possible Generalization of Boltzmann-Gibbs Statistics. J. Stat. Phys. 1988;52:479–487. doi: 10.1007/BF01016429.
- 18. Salicrú M. Funciones de entropía asociada a medidas de Csiszár. Qüestiió. 1987;11:3–12.
- 19. Salicrú M., Menéndez M.L., Morales D., Pardo L. Asymptotic distribution of (h,ϕ)-entropies. Commun. Stat. Theory Methods. 1993;22:2015–2031. doi: 10.1080/03610929308831131.
- 20. Salicrú M. Measures of information associated with Csiszár’s divergences. Kybernetika. 1994;30:563–573.
- 21. Liese F., Vajda I. On Divergence and Informations in Statistics and Information Theory. IEEE Trans. Inf. Theory. 2006;52:4394–4412. doi: 10.1109/TIT.2006.881731.
- 22. Basseville M. Divergence measures for statistical data processing—An annotated bibliography. Signal Process. 2013;93:621–633. doi: 10.1016/j.sigpro.2012.09.003.
- 23. Panter P.F., Dite W. Quantization distortion in pulse-count modulation with nonuniform spacing of levels. Proc. IRE. 1951;39:44–48. doi: 10.1109/JRPROC.1951.230419.
- 24. Lloyd S.P. Least Squares Quantization in PCM. IEEE Trans. Inf. Theory. 1982;28:129–137. doi: 10.1109/TIT.1982.1056489.
- 25. Gersho A., Gray R.M. Vector Quantization and Signal Compression. Kluwer; Boston, MA, USA: 1992.
- 26. Campbell L.L. A coding theorem and Rényi’s entropy. Inf. Control. 1965;8:423–429. doi: 10.1016/S0019-9958(65)90332-3.
- 27. Bercher J.F. Source coding with escort distributions and Rényi entropy bounds. Phys. Lett. A. 2009;373:3235–3238. doi: 10.1016/j.physleta.2009.07.015.
- 28. Burbea J., Rao C.R. On the Convexity of Some Divergence Measures Based on Entropy Functions. IEEE Trans. Inf. Theory. 1982;28:489–495. doi: 10.1109/TIT.1982.1056497.
- 29. Menéndez M.L., Morales D., Pardo L., Salicrú M. (h,Φ)-entropy differential metric. Appl. Math. 1997;42:81–98. doi: 10.1023/A:1022214326758.
- 30. Pardo L. Statistical Inference Based on Divergence Measures. Chapman & Hall; Boca Raton, FL, USA: 2006.
- 31. Jaynes E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957;106:620–630. doi: 10.1103/PhysRev.106.620.
- 32. Kapur J.N. Maximum Entropy Model in Sciences and Engineering. Wiley Eastern Limited; New Delhi, India: 1989.
- 33. Arndt C. Information Measures: Information and Its Description in Sciences and Engineering. Springer; Berlin/Heidelberg, Germany: 2001.
- 34. Cover T.M., Thomas J.A. Elements of Information Theory. 2nd ed. John Wiley & Sons; Hoboken, NJ, USA: 2006.
- 35. Gokhale D.V. Maximum entropy characterizations of some distributions. In: Patil S.K., Ord J.K., editors. A Modern Course on Statistical Distributions in Scientific Work. Volume III. Reidel; Dordrecht, The Netherlands: 1975. pp. 299–304.
- 36. Jaynes E.T. Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 1968;4:227–241. doi: 10.1109/TSSC.1968.300117.
- 37. Csiszár I. Why Least Squares and Maximum Entropy? An Axiomatic Approach to Inference for Linear Inverse Problems. Ann. Stat. 1991;19:2031–2066. doi: 10.1214/aos/1176348385.
- 38. Frigyik B.A., Srivastava S., Gupta M.R. Functional Bregman Divergence and Bayesian Estimation of Distributions. IEEE Trans. Inf. Theory. 2008;54:5130–5139. doi: 10.1109/TIT.2008.929943.
- 39. Robert C.P. The Bayesian Choice. From Decision-Theoretic Foundations to Computational Implementation. 2nd ed. Springer; New York, NY, USA: 2007.
- 40. Jaynes E.T. On the rationale of maximum-entropy methods. Proc. IEEE. 1982;70:939–952. doi: 10.1109/PROC.1982.12425.
- 41. Jones L.K., Byrne C.L. General Entropy Criteria for Inverse Problems, with Applications to Data Compression, Pattern Classification, and Cluster Analysis. IEEE Trans. Inf. Theory. 1990;36:23–30. doi: 10.1109/18.50370.
- 42. Hero III A.O., Ma B., Michel O.J.J., Gorman J. Application of Entropic Spanning Graphs. IEEE Signal Process. Mag. 2002;19:85–95. doi: 10.1109/MSP.2002.1028355.
- 43. Park S.Y., Bera A.K. Maximum entropy autoregressive conditional heteroskedasticity model. J. Econom. 2009;150:219–230. doi: 10.1016/j.jeconom.2008.12.014.
- 44. Vasicek O. A Test for Normality Based on Sample Entropy. J. R. Stat. Soc. B. 1976;38:54–59. doi: 10.1111/j.2517-6161.1976.tb01566.x.
- 45. Gokhale D. On entropy-based goodness-of-fit tests. Comput. Stat. Data Anal. 1983;1:157–165. doi: 10.1016/0167-9473(83)90087-7.
- 46. Song K.S. Goodness-of-fit tests based on Kullback-Leibler discrimination information. IEEE Trans. Inf. Theory. 2002;48:1103–1117. doi: 10.1109/18.995548.
- 47. Lequesne J. A goodness-of-fit test of Student distributions based on Rényi entropy. In: Djafari A., Barbaresco F., editors. AIP Conference Proceedings of the 34th International Workshop on Bayesian Inference and Maximum Entropy Methods (MaxEnt’14) Volume 1641. American Institute of Physics; College Park, MD, USA: 2014. pp. 487–494.
- 48. Lequesne J. Ph.D. Thesis. Université de Caen Basse-Normandie; Caen, France: 2015. Tests Statistiques Basés sur la Théorie de L’information, Applications en Biologie et en Démographie.
- 49. Girardin V., Regnault P. Escort distributions minimizing the Kullback-Leibler divergence for a large deviations principle and tests of entropy level. Ann. Inst. Stat. Math. 2015;68:439–468. doi: 10.1007/s10463-014-0501-x.
- 50. Kesavan H.K., Kapur J.N. The Generalized Maximum Entropy Principle. IEEE Trans. Syst. Man Cybern. 1989;19:1042–1052. doi: 10.1109/21.44019.
- 51. Borwein J.M., Lewis A.S. Duality Relationships for Entropy-Like Minimization Problems. SIAM J. Control Optim. 1991;29:325–338. doi: 10.1137/0329017.
- 52. Borwein J.M., Lewis A.S. Convergence of best entropy estimates. SIAM J. Optim. 1991;1:191–205. doi: 10.1137/0801014.
- 53. Borwein J.M., Lewis A.S. Partially-finite programming in L1 and the existence of maximum entropy estimates. SIAM J. Optim. 1993;3:248–267. doi: 10.1137/0803012.
- 54. Mézard M., Montanari A. Information, Physics, and Computation. Oxford University Press; New York, NY, USA: 2009.
- 55. Darmois G. Sur les lois de probabilités à estimation exhaustive. C. R. l’Académie Sci. 1935;200:1265–1266.
- 56. Koopman B.O. On distributions admitting a sufficient statistic. Trans. Am. Math. Soc. 1936;39:399–409. doi: 10.1090/S0002-9947-1936-1501854-3.
- 57. Pitman E.J.G. Sufficient statistics and intrinsic accuracy. Math. Proc. Camb. Philos. Soc. 1936;32:567–579. doi: 10.1017/S0305004100019307.
- 58. Lehmann E.L., Casella G. Theory of Point Estimation. 2nd ed. Springer; New York, NY, USA: 1998.
- 59. Mukhopadhyay N. Probability and Statistical Inference. 5th ed. Volume 162 Marcel Dekker; New York, NY, USA: 2000. Statistics: Textbooks and Monographs.
- 60. Rao C.R. Linear Statistical Inference and Its Applications. John Wiley & Sons; New York, NY, USA: 2001.
- 61. Tsallis C., Mendes R.M., Plastino A.R. The role of constraints within generalized nonextensive statistics. Physica A. 1998;261:534–554. doi: 10.1016/S0378-4371(98)00437-3.
- 62. Tsallis C. Nonextensive Statistics: Theoretical, Experimental and Computational Evidences and Connections. Braz. J. Phys. 1999;29:1–35. doi: 10.1590/S0103-97331999000100002.
- 63. Tsallis C. Introduction to Nonextensive Statistical Mechanics—Approaching a Complex World. Springer; New York, NY, USA: 2009.
- 64. Essex C., Schulzky C., Franz A., Hoffmann K.H. Tsallis and Rényi entropies in fractional diffusion and entropy production. Physica A. 2000;284:299–308. doi: 10.1016/S0378-4371(00)00174-6.
- 65. Parvan A.S., Biró T.S. Extensive Rényi statistics from non-extensive entropy. Phys. Lett. A. 2005;340:375–387. doi: 10.1016/j.physleta.2005.04.036.
- 66. Kay S.M. Fundamentals for Statistical Signal Processing: Estimation Theory. Volume 1 Prentice Hall; Upper Saddle River, NJ, USA: 1993.
- 67. Frieden B.R. Science from Fisher Information: A Unification. Cambridge University Press; Cambridge, UK: 2004.
- 68. Jeffreys H. An Invariant Form for the Prior Probability in Estimation Problems. Proc. R. Soc. A. 1946;186:453–461. doi: 10.1098/rspa.1946.0056.
- 69. Vignat C., Bercher J.F. Analysis of signals in the Fisher-Shannon information plane. Phys. Lett. A. 2003;312:27–33. doi: 10.1016/S0375-9601(03)00570-X.
- 70. Romera E., Angulo J.C., Dehesa J.S. Fisher entropy and uncertainty like relationships in many-body systems. Phys. Rev. A. 1999;59:4064–4067. doi: 10.1103/PhysRevA.59.4064.
- 71. Romera E., Sánchez-Moreno P., Dehesa J.S. Uncertainty relation for Fisher information of D-dimensional single-particle systems with central potentials. J. Math. Phys. 2006;47:103504. doi: 10.1063/1.2357998.
- 72. Sánchez-Moreno P., González-Férez R., Dehesa J.S. Improvement of the Heisenberg and Fisher-information-based uncertainty relations for D-dimensional potentials. New J. Phys. 2006;8:330. doi: 10.1088/1367-2630/8/12/330.
- 73. Toranzo I.V., Lopez-Rosa S., Esquivel R., Dehesa J.S. Heisenberg-like and Fisher-information uncertainties relations for N-fermion d-dimensional systems. Phys. Rev. A. 2015 doi: 10.1103/PhysRevA.91.062122. in press.
- 74. Stam A.J. Some Inequalities Satisfied by the Quantities of Information of Fisher and Shannon. Inf. Control. 1959;2:101–112. doi: 10.1016/S0019-9958(59)90348-1.
- 75. Dembo A., Cover T.M., Thomas J.A. Information Theoretic Inequalities. IEEE Trans. Inf. Theory. 1991;37:1501–1518. doi: 10.1109/18.104312.
- 76. Guo D., Shamai S., Verdú S. Mutual Information and Minimum Mean-Square Error in Gaussian Channels. IEEE Trans. Inf. Theory. 2005;51:1261–1282. doi: 10.1109/TIT.2005.844072.
- 77. Folland G.B., Sitaram A. The uncertainty principle: A mathematical survey. J. Fourier Anal. Appl. 1997;3:207–233. doi: 10.1007/BF02649110.
- 78. Sen K.D. Statistical Complexity. Application in Electronic Structure. Springer; New York, NY, USA: 2011.
- 79. Vajda I. Transactions of the 6th Prague Conference on Information Theory, Statistics, Decision Functions and Random Processes. Academia; Prague, Czech Republic: 1973. χα-divergence and generalized Fisher’s information; pp. 873–886.
- 80. Boekee D.E. An extension of the Fisher information measure. Topics in information theory. In: Csiszár I., Elias P., editors. Proceedings 2nd Colloquium on Information Theory. Volume 16. North-Holland; Keszthely, Hungary: 1977. pp. 113–123. (Series: Colloquia Mathematica Societatis János Bolyai).
- 81. Hammad P. Mesure d’ordre α de l’information au sens de Fisher. Rev. Stat. Appl. 1978;26:73–84.
- 82. Boekee D.E., Van der Lubbe J.C.A. The R-Norm Information Measure. Inf. Control. 1980;45:136–155. doi: 10.1016/S0019-9958(80)90292-2.
- 83. Lutwak E., Yang D., Zhang G. Moment-Entropy Inequalities. Ann. Probab. 2004;32:757–774. doi: 10.1214/aop/1079021463.
- 84. Lutwak E., Yang D., Zhang G. Cramér-Rao and Moment-Entropy Inequalities for Rényi Entropy and Generalized Fisher Information. IEEE Trans. Inf. Theory. 2005;51:473–478. doi: 10.1109/TIT.2004.840871.
- 85. Lutwak E., Yang D., Zhang G. Moment-Entropy Inequalities for a Random Vector. IEEE Trans. Inf. Theory. 2007;53:1603–1607. doi: 10.1109/TIT.2007.892780.
- 86. Lutwak E., Lv S., Yang D., Zhang G. Extension of Fisher Information and Stam’s Inequality. IEEE Trans. Inf. Theory. 2012;58:1319–1327. doi: 10.1109/TIT.2011.2177563.
- 87. Bercher J.F. On a (β,q)-generalized Fisher information and inequalities involving q-Gaussian distributions. J. Math. Phys. 2012;53:063303. doi: 10.1063/1.4726197.
- 88. Bercher J.F. On generalized Cramér-Rao inequalities, generalized Fisher information and characterizations of generalized q-Gaussian distributions. J. Phys. A. 2012;45:255303. doi: 10.1088/1751-8113/45/25/255303.
- 89. Bercher J.F. On multidimensional generalized Cramér-Rao inequalities, uncertainty relations and characterizations of generalized q-Gaussian distributions. J. Phys. A. 2013;46:095303. doi: 10.1088/1751-8113/46/9/095303.
- 90. Bercher J.F. Some properties of generalized Fisher information in the context of nonextensive thermostatistics. Physica A. 2013;392:3140–3154. doi: 10.1016/j.physa.2013.03.062.
- 91. Bregman L.M. The relaxation method of finding the common point of convex sets and its application to the solution of problem in convex programming. USSR Comput. Math. Math. Phys. 1967;7:200–217. doi: 10.1016/0041-5553(67)90040-7.
- 92. Nielsen F., Nock R. Generalizing Skew Jensen Divergences and Bregman Divergences With Comparative Convexity. IEEE Signal Process. Lett. 2017;24:1123–1127. doi: 10.1109/LSP.2017.2712195.
- 93. Ben-Tal A., Borwein J.M., Teboulle M. Spectral Estimation via Convex Programming. In: Phillips F.Y., Rousseau J.J., editors. Systems and Management Science by Extremal Methods. Springer US; New York, NY, USA: 1992.
- 94. Teboulle M., Vajda I. Convergence of Best ϕ-Entropy Estimates. IEEE Trans. Inf. Theory. 1993;39:297–301. doi: 10.1109/18.179378.
- 95. Girardin V. Méthodes de réalisation de produit scalaire et de problème de moments avec maximisation d’entropie. Stud. Math. 1997;124:199–213. doi: 10.4064/sm-124-3-199-213.
- 96. Girardin V. Relative Entropy and Spectral Constraints: Some Invariance Properties of the ARMA Class. J. Time Ser. Anal. 2007;28:844–866. doi: 10.1111/j.1467-9892.2007.00535.x.
- 97. Costa J.A., Hero III A.O., Vignat C. On Solutions to Multivariate Maximum α-Entropy Problems. In: Rangarajan A., Figueiredo M.A.T., Zerubia J., editors. 4th International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR) Volume 2683. Springer; Lisbon, Portugal: 2003. pp. 211–226. Lecture Notes in Computer Science.
- 98. Johnson N.L., Kotz S., Balakrishnan N. Continuous Univariate Distributions. 2nd ed. Volume 2 John Wiley & Sons; New York, NY, USA: 1995.
- 99. Chhabra A., Jensen R.V. Direct determination of the f(α) singularity spectrum. Phys. Rev. Lett. 1989;62:1327. doi: 10.1103/PhysRevLett.62.1327.
- 100. Beck C., Schlögl F. Thermodynamics of Chaotic Systems: An Introduction. Cambridge University Press; Cambridge, UK: 1993.
- 101. Naudts J. Generalized Thermostatistics. Springer; London, UK: 2011.
- 102. Martínez S., Nicolás F., Pennini F., Plastino A. Tsallis’ entropy maximization procedure revisited. Physica A. 2000;286:489–502. doi: 10.1016/S0378-4371(00)00359-9.
- 103. Chimento L.P., Pennini F., Plastino A. Naudts-like duality and the extreme Fisher information principle. Phys. Rev. E. 2000;62:7462–7465. doi: 10.1103/PhysRevE.62.7462.
- 104. Casas M., Chimento L., Pennini F., Plastino A., Plastino A.R. Fisher information in a Tsallis non-extensive environment. Chaos Solitons Fractals. 2002;13:451–459. doi: 10.1016/S0960-0779(01)00027-3.
- 105. Rudin W. Functional Analysis. 2nd ed. McGraw-Hill; New York, NY, USA: 1991.
- 106. Morrison T.J. Functional Analysis. An Introduction to Banach Space Theory. John Wiley & Sons; New York, NY, USA: 2000.
- 107. Rioul O. Information Theoretic Proofs of Entropy Power Inequalities. IEEE Trans. Inf. Theory. 2011;57:33–55. doi: 10.1109/TIT.2010.2090193.
- 108. Rao C.R., Wishart J. Minimum variance and the estimation of several parameters. Math. Proc. Camb. Philos. Soc. 1947;43:280–283. doi: 10.1017/S0305004100023471.
- 109. Van den Bos A. Parameter Estimation for Scientists and Engineers. John Wiley & Sons; Hoboken, NJ, USA: 2007.
- 110. Magnus J.R., Neudecker H. Matrix Differential Calculus with Applications in Statistics and Econometrics. 3rd ed. John Wiley & Sons; New York, NY, USA: 1999.
- 111. Guo D., Shamai S., Verdú S. Additive Non-Gaussian Noise Channels: Mutual Information and Conditional Mean Estimation. IEEE Int. Symp. Inf. Theory. 2005:719–723. doi: 10.1109/ISIT.2005.1523430.
- 112. Palomar D.P., Verdú S. Gradient of Mutual Information in Linear Vector Gaussian Channels. IEEE Trans. Inf. Theory. 2006;52:141–154. doi: 10.1109/TIT.2005.860424.
- 113. Verdú S. Mismatched Estimation and Relative Entropy. IEEE Trans. Inf. Theory. 2010;56:3712–3720. doi: 10.1109/TIT.2010.2050800.
- 114. Barron A.R. Entropy and the Central Limit Theorem. Ann. Probab. 1986;14:336–342. doi: 10.1214/aop/1176992632.
- 115. Johnson O. Information Theory and the Central Limit Theorem. Imperial College Press; London, UK: 2004.
- 116. Madiman M., Barron A. Generalized Entropy Power Inequalities and Monotonicity Properties of Information. IEEE Trans. Inf. Theory. 2007;53:2317–2329. doi: 10.1109/TIT.2007.899484.
- 117. Toranzo I.V., Zozor S., Brossier J.M. Generalization of the de Bruijn identity to general ϕ-entropies and ϕ-Fisher informations. IEEE Trans. Inf. Theory. 2018;64:6743–6758. doi: 10.1109/TIT.2017.2771209.
- 118. Widder D.V. The Heat Equation. Academic Press; New York, NY, USA: 1975.
- 119. Roubíček T. Nonlinear Partial Differential Equations with Applications. Birkhäuser; Basel, Switzerland: 2005.
- 120. Tsallis C., Lenzi E.K. Anomalous diffusion: Nonlinear fractional Fokker-Planck equation. Chem. Phys. 2002;284:341–347. doi: 10.1016/S0301-0104(02)00557-8.
- 121. Vázquez J.L. Smoothing and Decay Estimates for Nonlinear Diffusion Equations—Equation of Porous Medium Type. Oxford University Press; New York, NY, USA: 2006.
- 122. Gilding B.H., Kersner R. Travelling Waves in Nonlinear Diffusion-Convection Reaction. Springer; Basel, Switzerland: 2004.
- 123. Price R. A Useful Theorem for Nonlinear Devices Having Gaussian Inputs. IEEE Trans. Inf. Theory. 1958;4:69–72. doi: 10.1109/TIT.1958.1057444.
- 124. Pawula R. A modified version of Price’s theorem. IEEE Trans. Inf. Theory. 1967;13:285–288. doi: 10.1109/TIT.1967.1054014.
- 125. Riba J., de Cabrera F. A Proof of de Bruijn Identity based on Generalized Price’s Theorem. IEEE Int. Symp. Inf. Theory. 2019:2509–2513. doi: 10.1109/isit.2019.8849368.
- 126. Lieb E.H. Proof of an Entropy Conjecture of Wehrl. Commun. Math. Phys. 1978;62:35–41. doi: 10.1007/BF01940328.
- 127. Costa M., Cover T. On the Similarity of the Entropy Power Inequality and the Brunn-Minkowski Inequality. IEEE Trans. Inf. Theory. 1984;30:837–839. doi: 10.1109/TIT.1984.1056983.
- 128. Carlen E.A., Soffer A. Entropy Production by Block Variable Summation and Central Limit Theorems. Commun. Math. Phys. 1991;140:339–371. doi: 10.1007/BF02099503.
- 129. Harremoës P., Vignat C. An Entropy Power Inequality for the Binomial Family. J. Inequalities Pure Appl. Math. 2003;4:93.
- 130. Johnson O., Yu Y. Monotonicity, Thinning, and Discrete Versions of the Entropy Power Inequality. IEEE Trans. Inf. Theory. 2010;56:5387–5395. doi: 10.1109/TIT.2010.2070570.
- 131. Haghighatshoar S., Abbe E., Telatar I.E. A New Entropy Power Inequality for Integer-Valued Random Variables. IEEE Trans. Inf. Theory. 2014;60:3787–3796. doi: 10.1109/TIT.2014.2317181.
- 132. Bobkov S.G., Chistyakov G.P. Entropy Power Inequality for the Rényi Entropy. IEEE Trans. Inf. Theory. 2015;61:708–714. doi: 10.1109/TIT.2014.2383379.
- 133. Costa M. A New Entropy Power Inequality. IEEE Trans. Inf. Theory. 1985;31:751–760. doi: 10.1109/TIT.1985.1057105.
- 134. Dembo A. Simple Proof of the Concavity of the Entropy Power with Respect to Added Gaussian Noise. IEEE Trans. Inf. Theory. 1989;35:887–888. doi: 10.1109/18.32166.
- 135. Villani C. A Short Proof of the “Concavity of Entropy Power”. IEEE Trans. Inf. Theory. 2000;46:1695–1696. doi: 10.1109/18.850718.
- 136. Toscani G. Heat Equation and Convolution Inequalities. Milan J. Math. 2014;82:183–212. doi: 10.1007/s00032-014-0219-5.
- 137. Toscani G. A Strengthened Entropy Power Inequality for Log-Concave Densities. IEEE Trans. Inf. Theory. 2015;61:6550–6559. doi: 10.1109/TIT.2015.2495302.
- 138. Ram E., Sason I. On Rényi Entropy Power Inequalities. IEEE Trans. Inf. Theory. 2016;62:6800–6815. doi: 10.1109/TIT.2016.2616135.
- 139. Bobkov S.G., Marsiglietti A. Variants of the Entropy Power Inequality. IEEE Trans. Inf. Theory. 2017;63:7747–7752. doi: 10.1109/TIT.2017.2764487.
- 140. Savaré G., Toscani G. The Concavity of Rényi Entropy Power. IEEE Trans. Inf. Theory. 2014;60:2687–2693. doi: 10.1109/TIT.2014.2309341.
- 141. Zozor S., Puertas-Centeno D., Dehesa J.S. On Generalized Stam Inequalities and Fisher–Rényi Complexity Measures. Entropy. 2017;19:493. doi: 10.3390/e19090493.
- 142. Rioul O. Yet Another Proof of the Entropy Power Inequality. IEEE Trans. Inf. Theory. 2017;63:3595–3599. doi: 10.1109/TIT.2017.2676093.
- 143. Rosenblatt M. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Stat. 1956;27:832–837. doi: 10.1214/aoms/1177728190.
- 144. Parzen E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. 1962;33:1065–1076. doi: 10.1214/aoms/1177704472.
- 145. Beirlant J., Dudewicz E.J., Györfi L., van der Meulen E.C. Nonparametric Entropy Estimation: An Overview. Int. J. Math. Stat. Sci. 1997;6:17–39.
- 146. Leonenko N., Pronzato L., Savani V. A Class of Rényi Information Estimators for Multidimensional Densities. Ann. Stat. 2008;36:2153–2182. doi: 10.1214/07-AOS539.
- 147. Johnson N.L., Kotz S., Balakrishnan N. Continuous Univariate Distributions. 2nd ed. Volume 1 John Wiley & Sons; New York, NY, USA: 1995.
- 148. Corless R.M., Gonnet G.H., Hare D.E.G., Jeffrey D.J., Knuth D.E. On the Lambert W Function. Adv. Comput. Math. 1996;5:329–359. doi: 10.1007/BF02124750.
- 149. Abramowitz M., Stegun I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 9th ed. Dover; New York, NY, USA: 1970.
- 150. Gradshteyn I.S., Ryzhik I.M. Table of Integrals, Series, and Products. 8th ed. Academic Press; San Diego, CA, USA: 2015.
- 151. Alzahrani F., Salem A. Sharp bounds for the Lambert W function. Integral Transform. Spec. Funct. 2018;29:971–978. doi: 10.1080/10652469.2018.1528247.