. 2024 Jun 30;26(7):565. doi: 10.3390/e26070565

Statistical Divergence and Paths Thereof to Socioeconomic Inequality and to Renewal Processes

Iddo Eliazar
Editor: Nikolai Leonenko
PMCID: PMC11276393  PMID: 39056927

Abstract

This paper establishes a general framework for measuring statistical divergence. Namely, with regard to a pair of random variables that share a common range of values: quantifying the distance of the statistical distribution of one random variable from that of the other. The general framework is then applied to the topics of socioeconomic inequality and renewal processes. The general framework and its applications are shown to yield and to relate to the following: f-divergence, Hellinger divergence, Renyi divergence, and Kullback–Leibler divergence (also known as relative entropy); the Lorenz curve and socioeconomic inequality indices; the Gini index and its generalizations; the divergence of renewal processes from the Poisson process; and the divergence of anomalous relaxation from regular relaxation. Presenting a ‘fresh’ perspective on statistical divergence, this paper offers its readers a simple and transparent construction of statistical-divergence gauges, as well as novel paths that lead from statistical divergence to the aforementioned topics.

Keywords: Kullback–Leibler divergence (relative entropy), Renyi divergence, f-divergence, Lorenz curve, inequality indices, Gini index, renewal processes

PACS: 02.50.-r (probability theory, stochastic processes, and statistics); 05.40.-a (fluctuation phenomena, random processes, noise, and Brownian motion); 89.65.-s (social and economic systems)

1. Introduction

Measuring distances is of foundational importance in all fields of science and engineering. Arguably, measuring distances emerged with regard to points in planar geometry. Elevating from points in the plane to points in general spaces—e.g., Hilbert, Banach, and metric spaces—facilitates measuring distances between very general objects.

Even in the basic case of planar geometry, there are various ways of measuring distances. To illustrate this, envisage Manhattan (above 14th street). From an aerial perspective, the distance between two addresses in Manhattan is the Euclidean distance. From a pedestrian perspective, the distance between two addresses in Manhattan is the grid distance—which is attained by walking along avenues and streets (as pedestrians cannot walk through buildings).

Measuring distances is not at all confined to geometry—be it in the plane, or in general spaces. Indeed, consider a human society of interest and the distribution of wealth among its members. In a purely egalitarian society, the distribution of wealth is perfectly equal: all the members have exactly the same wealth. Of course, any real human society is not purely egalitarian. A key topic in economics and in the social sciences is quantifying socioeconomic inequality [1,2,3]. Namely, measuring the distance of the human society of interest from the ‘benchmark state’ of perfect equality. This topic is addressed by important notions including the Lorenz curve [4,5,6,7] and socioeconomic inequality indices [1,2,3,8,9].

Shifting from socioeconomics to statistics, and from distributions of wealth to statistical distributions: consider the statistical distributions of random variables that take values in a common range (e.g., the positive half-line). In this context, a key topic is measuring statistical divergence [10,11,12]: the distance of a statistical distribution of interest from a ‘benchmark’ statistical distribution. Mainly, this topic is addressed by the Kullback–Leibler divergence (also known as “relative entropy”) [13,14,15,16]; the Renyi divergence [17,18,19,20]; and the more general “f-divergence” [17,21,22,23,24,25,26]. Recent examples of the wide use of these divergences include [27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42].

This paper establishes a general framework for measuring statistical divergence. The general framework is presented in Section 2, and it has a high ‘return on investment’. Indeed, on the one hand, the framework is based on a simple and transparent construction. And, on the other hand, the framework yields potent gauges of statistical divergence. In particular, the framework leads to the aforementioned divergences. With the general framework established, the paper presents two applications of it.

Section 3 applies the general framework to the measurement of socioeconomic inequality. In particular, this application will show how the framework yields the aforementioned notions of the Lorenz curve and socioeconomic inequality indices; as well as the Gini index—perhaps the best known and the most widely applied socioeconomic inequality index [43,44,45,46,47,48]—and its generalizations.

Section 4 applies the general framework to renewal processes [49,50,51]. In particular, this application will show how the framework facilitates measuring the divergence of renewal processes from the Poisson process [52,53,54], and measuring the divergence of anomalous relaxation [55,56,57,58] from ‘regular’ exponential relaxation. In addition, this application will yield further relations to socioeconomic inequality indices, as well as further generalizations of the Gini index.

Each of Section 2, Section 3, and Section 4 ends with a short summary. For a quick read of the paper, readers can go over these summaries.

The novelty of this paper is twofold. Firstly, the paper offers its readers a direct and transparently constructed path to statistical divergence; from that path, it offers further paths to socioeconomic inequality, to renewal processes, and to anomalous relaxation. Secondly, the paper illuminates profound linkages, which are not straightforwardly apparent, between the different (and seemingly unrelated) topics it addresses.

The paper is written in a self-contained fashion, and its pre-requisites are basic calculus and basic probability. Thus, the paper is suitable for a wide range of audiences: theoreticians and practitioners alike, from diverse fields of science and engineering.

A note about notation. Throughout this paper, IID is an acronym for “independent and identically distributed” (random variables), and E[·] denotes expectation (namely, E[Z] is the mean of a given random variable Z).

2. General Framework

Consider a random variable X that takes values in a real range R = (r_low, r_up), where r_low is the range's lower bound, r_up is the range's upper bound, and r_low < r_up. Further consider a random variable Y that is independent of X and that takes values in the closed range [r_low, r_up]. Namely, in addition to values in the range R, the random variable Y can also attain the lower-bound value r_low and the upper-bound value r_up.

The goal of this paper is to measure the statistical divergence of the random variable Y from the random variable X. In other words, the goal is to quantify the extent by which the statistical distribution of the random variable Y deviates from the statistical distribution of the random variable X.

Henceforth, the distribution functions of X and Y are denoted, respectively, by A(r) = Pr(X ≤ r) (r ∈ R) and by B(r) = Pr(Y ≤ r) (r ∈ R). The distribution function A(r) is assumed to be increasing, and hence it has an increasing inverse function A^{-1}(u) (0 < u < 1). In addition, the probability that the random variables X and Y coincide is assumed to be zero, Pr(X = Y) = 0; this assumption holds whenever the distribution function A(r) is continuous.

We shall measure the statistical divergence of Y from X via the curve

C(u) = B(A^{-1}(u)) (1)

(0 < u < 1). It is straightforward to observe that the curve C(u) is non-decreasing, and that its boundary values are lim_{u→0} C(u) = 0 and lim_{u→1} C(u) = 1. These observations imply that the curve C(u) manifests a distribution function over the unit interval. Specifically, C(u) = Pr(U ≤ u), where U is a random variable that takes values in the unit interval [0, 1].

In the case that the distribution functions of X and Y are smooth, their corresponding density functions are denoted, respectively, by a(r) = A′(r) (r ∈ R) and by b(r) = B′(r) (r ∈ R). In turn, differentiating Equation (1) yields

C′(u) = b(A^{-1}(u)) / a(A^{-1}(u)) (2)

(0 < u < 1). Namely, C′(u) is the density function of the random variable U. (As the distribution function A(r) is assumed to be increasing, the density a(r) is positive, and hence: the denominator appearing on the right-hand side of Equation (2) is positive, and the ratio is well-defined.)

Statistical distributions that are defined over the unit interval have a natural ‘benchmark’: the uniform distribution, the unique statistical distribution that assigns all possible outcomes the same likelihood of occurrence. The uniform distribution is characterized by the linear distribution function C*(u) = u (0 < u < 1), as well as by the flat density function C*′(u) = 1 (0 < u < 1). In what follows, we set U* to be a random variable that is uniformly distributed over the unit interval, Pr(U* ≤ u) = u.

Equation (1) implies that the random variables X and Y are equal in law, A(r) ≡ B(r), if and only if the random variables U and U* are equal in law, C(u) ≡ u. Consequently, the statistical divergence of Y from X can be measured ‘by proxy’ as follows: quantifying the deviation of the statistical distribution of the random variable U from the uniform distribution. We shall do so using three methods: area (Section 2.1); moments (Section 2.2); and coincidence (Section 2.4).
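As a concrete numerical illustration (the exponential distributions below are an assumption made here for the sake of the example, and are not part of the framework), the curve of Equation (1) can be computed explicitly:

```python
import math

# Illustrative sketch (assumed distributions, not from the paper):
# X ~ Exp(1) and Y ~ Exp(2) on the range R = (0, infinity), so that
# A(r) = 1 - exp(-r), B(r) = 1 - exp(-2r), and A^{-1}(u) = -ln(1 - u).
def A_inv(u):
    return -math.log(1.0 - u)

def B(r):
    return 1.0 - math.exp(-2.0 * r)

def C(u):
    """The curve of Equation (1): C(u) = B(A^{-1}(u))."""
    return B(A_inv(u))

# For this pair, C(u) simplifies to 1 - (1 - u)^2; e.g., C(0.5) = 0.75.
assert abs(C(0.5) - 0.75) < 1e-12

# C is non-decreasing with boundary values 0 and 1, i.e., a distribution
# function over the unit interval (the law of the random variable U).
vals = [C(i / 1000) for i in range(1, 1000)]
assert all(x <= y for x, y in zip(vals, vals[1:]))
```
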

2.1. Area

The graphs of distribution functions that are defined over the unit interval reside in the unit square, where they ‘climb’ from the square’s bottom-left corner to its top-right corner. Namely, the 2D coordinates of the unit square’s bottom-left corner are (0, 0), and the 2D coordinates of the top-right corner are (1, 1). The distribution function of the uniform distribution, C*(u) = u (0 < u < 1), is the diagonal line of the unit square. This line splits the unit square into two triangles: the triangle below the diagonal line, whose area is 1/2; and the triangle above the diagonal line, whose area is also 1/2. Thus, the difference between the area of the upper triangle and the area of the lower triangle is zero.

Similarly to the diagonal line, the curve C(u) splits the unit square into two sets: the ‘lower set’, comprising the square’s points that are below the curve; and the ‘upper set’, comprising the square’s points that are above the curve. A derivation detailed in Appendix A asserts that the area of the lower set is ∫_0^1 C(u) du = Pr(Y ≤ X). In turn, as the square’s total area is one, the area of the upper set is 1 − Pr(Y ≤ X) = Pr(Y > X). Thus, the difference between the area of the upper set and the area of the lower set is the quantity

Δ(Y||X) = Pr(Y > X) − Pr(Y ≤ X). (3)

With regard to Equation (3), recall that the random variables X and Y are considered to be mutually independent. In terms of the curve C(u), the quantity of Equation (3) admits the formulation Δ(Y||X) = 1 − 2 ∫_0^1 C(u) du.

The quantity Δ(Y||X) takes values in the range [−1, 1]. The quantity Δ(Y||X) attains its lower bound if and only if the random variable Y equals its lower bound with probability one: Δ(Y||X) = −1 ⇔ Pr(Y = r_low) = 1. Antithetically, the quantity Δ(Y||X) attains its upper bound if and only if the random variable Y equals its upper bound with probability one: Δ(Y||X) = 1 ⇔ Pr(Y = r_up) = 1. The quantity Δ(Y||X) is zero if, but not only if, the random variables X and Y are equal in law: A(r) ≡ B(r) ⇒ Δ(Y||X) = 0 (this implication uses the aforementioned assumption Pr(X = Y) = 0).

We illustrate the fact that a zero quantity Δ(Y||X) does not imply that the random variables X and Y are equal in law. To that end, consider the random variables X and Y to be symmetric: X = −X and Y = −Y, the equalities being in law. The symmetry of the random variables X and Y implies that Pr(Y > X) = Pr(Y ≤ X) (this implication also uses the aforementioned assumption Pr(X = Y) = 0), and hence Δ(Y||X) = 0. However, the symmetry of X and Y does not imply that these random variables are equal in law.
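This counterexample is easy to probe numerically; in the following sketch, the uniform and Gaussian laws are assumptions chosen purely for illustration:

```python
import random

random.seed(7)

# X ~ Uniform(-1, 1) and Y ~ Normal(0, 1): both symmetric about zero,
# yet clearly not equal in law.
N = 200_000
delta = 0.0
for _ in range(N):
    x = random.uniform(-1.0, 1.0)
    y = random.gauss(0.0, 1.0)
    delta += 1.0 if y > x else -1.0   # contributes to Pr(Y > X) - Pr(Y <= X)
delta /= N

# Symmetry forces Delta(Y||X) = 0 even though the two laws differ.
assert abs(delta) < 0.01
```
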

The quantity Δ(Y||X) is a ‘first-order measurement’ of the statistical divergence of the random variable Y from the random variable X. Indeed, the quantity Δ(Y||X) does not characterize the case of a zero statistical divergence, i.e., the case where the random variables X and Y are equal in law. In the next subsection, we shall elevate from the ‘first-order measurement’ Δ(Y||X) to ‘higher-order measurements’.

2.2. Moments

Consider a random variable ξ that takes values in the unit interval, and whose statistical distribution is governed by the density function ϕ(u) (0 < u < 1). The moments of the random variable ξ are the sequence E[ξ^m] = ∫_0^1 u^m ϕ(u) du (m = 1, 2, 3, …). The Hausdorff moment problem asserts that the moments of the random variable ξ characterize its statistical distribution [59]. Namely, if another random variable (one that also takes values in the unit interval) has the same moments as ξ, then the two random variables are equal in law.

In this subsection we shall apply the Hausdorff moment problem in order to measure the statistical divergence of the random variable Y from the random variable X. In what follows, X_1, …, X_m denote m IID copies of the random variable X; these IID copies are considered to be independent of the random variable Y.

As above, U* denotes a random variable that is uniformly distributed over the unit interval. As the density function of the uniform distribution is C*′(u) = 1, the moments of the random variable U* are E[(U*)^m] = 1/(m + 1). A derivation detailed in Appendix A asserts that the moments of the random variable U are E[U^m] = Pr(X_1, …, X_m ≤ Y); namely, the moment E[U^m] is the probability that the m IID copies of X are all no larger than Y. Consequently, multiplying the difference of the moments E[U^m] − E[(U*)^m] by the factor m + 1 yields the quantity

α_m(Y||X) = (m + 1) Pr(X_1, …, X_m ≤ Y) − 1. (4)

The quantity α_m(Y||X) involves the probability p_m = Pr(X_1, …, X_m ≤ Y). Being a probability, p_m takes values in the unit interval [0, 1], and hence the quantity α_m(Y||X) takes values in the range [−1, m]. Moreover, the quantity α_m(Y||X) attains its lower bound −1 if and only if p_m = 0, i.e., if and only if the random variable Y equals its lower bound r_low with probability one: α_m(Y||X) = −1 ⇔ Pr(Y = r_low) = 1. And, antithetically, the quantity α_m(Y||X) attains its upper bound m if and only if p_m = 1, i.e., if and only if the random variable Y equals its upper bound r_up with probability one: α_m(Y||X) = m ⇔ Pr(Y = r_up) = 1.

As the random variables U* and U take values in the unit interval, so do the random variables 1 − U* and 1 − U. The random variable 1 − U* is uniformly distributed, and hence its moments are E[(1 − U*)^m] = 1/(m + 1). A derivation detailed in Appendix A asserts that the moments of the random variable 1 − U are E[(1 − U)^m] = Pr(X_1, …, X_m > Y); namely, the moment E[(1 − U)^m] is the probability that the m IID copies of X are all larger than Y. Consequently, multiplying the difference of moments E[(1 − U*)^m] − E[(1 − U)^m] by the factor m + 1 yields the quantity

β_m(Y||X) = 1 − (m + 1) Pr(X_1, …, X_m > Y). (5)

The quantity β_m(Y||X) involves the probability q_m = Pr(X_1, …, X_m > Y). Being a probability, q_m takes values in the unit interval [0, 1], and hence the quantity β_m(Y||X) takes values in the range [−m, 1]. Moreover, the quantity β_m(Y||X) attains its lower bound −m if and only if q_m = 1, i.e., if and only if the random variable Y equals its lower bound r_low with probability one: β_m(Y||X) = −m ⇔ Pr(Y = r_low) = 1. And, antithetically, the quantity β_m(Y||X) attains its upper bound 1 if and only if q_m = 0, i.e., if and only if the random variable Y equals its upper bound r_up with probability one: β_m(Y||X) = 1 ⇔ Pr(Y = r_up) = 1.
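The quantities of Equations (4) and (5) can be estimated by straightforward Monte Carlo simulation. In the following sketch, the exponential laws of X and Y are illustrative assumptions, chosen because p_m and q_m then admit closed forms against which the estimates can be checked:

```python
import random

random.seed(11)

# Monte Carlo estimates of alpha_m(Y||X) and beta_m(Y||X) of Equations
# (4)-(5), for the assumed pair X ~ Exp(1), Y ~ Exp(2), with order m = 2.
m, N = 2, 200_000
p_hits = q_hits = 0
for _ in range(N):
    xs = [random.expovariate(1.0) for _ in range(m)]  # m IID copies of X
    y = random.expovariate(2.0)
    if max(xs) <= y:       # the event {X_1, ..., X_m <= Y}
        p_hits += 1
    if min(xs) > y:        # the event {X_1, ..., X_m > Y}
        q_hits += 1
alpha = (m + 1) * (p_hits / N) - 1
beta = 1 - (m + 1) * (q_hits / N)

# For this pair, p_m = 2/((m+1)(m+2)) and q_m = 2/(m+2), so that
# alpha_m = beta_m = -m/(m+2); with m = 2 both equal -1/2.
assert abs(alpha - (-0.5)) < 0.02
assert abs(beta - (-0.5)) < 0.02
```
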

The Hausdorff moment problem implies that the random variables U and U* are equal in law if and only if E[U^m] = E[(U*)^m] (m = 1, 2, 3, …), as well as if and only if E[(1 − U)^m] = E[(1 − U*)^m] (m = 1, 2, 3, …). In turn, we obtain that the random variables X and Y are equal in law if and only if the quantities α_m(Y||X) are all zero, as well as if and only if the quantities β_m(Y||X) are all zero: A(r) ≡ B(r) ⇔ α_m(Y||X) = 0 (m = 1, 2, 3, …); and A(r) ≡ B(r) ⇔ β_m(Y||X) = 0 (m = 1, 2, 3, …). Thus, collectively, the quantities α_m(Y||X), as well as the quantities β_m(Y||X), characterize the case of a zero statistical divergence of the random variable Y from the random variable X.

For m = 1, Equations (4) and (5) imply that

α_1(Y||X) = β_1(Y||X) = Δ(Y||X), (6)

where Δ(Y||X) is the quantity of Equation (3) (this implication uses the aforementioned assumption Pr(X = Y) = 0). As noted in Section 2.1, the quantity Δ(Y||X) is a ‘first-order measurement’ of the statistical divergence of the random variable Y from the random variable X. The quantities α_m(Y||X) and β_m(Y||X) are ‘higher-order measurements’ of this statistical divergence.

To conclude this subsection, we underscore the fact that no use of the moments of the random variables X and Y was made here. Indeed, in general, the random variables X and Y may or may not have well-defined moments. This subsection used the (well-defined) moments of the random variables U and 1 − U in order to ‘harvest information’ regarding the statistical divergence of the random variable Y from the random variable X. The information that was harvested appeared in terms of the probabilities Pr(X_1, …, X_m ≤ Y) and Pr(X_1, …, X_m > Y) (rather than in terms of moments of the random variables X and Y).

2.3. Weighted-Average Representations

The previous subsection established the quantities α_m(Y||X) (m = 1, 2, 3, …) and the quantities β_m(Y||X) (m = 1, 2, 3, …). In this subsection, we further establish weighted-average representations of these quantities. As in the previous subsection, X_1, …, X_m denote m IID copies of the random variable X that are independent of the random variable Y.

The quantity α_m(Y||X) of Equation (4) involves the maximal random variable max(X_1, …, X_m). Indeed, the probability appearing in the right-hand side of Equation (4) is Pr(max(X_1, …, X_m) ≤ Y). Set ǎ_m(r) (r ∈ R) to denote the density function of the maximal random variable max(X_1, …, X_m). Then, using this density function, a derivation detailed in Appendix A asserts that

α_m(Y||X) = (m + 1) ∫_R [A(r) − B(r)] ǎ_m(r) dr. (7)

The quantity β_m(Y||X) in Equation (5) involves the minimal random variable min(X_1, …, X_m). Indeed, the probability appearing in the right-hand side of Equation (5) is Pr(min(X_1, …, X_m) > Y). Set â_m(r) (r ∈ R) to denote the density function of the minimal random variable min(X_1, …, X_m). Then, using this density function, a derivation detailed in Appendix A asserts that

β_m(Y||X) = (m + 1) ∫_R [A(r) − B(r)] â_m(r) dr. (8)

For m = 1, the maximal random variable max(X_1, …, X_m) and the minimal random variable min(X_1, …, X_m) are both equal, in law, to the random variable X. Consequently, ǎ_1(r) = â_1(r) = a(r), and hence Equations (7) and (8), together with Equation (6), imply that

Δ(Y||X) = 2 ∫_R [A(r) − B(r)] a(r) dr. (9)

Equations (7)–(9) manifest weighted-average representations for the quantities appearing in them. In these representations, the averages are of the difference between the distribution functions of the random variables X and Y. Moreover, the averaging weights are as follows: the density function of the maximal random variable max(X_1, …, X_m) in Equation (7); the density function of the minimal random variable min(X_1, …, X_m) in Equation (8); and the density function of the random variable X in Equation (9).
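The weighted-average representation can be verified numerically for m = 1. In the sketch below, the exponential laws of X and Y are illustrative assumptions; for this pair, both the direct probability formulation of Equation (3) and the integral representation of Equation (9) equal −1/3:

```python
import math

# Numerical verification of Equation (9) for the assumed pair
# X ~ Exp(1), Y ~ Exp(2): A(r) = 1 - exp(-r), B(r) = 1 - exp(-2r),
# a(r) = exp(-r).
def integrand(r):
    A = 1.0 - math.exp(-r)
    B = 1.0 - math.exp(-2.0 * r)
    a = math.exp(-r)
    return (A - B) * a

# Trapezoid rule over a truncated range (the tail beyond r_max is negligible).
steps, r_max = 100_000, 50.0
h = r_max / steps
integral = sum(
    (0.5 if i in (0, steps) else 1.0) * integrand(i * h) for i in range(steps + 1)
) * h
delta = 2.0 * integral

# Direct evaluation of Equation (3): Pr(Y > X) = 1/3 and Pr(Y <= X) = 2/3,
# so Delta(Y||X) = -1/3.
assert abs(delta - (-1.0 / 3.0)) < 1e-4
```
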

2.4. Coincidence

As above, consider a random variable ξ that takes values in the unit interval, and whose statistical distribution is governed by the density function ϕ(u) (0 < u < 1). Further consider m IID simulations of the random variable ξ (m = 2, 3, 4, …). The likelihood that the m simulations will all yield the same outcome is L_m(ξ) = ∫_0^1 ϕ(u)^m du. We term L_m(ξ) the “coincidence likelihood”, of order m, of the random variable ξ. In this subsection we shall apply coincidence likelihoods in order to measure the statistical divergence of the random variable Y from the random variable X.

As above, U* denotes a random variable that is uniformly distributed over the unit interval. As the density function of the uniform distribution is C*′(u) = 1, the coincidence likelihoods of the random variable U* are L_m(U*) = 1. A derivation detailed in Appendix A calculates the coincidence likelihoods of the random variable U. The derivation implies that the difference L_m(U) − L_m(U*) between the two coincidence likelihoods is the quantity

γ_m(Y||X) = ∫_R [(b(r)/a(r))^m − 1] a(r) dr. (10)

With regard to Equation (10), recall that a(r) and b(r) are, respectively, the density functions of the random variables X and Y.

Evidently, the quantity of Equation (10) can be extended to any real value of the parameter m. In what follows, we extend this parameter from the discrete set of values m = 2, 3, 4, … to the continuous range of values m > 1.

For the continuous range m > 1, a convexity argument that is detailed in Appendix A implies the two following properties. (I) The quantity of Equation (10) is non-negative: γ_m(Y||X) ≥ 0. (II) The quantity of Equation (10) is zero if and only if the random variables X and Y are equal in law: γ_m(Y||X) = 0 ⇔ a(r) ≡ b(r). Hence, with regard to any parameter value in the continuous range m > 1, the following conclusion is attained: the quantity γ_m(Y||X) characterizes the case of a zero statistical divergence of the random variable Y from the random variable X.

The convexity argument that holds for the continuous range m > 1 also holds for the continuous range m < 0. Shifting from the former range (m > 1) to the latter range (m < 0) flips the statistical divergence from “Y with respect to X” to “X with respect to Y”. Indeed, straightforward calculations that are based on Equation (10) yield the following pair of ‘flipping formulae’. (I) When the parameter is in the continuous range m > 1, then γ_m(X||Y) = γ_{1−m}(Y||X), in which case 1 − m < 0. (II) When the parameter is in the continuous range m < 0, then γ_m(Y||X) = γ_{1−m}(X||Y), in which case 1 − m > 1.
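The quantity of Equation (10), and the first flipping formula, can be checked by direct numerical integration; the exponential densities in the sketch below are illustrative assumptions:

```python
import math

# Numerical check of Equation (10) and of the first flipping formula, for
# the assumed (illustrative) densities a(r) = exp(-r) and b(r) = 2*exp(-2r).
def a(r):
    return math.exp(-r)

def b(r):
    return 2.0 * math.exp(-2.0 * r)

def gamma(m, num, den, steps=100_000, r_max=60.0):
    """Trapezoid rule for gamma_m(num||den) = int_R [(num/den)^m - 1] den dr."""
    h = r_max / steps
    s = 0.0
    for i in range(steps + 1):
        r = i * h
        w = 0.5 if i in (0, steps) else 1.0
        s += w * ((num(r) / den(r)) ** m - 1.0) * den(r)
    return s * h

# Non-negativity (property I), and the closed-form value
# gamma_2(Y||X) = 2^2/3 - 1 = 1/3 for this pair:
g2 = gamma(2.0, b, a)
assert g2 >= 0.0 and abs(g2 - 1.0 / 3.0) < 1e-3

# Flipping formula (I): gamma_m(X||Y) = gamma_{1-m}(Y||X); with m = 3/2,
# both sides equal sqrt(2) - 1 for this pair.
assert abs(gamma(1.5, a, b) - gamma(-0.5, b, a)) < 1e-3
```
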

2.5. Discussion

The quantity of Equation (10) is intimately related to the Hellinger divergence (see [10,12,26]), as well as to the three divergences that were noted in the introduction: the Kullback–Leibler divergence (also known as “relative entropy”) [13,14,15,16]; the Renyi divergence [17,18,19,20]; and the more general “f-divergence” [17,21,22,23,24,25,26]. These relations are described and discussed hereinafter.

Hellinger divergence and Renyi divergence. In terms of the quantity of Equation (10), the Hellinger divergence of the random variable Y from the random variable X admits the formulation (1/(m − 1)) γ_m(Y||X); and the Renyi divergence of the random variable Y from the random variable X admits the formulation (1/(m − 1)) ln[1 + γ_m(Y||X)]. In both these divergences, the parameter m is positive (m > 0) and different from one (m ≠ 1).

Kullback–Leibler divergence. Setting m = 1 in Equation (10), observe that γ_1(Y||X) = 0. In turn, taking the limit m → 1, while using L’Hopital’s rule, yields the following limiting result (see Appendix A for the details). The Hellinger divergence (1/(m − 1)) γ_m(Y||X) and the Renyi divergence (1/(m − 1)) ln[1 + γ_m(Y||X)] converge (as m → 1) to the quantity

∫_R ln(b(r)/a(r)) b(r) dr. (11)

The quantity appearing in Equation (11) is the Kullback–Leibler divergence of the random variable Y from the random variable X.
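The limit m → 1 can be illustrated numerically. For the assumed (illustrative) pair a(r) = e^{-r} and b(r) = 2e^{-2r}, Equation (10) integrates in closed form to γ_m(Y||X) = 2^m/(m + 1) − 1, and the Kullback–Leibler divergence of Equation (11) is ln(2) − 1/2:

```python
import math

# As m -> 1, both the Hellinger and the Renyi divergences approach the
# Kullback-Leibler divergence. Illustrative densities (an assumption):
# a(r) = exp(-r), b(r) = 2*exp(-2r), for which gamma_m = 2^m/(m+1) - 1
# and the Kullback-Leibler divergence is ln(2) - 1/2.
kl = math.log(2.0) - 0.5

for m in (1.1, 1.01, 1.001):
    g = 2.0 ** m / (m + 1.0) - 1.0
    hellinger = g / (m - 1.0)                 # (1/(m-1)) * gamma_m
    renyi = math.log(1.0 + g) / (m - 1.0)     # (1/(m-1)) * ln(1 + gamma_m)
    # The discrepancy shrinks proportionally to (m - 1).
    assert abs(hellinger - kl) < 10.0 * (m - 1.0)
    assert abs(renyi - kl) < 10.0 * (m - 1.0)
```
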

f-divergence. The right-hand side of Equation (10) admits the general form

∫_R φ(b(r)/a(r)) a(r) dr, (12)

where φ(t) (t ≥ 0) is a convex function that satisfies the condition φ(1) = 0. The quantity appearing in Equation (12) is the f-divergence of the random variable Y from the random variable X. Applying the change of variables u = A(r) to Equation (12) implies that the f-divergence admits, in terms of the density function C′(u) of Equation (2), the formulation ∫_0^1 φ(C′(u)) du.

Different choices of the convex function φ(t) yield different manifestations of the f-divergence (see, for example, [26]). In particular, the choice φ(t) = t^m − 1, with a real parameter m that takes values either in the range m > 1 or in the range m < 0, yields the quantity of Equation (10). The choice φ(t) = (t^m − 1)/(m − 1), with a positive parameter m that takes values either in the range m < 1 or in the range m > 1, yields the Hellinger divergence. And the choice φ(t) = t ln(t) yields the Kullback–Leibler divergence.
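These choices of φ(t) can be compared numerically. The sketch below evaluates Equation (12) for two such choices, again under the illustrative assumption a(r) = e^{-r}, b(r) = 2e^{-2r}:

```python
import math

# Evaluating the f-divergence of Equation (12) for two choices of phi,
# under the assumed densities a(r) = exp(-r), b(r) = 2*exp(-2r).
def f_divergence(phi, steps=200_000, r_max=60.0):
    """Trapezoid rule for int_R phi(b(r)/a(r)) a(r) dr."""
    h = r_max / steps
    s = 0.0
    for i in range(steps + 1):
        r = i * h
        a = math.exp(-r)
        ratio = 2.0 * math.exp(-r)       # b(r)/a(r) for this pair
        w = 0.5 if i in (0, steps) else 1.0
        s += w * phi(ratio) * a
    return s * h

# phi(t) = t^2 - 1 recovers gamma_2(Y||X) = 2^2/3 - 1 = 1/3:
assert abs(f_divergence(lambda t: t * t - 1.0) - 1.0 / 3.0) < 1e-3
# phi(t) = t*ln(t) recovers the Kullback-Leibler divergence ln(2) - 1/2:
assert abs(f_divergence(lambda t: t * math.log(t)) - (math.log(2.0) - 0.5)) < 1e-3
```
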

The convexity argument that was noted in Section 2.4 (with regard to the quantity of Equation (10)) holds for the f-divergence. Namely, the convexity argument implies the two following properties. (I) The f-divergence is non-negative. (II) The f-divergence is zero if and only if the random variables X and Y are equal in law.

Therefore, via the notion of “coincidence likelihood” (that was described in Section 2.4), the curve C(u) of Equation (1) leads to Equation (10). In turn, Equation (10) leads to the four aforementioned divergences.

2.6. Summary & Implementation

In this section, we have established three different quantities that measure the statistical divergence of the random variable Y from the random variable X. The random variable X takes values in the real range R = (r_low, r_up), the random variable Y takes values in the closed range [r_low, r_up], and the three quantities are the following.

  • α_m(Y||X) = (m + 1) Pr(X_1, …, X_m ≤ Y) − 1, where m = 1, 2, 3, ….

  • β_m(Y||X) = 1 − (m + 1) Pr(X_1, …, X_m > Y), where m = 1, 2, 3, ….

  • γ_m(Y||X) = ∫_R [(b(r)/a(r))^m − 1] a(r) dr, where m > 1.

The quantities α_m(Y||X) and β_m(Y||X) can be used for any pair of random variables X and Y. In these quantities, X_1, …, X_m are m IID copies of the random variable X, and the IID copies are independent of the random variable Y. Collectively, these quantities characterize the case of a zero statistical divergence. Namely, the random variable Y is equal in law to the random variable X if and only if α_m(Y||X) = 0 for all m, and β_m(Y||X) = 0 for all m.

The quantity γ_m(Y||X) can be used when the random variable X has a density function a(r) that is positive over the range R, and when the random variable Y has a density function b(r). For any given m, this quantity characterizes the case of a zero statistical divergence: the random variable Y is equal in law to the random variable X if and only if γ_m(Y||X) = 0. The quantity γ_m(Y||X) is intimately related to the Hellinger and the Renyi divergences, and it leads to the Kullback–Leibler divergence and to the general f-divergence.

In general, none of the three quantities are symmetric. Namely, in general, the statistical divergence of the random variable Y from the random variable X is not the same as the statistical divergence of the random variable X from the random variable Y. Key properties of the three quantities are summarized in Table 1.

Table 1.

Key properties of the measures established in Section 2: the quantities α_m(Y||X), β_m(Y||X), and γ_m(Y||X). For each quantity, the table specifies the values of the underlying parameter m, the range of the quantity, and the characterization of the case where the random variable Y is equal in law to the random variable X. For the quantities α_m(Y||X) and β_m(Y||X), the table also specifies the characterization of the following extreme cases: the random variable Y is equal, with probability one (w.p. 1), to the lower-bound value r_low; the random variable Y is equal, with probability one (w.p. 1), to the upper-bound value r_up.

Quantity | α_m(Y||X) | β_m(Y||X) | γ_m(Y||X)
Parameter | m = 1, 2, 3, … | m = 1, 2, 3, … | m > 1
Range | −1 ≤ α_m(Y||X) ≤ m | −m ≤ β_m(Y||X) ≤ 1 | 0 ≤ γ_m(Y||X) < ∞
Y = X (in law) | α_m(Y||X) = 0 (all m) | β_m(Y||X) = 0 (all m) | γ_m(Y||X) = 0 (any m)
Y = r_low (w.p. 1) | α_m(Y||X) = −1 | β_m(Y||X) = −m | —
Y = r_up (w.p. 1) | α_m(Y||X) = m | β_m(Y||X) = 1 | —

The implementation of the three quantities, based on empirical data (be it real-world, experimental, simulated, etc.), is practiced as follows. Firstly, given n samples of the random variable X, order these samples increasingly: x_1 ≤ x_2 ≤ ⋯ ≤ x_{n−1} ≤ x_n; in addition, set x_0 = r_low and x_{n+1} = r_up. Secondly, given a set of samples of the random variable Y, calculate the following proportions: π_i is the proportion of the samples (of the random variable Y) that fall in the range (x_{i−1}, x_i] (i = 1, 2, …, n + 1). Thirdly, calculate the three quantities via the following approximation formulae.

  • α_m(Y||X) ≈ (m + 1) Σ_{i=1}^{n+1} [(i − 1/2)/(n + 1)]^m π_i − 1.

  • β_m(Y||X) ≈ 1 − (m + 1) Σ_{i=1}^{n+1} [1 − (i − 1/2)/(n + 1)]^m π_i.

  • γ_m(Y||X) ≈ (n + 1)^{m−1} Σ_{i=1}^{n+1} π_i^m − 1.

Indeed, based on the empirical data, the estimate of the curve C(u) is the linear interpolation of the unit-square points C(0) = 0 and C(i/(n+1)) = π_1 + ⋯ + π_i (i = 1, 2, …, n + 1). In turn, the slopes of the piecewise-linear estimate of the curve C(u) are C′(u) = (n + 1) π_i for (i − 1)/(n + 1) < u < i/(n + 1) (i = 1, 2, …, n + 1). Consequently, the estimates of the three quantities are given by the above approximation formulae.
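A minimal sketch of this implementation recipe follows; the exponential sampling laws are an assumption made for illustration. (Note that, for any data, the γ estimate is non-negative by Jensen's inequality, and it is sensitive to fluctuations of the bin proportions π_i.)

```python
import bisect
import random

# Direct implementation of the three approximation formulae above. The
# sampling laws are illustrative assumptions: X and Y are both Exp(1),
# i.e., equal in law, so the alpha and beta estimates should be near zero.
def divergence_estimates(x_samples, y_samples, m):
    n = len(x_samples)
    xs = sorted(x_samples)                        # x_1 <= x_2 <= ... <= x_n
    counts = [0] * (n + 1)
    for y in y_samples:
        # y falls in the bin (x_{i-1}, x_i]; bisect_left returns i - 1.
        counts[bisect.bisect_left(xs, y)] += 1
    pi = [c / len(y_samples) for c in counts]     # proportions pi_1..pi_{n+1}
    u = [(i - 0.5) / (n + 1) for i in range(1, n + 2)]
    alpha = (m + 1) * sum(ui ** m * p for ui, p in zip(u, pi)) - 1
    beta = 1 - (m + 1) * sum((1 - ui) ** m * p for ui, p in zip(u, pi))
    gamma = (n + 1) ** (m - 1) * sum(p ** m for p in pi) - 1
    return alpha, beta, gamma

random.seed(3)
x = [random.expovariate(1.0) for _ in range(2000)]
y = [random.expovariate(1.0) for _ in range(200_000)]
a2, b2, g2 = divergence_estimates(x, y, 2)
assert abs(a2) < 0.1 and abs(b2) < 0.1 and g2 >= 0.0
```
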

3. Socioeconomic Application

Consider a positive random variable W, whose statistical distribution is governed by the density function f(w) (0 < w < ∞). The mean of the random variable W is its first moment, E[W] = ∫_0^∞ w f(w) dw. We assume that the mean is positive and finite, and denote it by μ. This implies that f$(w) = (1/μ) w f(w) (0 < w < ∞) is a density function. We set W$ to be a random variable whose statistical distribution is governed by the density function f$(w).

The random variable W$ is positive, and it has a socioeconomic interpretation that is described as follows. Envisage a human society that comprises members with positive personal wealth values, and consider W to be the personal wealth of a randomly sampled member of the society. Now, rather than sampling at random a single member of the society, sample at random a single dollar of the society’s overall wealth (i.e., the aggregate of the personal wealth values of all the society’s members). Then, W$ is the personal wealth of the society member to whom the randomly-sampled dollar belongs.

Whereas the random variable W has no inherent inclination towards large wealth values, the random variable W$ has such an inclination. Indeed, when a single member of the society is sampled at random, it is exactly as likely for any given rich individual, and for any given poor individual, to be the randomly-sampled member. However, when a single dollar is sampled at random from the society’s overall wealth, it is more likely that this randomly-sampled dollar belongs to a rich member of the society than to a poor member of the society.

This section presents a socioeconomic application of the general framework that was established in Section 2: measuring the statistical divergence of the random variable W$ from the random variable W. Throughout this section, F(w) = Pr(W ≤ w) (0 < w < ∞) denotes the distribution function of the random variable W. In addition, the density function f(w) is assumed to be positive, and hence the distribution function F(w) is increasing, and it has an inverse function F^{-1}(u) (0 < u < 1) that is also increasing.

3.1. Lorenz Curve

In order to use the framework of Section 2, we set X = W and Y = W$ (the equalities being in law). In turn, the underlying range is the positive half-line R = (0, ∞), and: the distribution function of X is A(r) = F(r); the density function of X is a(r) = f(r); the density function of Y is b(r) = f$(r) = (1/μ) r f(r).

Noting that b(r)/a(r) = r/μ, Equation (2) implies that the derivative of the curve C(u) is

C′(u) = (1/μ) F^{-1}(u) (13)

(0 < u < 1). As the inverse function F^{-1}(u) is increasing, so is the derivative C′(u), and hence the curve C(u) is convex. In turn, the convexity of the curve C(u) implies that its graph resides below the diagonal line of the unit square: C(u) < u (0 < u < 1).

The socioeconomic interpretation of the random variable W$ induces a socioeconomic interpretation of the curve C(u). Indeed, with regard to the underlying human society, the aggregate wealth held by the poorest 100u% of the society’s members constitutes 100C(u)% of the society’s overall wealth. Termed the “Lorenz curve” in honor of the American statistician Max Lorenz [4], the curve C(u) is widely applied in economics and in the social sciences to investigate wealth and income distributions, as well as to quantify socioeconomic inequality [5,6,7].
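An empirical Lorenz curve is straightforward to construct from wealth data; in the minimal sketch below, the exponential wealth distribution is an assumption made for illustration:

```python
import random

# Empirical Lorenz curve from simulated wealth data; the exponential
# wealth distribution is an illustrative assumption, not from the paper.
random.seed(1)
w = sorted(random.expovariate(1.0) for _ in range(10_000))
total = sum(w)

cum, lorenz = 0.0, [0.0]
for wi in w:
    cum += wi
    lorenz.append(cum / total)   # C(i/n): wealth share of the poorest i members

n = len(w)
# The Lorenz curve is non-decreasing and resides below the diagonal: C(u) <= u.
assert all(lorenz[i] <= lorenz[i + 1] for i in range(n))
assert all(lorenz[i] <= i / n + 1e-9 for i in range(n + 1))
```
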

As noted after Equation (13), the Lorenz curve is bounded from above by the diagonal line of the unit square: C(u) < u (0 < u < 1). In the space of Lorenz curves, the diagonal line characterizes the socioeconomic state of perfect equality: a society in which all the society members have an identical personal wealth value (which is positive).

Evidently, the Lorenz curve is bounded from below by the zero line of the unit square: C(u) > 0 (0 < u < 1). In the space of Lorenz curves, the zero line characterizes the socioeconomic state of perfect inequality: a society in which 0% of the society’s members are in possession of 100% of the society’s overall wealth. The state of perfect inequality is attainable only when the society’s population is infinitely large.

Perhaps the simplest way to envisage the socioeconomic state of perfect inequality is via the following ‘Pharaonic example’. Indeed, consider a Pharaonic society comprising n members, in which one single member (the Pharaoh) has personal wealth 1, and all other members have personal wealth 0. Then, the socioeconomic state of perfect inequality is attained in the infinite-population limit n → ∞ of the Pharaonic society.

The closer the Lorenz curve is to the diagonal line of the unit square, the more egalitarian the underlying human society. Antithetically, the closer the Lorenz curve is to the zero line of the unit square, the less egalitarian the underlying human society. Hence, the Lorenz curve provides a geometric approach to quantifying socioeconomic inequality.

3.2. Gini Index

Named in honor of the Italian economist Corrado Gini [43,44], the “Gini index” is arguably the principal gauge of socioeconomic inequality in economics and in the social sciences [45,46,47,48]. The scientific applications of the Gini index extend well beyond socioeconomic inequality (for recent such applications see [48] and references therein).

In terms of the Lorenz curve, the Gini index is defined as follows: it is twice the area captured between the Lorenz curve and the unit-square’s diagonal line, $2\int_0^1[u-C(u)]\,du$. The Gini index displays the three following properties. (I) It takes values in the range $[0,1]$. (II) It attains its lower bound 0 if and only if the underlying human society is in the socioeconomic state of perfect equality. (III) It attains its upper bound 1 if and only if the underlying human society is in the socioeconomic state of perfect inequality.
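As a numerical illustration of this area-based definition, the following sketch (Python, not part of the original analysis) evaluates $2\int_0^1[u-C(u)]\,du$ by the midpoint rule for the exponential distribution, whose Lorenz curve has the well-known closed form $C(u)=u+(1-u)\ln(1-u)$ and whose Gini index equals $1/2$:

```python
import math

def gini_from_lorenz(C, n=10_000):
    """Gini index: twice the area between the diagonal and the Lorenz curve,
    G = 2 * integral_0^1 [u - C(u)] du, computed with the midpoint rule."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n          # midpoint of the i-th subinterval
        total += u - C(u)
    return 2 * total / n

# Lorenz curve of the exponential distribution (a classical closed form).
C_exp = lambda u: u + (1 - u) * math.log(1 - u)
print(gini_from_lorenz(C_exp))     # ≈ 0.5
```

The same routine applies to any Lorenz curve given in closed form.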

The definition of the Gini index implies that, with regard to the unit square (in which the graph of the Lorenz curve resides), the Gini index is the difference between the area above the Lorenz curve and the area below the Lorenz curve. Hence, Equation (3) implies that the Gini index is

$$\Delta(W_{\$}\|W)=\Pr(W_{\$}>W)-\Pr(W_{\$}\le W). \tag{14}$$

As described above, the random variables $W$ and $W_{\$}$ manifest two different wealth-sampling methods. The random variable $W_{\$}$ is inclined towards sampling large wealth values, and the Gini index quantifies this inclination.

3.3. Wealth Maxima & Minima

Having addressed the first-order quantity $\Delta(W_{\$}\|W)=\alpha_1(W_{\$}\|W)=\beta_1(W_{\$}\|W)$ in the previous subsection, we now move beyond the first order $m=1$, and address the higher-order quantities $\alpha_m(Y\|X)$ and $\beta_m(Y\|X)$ ($m=2,3,4,\dots$). In this subsection, $W_1,\dots,W_{m+1}$ denote $m+1$ IID copies of the random variable $W$.

A derivation detailed in Appendix A asserts that Equation (4) yields the quantity

$$\alpha_m(W_{\$}\|W)=\frac{\mathbb{E}[\max(W_1,\dots,W_{m+1})]-\mathbb{E}[W]}{\mathbb{E}[W]}. \tag{15}$$

Namely, the quantity $\alpha_m(W_{\$}\|W)$ is the overshoot of the mean of the maximal random variable $\max(W_1,\dots,W_{m+1})$, measured relative to the mean of the random variable $W$. The quantity $\alpha_m(W_{\$}\|W)$, via the maximal random variable $\max(W_1,\dots,W_{m+1})$, sets its focus on the occurrence of large wealth values. It is shown in Appendix A that the quantity $\alpha_m(W_{\$}\|W)$ takes values in the range $[0,m]$.

A derivation detailed in Appendix A asserts that Equation (5) yields the quantity

$$\beta_m(W_{\$}\|W)=\frac{\mathbb{E}[W]-\mathbb{E}[\min(W_1,\dots,W_{m+1})]}{\mathbb{E}[W]}. \tag{16}$$

Namely, the quantity $\beta_m(W_{\$}\|W)$ is the undershoot of the mean of the minimal random variable $\min(W_1,\dots,W_{m+1})$, measured relative to the mean of the random variable $W$. The quantity $\beta_m(W_{\$}\|W)$, via the minimal random variable $\min(W_1,\dots,W_{m+1})$, sets its focus on the occurrence of small wealth values. It is shown in Appendix A that the quantity $\beta_m(W_{\$}\|W)$ takes values in the range $[0,1]$.

3.4. Gini Index Revisited

As in the previous subsection, in this subsection $W_1,\dots,W_{m+1}$ denotes a sample comprising $m+1$ IID copies of the random variable $W$. The sample’s maximal wealth value is $\max(W_1,\dots,W_{m+1})$, and the sample’s minimal wealth value is $\min(W_1,\dots,W_{m+1})$. In turn, the sample’s range is the gap between these two values: $\max(W_1,\dots,W_{m+1})-\min(W_1,\dots,W_{m+1})$.

Summing up Equations (15) and (16), and denoting the sum by $\rho_m(W_{\$}\|W)=\alpha_m(W_{\$}\|W)+\beta_m(W_{\$}\|W)$, yields the following quantity:

$$\rho_m(W_{\$}\|W)=\frac{\mathbb{E}[\max(W_1,\dots,W_{m+1})-\min(W_1,\dots,W_{m+1})]}{\mathbb{E}[W]}. \tag{17}$$

Namely, the quantity $\rho_m(W_{\$}\|W)$ is the mean of the range of the sample $W_1,\dots,W_{m+1}$, measured relative to the mean of the random variable $W$. As the quantity $\alpha_m(W_{\$}\|W)$ takes values in the range $[0,m]$, and as the quantity $\beta_m(W_{\$}\|W)$ takes values in the range $[0,1]$, the quantity $\rho_m(W_{\$}\|W)$ takes values in the range $[0,m+1]$.

Consider now the order $m=1$. With regard to this order, as noted above, the quantities $\alpha_1(W_{\$}\|W)$ and $\beta_1(W_{\$}\|W)$ coincide and yield the Gini index $\Delta(W_{\$}\|W)$. In turn, the Gini index is the arithmetic average of the quantities $\alpha_1(W_{\$}\|W)$ and $\beta_1(W_{\$}\|W)$, and hence $\Delta(W_{\$}\|W)=\frac{1}{2}\rho_1(W_{\$}\|W)$. With regard to the order $m=1$, the sample is $W_1,W_2$, and hence the sample’s range is $\max(W_1,W_2)-\min(W_1,W_2)=|W_1-W_2|$. Consequently, Equation (17) implies that

$$\Delta(W_{\$}\|W)=\frac{\mathbb{E}|W_1-W_2|}{2\,\mathbb{E}[W]}. \tag{18}$$

Namely, the Gini index can be represented as the mean absolute deviation (MAD) of two IID copies of the random variable W, measured relative to twice the mean of the random variable W.
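Equation (18) lends itself to a direct empirical check. The sketch below (an illustration, not part of the paper) applies the MAD representation to the ‘Pharaonic example’ of Section 3.1, for which the empirical Gini index equals $(n-1)/n$ and thus approaches 1 as the population $n$ grows:

```python
def gini_mad(wealth):
    """Gini index via the MAD representation of Eq. (18):
    E|W1 - W2| / (2 E[W]), averaging over all ordered pairs."""
    n = len(wealth)
    mean = sum(wealth) / n
    mad = sum(abs(a - b) for a in wealth for b in wealth) / n**2
    return mad / (2 * mean)

# 'Pharaonic' society: one single member owns everything.
for n in (2, 10, 100):
    society = [1.0] + [0.0] * (n - 1)
    print(n, gini_mad(society))    # equals (n - 1) / n, tending to 1
```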

3.5. Wealth Moments

Having addressed the quantities $\alpha_m(Y\|X)$ and $\beta_m(Y\|X)$, we now turn to address the quantity $\gamma_m(Y\|X)$. Setting $X=W$ and $Y=W_{\$}$ (the equalities being in law), a derivation detailed in Appendix A asserts that Equation (10) yields the quantity

$$\gamma_m(W_{\$}\|W)=\frac{\mathbb{E}[W^m]-(\mathbb{E}[W])^m}{(\mathbb{E}[W])^m}. \tag{19}$$

Namely, the quantity $\gamma_m(W_{\$}\|W)$ is the overshoot of the moment of order $m$ of the random variable $W$, measured relative to the $m$th power of the mean of the random variable $W$.

For any $m$ that is larger than one, a Jensen-inequality argument that is detailed in Appendix A affirms that the quantity $\gamma_m(W_{\$}\|W)$ is non-negative, $\gamma_m(W_{\$}\|W)\ge 0$. Moreover, the Jensen-inequality argument asserts that the quantity $\gamma_m(W_{\$}\|W)$ is zero if and only if the random variable $W$ is constant with probability one: $\gamma_m(W_{\$}\|W)=0 \Leftrightarrow \Pr(W=\mu)=1$. As $m$ is larger than one, note that, due to the moment $\mathbb{E}[W^m]$, the quantity $\gamma_m(W_{\$}\|W)$ is sensitive to large wealth values.

For $m=2$, Equation (19) yields the ratio $\gamma_2(W_{\$}\|W)=\mathrm{Var}(W)/(\mathbb{E}[W])^2$, where $\mathrm{Var}(W)$ is the variance of the random variable $W$. The coefficient of variation (CV) of the random variable $W$ is its ‘noise-to-signal’ ratio: the ratio of its standard deviation to its mean, $\sqrt{\mathrm{Var}(W)}/\mathbb{E}[W]$. Hence, $\gamma_2(W_{\$}\|W)$ is the squared CV of the random variable $W$.
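For instance (a Monte Carlo sketch, not part of the paper), the unit-mean exponential distribution has $\mathrm{Var}(W)=1$ and $\mathbb{E}[W]=1$, so its squared CV, and hence $\gamma_2(W_{\$}\|W)$, equals 1:

```python
import random
import statistics

random.seed(7)
samples = [random.expovariate(1.0) for _ in range(200_000)]

mean = statistics.fmean(samples)
var = statistics.pvariance(samples, mu=mean)
gamma2 = var / mean**2   # gamma_2(W_$ || W): the squared coefficient of variation

print(gamma2)            # ≈ 1 for the exponential distribution
```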

Setting $X=W_{\$}$ and $Y=W$ (the equalities being in law), a derivation detailed in Appendix A asserts that Equation (10) yields the quantity

$$\gamma_m(W\|W_{\$})=\frac{\mathbb{E}[W^{1-m}]-(\mathbb{E}[W])^{1-m}}{(\mathbb{E}[W])^{1-m}}. \tag{20}$$

Namely, the quantity $\gamma_m(W\|W_{\$})$ is the overshoot of the moment of order $1-m$ of the random variable $W$, measured relative to the $(1-m)$th power of the mean of the random variable $W$.

For any $m$ that is larger than one, a Jensen-inequality argument that is detailed in Appendix A affirms that the quantity $\gamma_m(W\|W_{\$})$ is non-negative, $\gamma_m(W\|W_{\$})\ge 0$. Moreover, the Jensen-inequality argument asserts that the quantity $\gamma_m(W\|W_{\$})$ is zero if and only if the random variable $W$ is constant with probability one: $\gamma_m(W\|W_{\$})=0 \Leftrightarrow \Pr(W=\mu)=1$. As $m$ is larger than one, note that, due to the moment $\mathbb{E}[W^{1-m}]$, the quantity $\gamma_m(W\|W_{\$})$ is sensitive to small wealth values.

Observing Equations (19) and (20), it is evident that the quantities appearing in these equations are linked by the following relations: $\gamma_m(W\|W_{\$})=\gamma_{1-m}(W_{\$}\|W)$, and $\gamma_m(W_{\$}\|W)=\gamma_{1-m}(W\|W_{\$})$. These linkages are manifestations of the general ‘flipping formulae’ that were described in Section 2.4.

3.6. Inequality Indices

Inequality indices are gauges that quantify the socioeconomic inequality of human societies [1,2,3,8,9]. The aforementioned Gini index is perhaps the best known inequality index. A general inequality index is a ‘socioeconomic score’ that takes values in the range [0,1] and that meets the three following properties. (I) The inequality index yields its lower-bound score 0 if and only if the society is in the socioeconomic state of perfect equality. (II) If the society is in the socioeconomic state of perfect inequality, then the inequality index yields its upper-bound score 1. (III) The inequality index is invariant with respect to the particular currency via which the personal wealth values of the society members are measured (e.g., Dollar or Euro).

With regard to the random variable $W$, the first inequality-index property can be formulated as determinism: the inequality index yields its lower-bound score 0 if and only if the random variable $W$ is deterministic, i.e., if and only if it is constant with probability one. In addition, with regard to the random variable $W$, the third inequality-index property can be formulated as scale invariance: the inequality index is invariant with respect to the change of scale $W\mapsto s\cdot W$, where $s$ is an arbitrary positive scale parameter.

The following quantities all satisfy the first and the third inequality-index properties: the quantities $\alpha_m(W_{\$}\|W)$ and $\beta_m(W_{\$}\|W)$ of Equations (15) and (16); the quantity $\rho_m(W_{\$}\|W)$ of Equation (17); and the quantities $\gamma_m(W_{\$}\|W)$ and $\gamma_m(W\|W_{\$})$ of Equations (19) and (20). In fact, transformations of these quantities yield inequality indices, and these inequality indices are specified in Table 2. For a comprehensive study of the inequality indices that correspond to the quantities $\alpha_m(W_{\$}\|W)$, $\beta_m(W_{\$}\|W)$, and $\rho_m(W_{\$}\|W)$, the readers are referred to [60]. For a comprehensive study of the inequality indices that correspond to the quantities $\gamma_m(W_{\$}\|W)$ and $\gamma_m(W\|W_{\$})$, the readers are referred to [60,61].

Table 2.

Inequality indices that are obtained via transformations of the quantities presented and discussed in Section 3. For each quantity, the table specifies the transformation and the resulting inequality index. The values of the underlying parameter $m$ are as in Table 1: $m=1,2,3,\dots$ for the quantities $\alpha_m(W_{\$}\|W)$ and $\beta_m(W_{\$}\|W)$, as well as for the quantity $\rho_m(W_{\$}\|W)$; and $m>1$ for the quantities $\gamma_m(W_{\$}\|W)$ and $\gamma_m(W\|W_{\$})$.

| Quantity | Transformation | Inequality Index |
| --- | --- | --- |
| $\alpha_m(W_{\$}\Vert W)$ | $\frac{1}{m}\,\alpha_m(W_{\$}\Vert W)$ | $\frac{\mathbb{E}[\max(W_1,\dots,W_{m+1})]-\mathbb{E}[W]}{m\,\mathbb{E}[W]}$ |
| $\beta_m(W_{\$}\Vert W)$ | $\beta_m(W_{\$}\Vert W)$ | $\frac{\mathbb{E}[W]-\mathbb{E}[\min(W_1,\dots,W_{m+1})]}{\mathbb{E}[W]}$ |
| $\rho_m(W_{\$}\Vert W)$ | $\frac{1}{m+1}\,\rho_m(W_{\$}\Vert W)$ | $\frac{\mathbb{E}[\max(W_1,\dots,W_{m+1})-\min(W_1,\dots,W_{m+1})]}{(m+1)\,\mathbb{E}[W]}$ |
| $\gamma_m(W_{\$}\Vert W)$ | $\frac{\gamma_m(W_{\$}\Vert W)}{1+\gamma_m(W_{\$}\Vert W)}$ | $1-\frac{(\mathbb{E}[W])^m}{\mathbb{E}[W^m]}$ |
| $\gamma_m(W\Vert W_{\$})$ | $\frac{\gamma_m(W\Vert W_{\$})}{1+\gamma_m(W\Vert W_{\$})}$ | $1-\frac{(\mathbb{E}[W])^{1-m}}{\mathbb{E}[W^{1-m}]}$ |

3.7. Summary & Implementation

This section has presented a socioeconomic application of the general framework that was established in Section 2. The application gave rise to well-known socioeconomic notions including the Lorenz curve, the Gini index and its generalizations, and general inequality indices. Underlying this socioeconomic application is the following setting: a human society comprising members with positive personal wealth values.

Wealth is measured via two different statistical sampling methods. On the one hand, a single member of the society is sampled at random, and $W$ is the wealth of the randomly-sampled member. On the other hand, a single dollar is sampled at random (from the aggregate wealth of all the society members), and $W_{\$}$ is the wealth of the society member to whom the randomly-sampled dollar belongs. Evidently, the random variable $W_{\$}$ is inclined towards sampling the richer members of the society.

The socioeconomic application focused on measuring the statistical divergence of the random variable $W_{\$}$ from the random variable $W$. This statistical divergence quantifies the extent by which the rich deviate from the rest of the society members. This statistical divergence was shown to be zero if and only if the human society is in the socioeconomic state of perfect equality: all the society members share a common (positive) personal wealth value.

In terms of the random variables $W$ and $W_{\$}$, the state of perfect equality is characterized as follows: both random variables are equal, with probability one, to a fixed positive wealth value. Thus, from a probabilistic perspective, perfect equality manifests determinism. In turn, the statistical divergence of the random variable $W_{\$}$ from the random variable $W$ quantifies the deviation from this ‘deterministic benchmark’.

The statistical divergence of the random variable $W_{\$}$ from the random variable $W$ was measured by four quantities: the quantities $\alpha_m(W_{\$}\|W)$ and $\beta_m(W_{\$}\|W)$, as well as their sum $\rho_m(W_{\$}\|W)=\alpha_m(W_{\$}\|W)+\beta_m(W_{\$}\|W)$; and the quantity $\gamma_m(W_{\$}\|W)$. In addition, the statistical divergence of the random variable $W$ from the random variable $W_{\$}$ was measured by the quantity $\gamma_m(W\|W_{\$})$. As detailed in Table 2 above, these five quantities can be mapped—via one-to-one transformations—to socioeconomic inequality indices of the underlying human society.

The implementation of the five quantities, based on empirical data—be it real-world, experimental, simulated, etc.—is practiced as follows. Firstly, given $n$ samples of the random variable $W$, order these samples increasingly: $w_1\le w_2\le\dots\le w_{n-1}\le w_n$. Secondly, calculate the corresponding wealth proportions: $\pi_i=w_i/(w_1+\dots+w_n)$ ($i=1,2,\dots,n$). Thirdly, calculate the five quantities via the following approximation formulae.

  • $\alpha_m(W_{\$}\|W)\approx(m+1)\sum_{i=1}^{n}\left(\frac{i-1/2}{n}\right)^m\pi_i-1$.

  • $\beta_m(W_{\$}\|W)\approx 1-(m+1)\sum_{i=1}^{n}\left(1-\frac{i-1/2}{n}\right)^m\pi_i$.

  • $\rho_m(W_{\$}\|W)\approx(m+1)\sum_{i=1}^{n}\left[\left(\frac{i-1/2}{n}\right)^m-\left(1-\frac{i-1/2}{n}\right)^m\right]\pi_i$.

  • $\gamma_m(W_{\$}\|W)\approx n^{m-1}\sum_{i=1}^{n}\pi_i^m-1$.

  • $\gamma_m(W\|W_{\$})\approx n^{-m}\sum_{i=1}^{n}\pi_i^{1-m}-1$.

Indeed, based on the empirical data, the estimate of the Lorenz curve $C(u)$ is the linear interpolation of the unit-square points $C(0)=0$ and $C(i/n)=\pi_1+\dots+\pi_i$ ($i=1,2,\dots,n$). In turn, the slopes of the piecewise-linear estimate of the Lorenz curve $C(u)$ are $C'(u)=n\cdot\pi_i$ for $(i-1)/n<u<i/n$ ($i=1,2,\dots,n$). Consequently, the estimates of the five quantities are given by the above approximation formulae.
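The three steps above translate directly into code. The following sketch (with a hypothetical function name, not part of the paper) estimates the five quantities from a wealth sample; for a perfectly equal sample all five estimates (approximately) vanish, and for exponential samples the order-$m=1$ estimate recovers the exponential Gini index $1/2$:

```python
import random

def inequality_quantities(samples, m):
    """Estimate the five divergence quantities via the approximation
    formulae of Section 3.7 (samples must be strictly positive)."""
    n = len(samples)
    w = sorted(samples)
    total = sum(w)
    pi = [wi / total for wi in w]                    # wealth proportions
    u = [(i - 0.5) / n for i in range(1, n + 1)]     # midpoints (i - 1/2)/n
    alpha = (m + 1) * sum(ui**m * p for ui, p in zip(u, pi)) - 1
    beta = 1 - (m + 1) * sum((1 - ui)**m * p for ui, p in zip(u, pi))
    rho = alpha + beta
    gamma_dollar = n**(m - 1) * sum(p**m for p in pi) - 1
    gamma_flipped = n**(-m) * sum(p**(1 - m) for p in pi) - 1
    return alpha, beta, rho, gamma_dollar, gamma_flipped

# Perfect equality: all five quantities (approximately) vanish.
print(inequality_quantities([3.0] * 1000, m=2))

# Exponential wealth: the m = 1 quantity estimates the Gini index 1/2.
random.seed(1)
expo = [random.expovariate(1.0) for _ in range(20_000)]
print(inequality_quantities(expo, m=1)[0])           # ≈ 0.5
```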

Last but not least, we emphasize that the socioeconomic application presented in this section is not at all confined to the underlying socioeconomic setting. Indeed, the socioeconomic application can be used with regard to any non-negative random variable W that has a positive and finite mean [8,9].

4. Renewal Application

Consider a positive random variable $T$, whose statistical distribution is governed by the survival function $\bar F(t)=\Pr(T>t)$ ($0<t<\infty$). The mean of the random variable $T$ is given by the integral of its survival function, $\mathbb{E}[T]=\int_0^\infty \bar F(t)\,dt$. We assume that the mean is positive and finite, and (as in Section 3) denote it by $\mu$. This implies that $f_{\mathrm{res}}(t)=\frac{1}{\mu}\bar F(t)$ ($0<t<\infty$) is a density function. We set $T_{\mathrm{res}}$ to be a random variable whose statistical distribution is governed by the density function $f_{\mathrm{res}}(t)$.

The random variable $T_{\mathrm{res}}$ is positive, and it has a renewal interpretation that is described as follows. Consider a renewal process that is generated from the random variable $T$ [49,50,51]. Specifically, the renewal process is a sequence of temporal points $0=\tau_0<\tau_1<\tau_2<\tau_3<\cdots$ that are termed “renewal epochs”, and that are constructed as follows: the temporal durations between consecutive renewal epochs are IID copies of the random variable $T$. Now, standing at a fixed time point $t$, denote by $\Delta_t$ the temporal duration between the time point $t$ and the first renewal epoch after the time point $t$. A key result of the theory of renewal processes asserts that [51]: the random variable $\Delta_t$ converges in law, in the limit $t\to\infty$, to the random variable $T_{\mathrm{res}}$—which is termed the “residual lifetime” of the random variable $T$.

The random variables T and Tres manifest two different observations of the renewal process. To illustrate these different observations, consider the renewal process as manifesting the time epochs at which buses (of a certain bus line) arrive at a given bus station. The random variable T manifests the waiting time of a passenger that reaches the bus station just after a bus has left the station. The random variable Tres manifests the waiting time of a passenger that reaches the bus station at an arbitrary time point.

This section presents a renewal application of the general framework that was established in Section 2: measuring the statistical divergence of the random variable $T_{\mathrm{res}}$ from the random variable $T$. In this section, the survival function $\bar F(t)$ is assumed to be smooth, and $f(t)=-\bar F'(t)$ ($0<t<\infty$) denotes the corresponding density function. Namely, $f(t)$ is the density function of the random variable $T$.

4.1. Hazard Rate

The likelihood that the random variable $T$ is realized at the positive time point $t$ is $f(t)$. In turn, the conditional likelihood that the random variable $T$ is realized at the positive time point $t$—given the information that $T$ was not realized up to the time point $t$—is $h(t)=f(t)/\bar F(t)$. The conditional likelihood $h(t)$, as a function of the temporal variable $t$, is termed the “hazard rate” and the “failure rate” of the random variable $T$.

The hazard rate $h(t)$ characterizes the statistical distribution of the random variable $T$. Indeed, in terms of the hazard rate $h(t)$, the survival function of the random variable $T$ admits the representation $\bar F(t)=\exp[-\int_0^t h(u)\,du]$. The hazard rate $h(t)$ is a key statistical tool that is widely applied in survival analysis [62,63,64] and in reliability engineering [65,66,67]. Here, as shall now be shown, the hazard rate $h(t)$ emerges naturally in the context of the statistical divergence of the random variable $T_{\mathrm{res}}$ from the random variable $T$.
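The exponential-integral representation is easy to verify numerically. The sketch below (illustrative, not part of the paper) reconstructs the Weibull survival function $\bar F(t)=\exp[-(t/s)^{\epsilon}]$ from its hazard rate $h(t)=(\epsilon/s^{\epsilon})\,t^{\epsilon-1}$ by midpoint-rule integration:

```python
import math

def survival_from_hazard(h, t, steps=100_000):
    """Reconstruct the survival function from the hazard rate via
    F_bar(t) = exp(-integral_0^t h(u) du), using the midpoint rule."""
    du = t / steps
    integral = sum(h((k + 0.5) * du) for k in range(steps)) * du
    return math.exp(-integral)

s, eps = 2.0, 1.5
h = lambda t: (eps / s**eps) * t**(eps - 1)     # Weibull hazard rate
t = 3.0
print(survival_from_hazard(h, t))               # ≈ exp(-(t / s)**eps)
print(math.exp(-(t / s)**eps))
```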

In order to use the framework of Section 2, we set $X=T$ and $Y=T_{\mathrm{res}}$ (the equalities being in law). In turn, the underlying range is the positive half-line $\mathcal{R}=(0,\infty)$, and: the distribution function of $X$ is $A(r)=F(r)=1-\bar F(r)$; the density function of $X$ is $a(r)=f(r)$; the density function of $Y$ is $b(r)=f_{\mathrm{res}}(r)=\frac{1}{\mu}\bar F(r)$.

Noting that $b(r)/a(r)=1/[\mu\cdot h(r)]$, Equation (2) implies that the derivative of the curve $C(u)$ is

$$C'(u)=\frac{1}{\mu}\,\frac{1}{h(F^{-1}(u))}. \tag{21}$$

The curve $C(u)$ coincides with the diagonal line of the unit square, $C(u)=u$ ($0<u<1$), if and only if the curve’s derivative is identically one, $C'(u)=1$ ($0<u<1$). In turn, Equation (21) implies that the curve’s derivative is identically one if and only if the hazard rate is flat: $h(t)=1/\mu$ ($0<t<\infty$). This flat hazard rate is equivalent to the exponential survival function $\bar F(t)=\exp(-t/\mu)$ ($0<t<\infty$).

The above argumentation affirms the well-known fact that the residual lifetime $T_{\mathrm{res}}$ (of the random variable $T$) is equal in law to the random variable $T$ if and only if this random variable is exponentially distributed. Thus, in effect, the statistical divergence of the random variable $T_{\mathrm{res}}$ from the random variable $T$ quantifies how “non-exponential” the statistical distribution of the random variable $T$ is.

When the random variable $T$ is exponentially distributed, the renewal process that it generates is the Poisson process [52,53,54]. Hence, from a renewal perspective, the statistical divergence of the random variable $T_{\mathrm{res}}$ from the random variable $T$ quantifies the following: the deviation of the renewal process that is generated by the random variable $T$ from the ‘Poisson benchmark’.

4.2. Increasing and Decreasing Hazard Rates

In reliability engineering, two particular classes of random variables are distinguished as significantly important [65]. One is the increasing failure rate (IFR) class, which comprises all positive random variables whose hazard rates are increasing functions. The other is the decreasing failure rate (DFR) class, which comprises all positive random variables whose hazard rates are decreasing functions.

The IFR class is used to model the lifetimes of systems that age with time. Namely, a system is aging if the likelihood that the system will fail grows as the age of the system grows. Aging systems are all around us, and examples of such systems include buildings, infrastructures, machines, cars, ships, airplanes, and even our very own human bodies.

If the random variable $T$ belongs to the IFR class, then its hazard rate $h(t)$ is an increasing function. In this IFR scenario, Equation (21) implies that the derivative $C'(u)$ is decreasing, and hence the curve $C(u)$ is concave. In turn, the concavity of the curve $C(u)$ implies that its graph resides above the diagonal line of the unit square: $C(u)>u$ ($0<u<1$).

The DFR class is used to model the lifetimes of phenomena that anti-age with time. Namely, a phenomenon is anti-aging if the likelihood that the phenomenon will cease diminishes as the age of the phenomenon grows [68]. Anti-aging phenomena are encountered in our culture and in our technologies, e.g., the symphonies of Beethoven, the writings of Shakespeare, the Gregorian calendar, the English alphabet, cutlery, and the wheel. Indeed, the longer we listen to Beethoven and the longer we use cutlery, the greater the likelihood that we shall keep on doing so.

If the random variable $T$ belongs to the DFR class, then its hazard rate $h(t)$ is a decreasing function. In this DFR scenario, Equation (21) implies that the derivative $C'(u)$ is increasing, and hence the curve $C(u)$ is convex. In turn, the convexity of the curve $C(u)$ implies that its graph resides below the diagonal line of the unit square: $C(u)<u$ ($0<u<1$).

As noted in Section 2.1, the curve $C(u)$ splits the unit square into two sets: the square’s points that are above the curve, and the square’s points that are below the curve. Moreover, Equation (3) asserts that the difference between the area of the upper set and the area of the lower set is the quantity

$$\Delta(T_{\mathrm{res}}\|T)=\Pr(T_{\mathrm{res}}>T)-\Pr(T_{\mathrm{res}}\le T). \tag{22}$$

The quantity $\Delta(T_{\mathrm{res}}\|T)$ takes values in the range $[-1,1]$, and for the IFR and DFR classes, it gauges the “deviation from exponentiality”.

Indeed, if the random variable $T$ belongs to the IFR class, then the quantity $\Delta(T_{\mathrm{res}}\|T)$ takes values in the range $[-1,0]$, and it displays the following properties. (I) The quantity $\Delta(T_{\mathrm{res}}\|T)$ attains its upper bound 0 if and only if the random variable $T$ is exponentially distributed. (II) The smaller the quantity $\Delta(T_{\mathrm{res}}\|T)$, the more “non-exponential” the statistical distribution of the random variable $T$ is.

Similarly, if the random variable $T$ belongs to the DFR class, then the quantity $\Delta(T_{\mathrm{res}}\|T)$ takes values in the range $[0,1]$, and it displays the following properties. (I) The quantity $\Delta(T_{\mathrm{res}}\|T)$ attains its lower bound 0 if and only if the random variable $T$ is exponentially distributed. (II) The larger the quantity $\Delta(T_{\mathrm{res}}\|T)$, the more “non-exponential” the statistical distribution of the random variable $T$ is.

In statistical physics, the exponential distribution is the paradigmatic model for ‘regular’ relaxation. The IFR and DFR classes offer general models for ‘anomalous’ relaxation [55,56,57,58]. Specifically, the IFR class is a general model for super-exponential relaxation, and the DFR class is a general model for sub-exponential relaxation. In turn, the quantity $\Delta(T_{\mathrm{res}}\|T)$ quantifies the deviation of anomalous relaxation—super-exponential and sub-exponential—from the ‘regular-relaxation benchmark’.

4.3. Weibull Example

To illustrate the quantity $\Delta(T_{\mathrm{res}}\|T)$ in the context of the IFR and DFR classes, consider the example of a Weibull-distributed random variable $T$ [69,70,71]. This example is characterized by the survival function $\bar F(t)=\exp[-(t/s)^{\epsilon}]$ ($0<t<\infty$), where $s$ is a positive scale parameter, and where $\epsilon$ is a positive exponent. Equivalently, this example is characterized by the hazard rate $h(t)=c\,t^{\epsilon-1}$ ($0<t<\infty$), where $c=\epsilon/s^{\epsilon}$ is a positive coefficient.

The Weibull hazard rate exhibits the following behaviors. For exponent values $\epsilon<1$, the hazard rate is decreasing, and hence the random variable $T$ belongs to the DFR class. At the exponent value $\epsilon=1$, the hazard rate is flat, and hence the random variable $T$ is exponentially distributed—with a mean that equals the scale parameter, $\mu=s$. For exponent values $\epsilon>1$, the hazard rate is increasing, and hence the random variable $T$ belongs to the IFR class.

A calculation using a general result to be presented below (Equation (25)) yields—for the Weibull-distributed random variable $T$—the quantity

$$\Delta(T_{\mathrm{res}}\|T)=1-2^{1-1/\epsilon}. \tag{23}$$

Note that the right-hand side of Equation (23) depends only on the Weibull exponent $\epsilon$ (i.e., it does not depend on the scale parameter $s$). Denoting the right-hand side of Equation (23) by $g(\epsilon)$, this function of the Weibull exponent $\epsilon$ decreases from the level $\lim_{\epsilon\to 0}g(\epsilon)=1$ to the level $\lim_{\epsilon\to\infty}g(\epsilon)=-1$. In addition, this function vanishes at the exponent value one, $g(1)=0$.

Thus, for the Weibull-distributed random variable $T$, the quantity $\Delta(T_{\mathrm{res}}\|T)$ exhibits the following behaviors. In the DFR range ($\epsilon<1$), the quantity $\Delta(T_{\mathrm{res}}\|T)$ is positive, and it attains its upper bound 1 in the Weibull limit $\epsilon\to 0$. The quantity $\Delta(T_{\mathrm{res}}\|T)$ is zero if and only if the Weibull exponent $\epsilon$ is one, which characterizes the case of an exponentially distributed random variable $T$. In the IFR range ($\epsilon>1$), the quantity $\Delta(T_{\mathrm{res}}\|T)$ is negative, and it attains its lower bound $-1$ in the Weibull limit $\epsilon\to\infty$.

As noted in Section 4.2, the exponential distribution is the paradigmatic model for ‘regular’ relaxation in statistical physics. The paradigmatic model for ‘anomalous’ relaxation is the Weibull distribution [55,56,57,58]. The Weibull distribution spans sub-exponential anomalous relaxation (the DFR range $\epsilon<1$), super-exponential anomalous relaxation (the IFR range $\epsilon>1$), and ‘regular’ exponential relaxation (the $\epsilon=1$ boundary, which separates the DFR and the IFR ranges). So, with regard to the Weibull model of anomalous relaxation, the quantity $\Delta(T_{\mathrm{res}}\|T)$ of Equation (23) quantifies the deviation from the ‘regular-relaxation benchmark’.
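A Monte Carlo sketch (illustrative, not part of the paper) corroborates Equation (23): for Weibull-distributed durations, the $m=1$ case of Equation (25), namely $\Delta(T_{\mathrm{res}}\|T)=1-2\,\mathbb{E}[\min(T_1,T_2)]/\mathbb{E}[T]$, reproduces the closed form $1-2^{1-1/\epsilon}$:

```python
import random

random.seed(42)
s, eps = 1.0, 0.5                 # DFR range: eps < 1
n = 100_000

t1 = [random.weibullvariate(s, eps) for _ in range(n)]
t2 = [random.weibullvariate(s, eps) for _ in range(n)]

mean_T = sum(t1) / n
mean_min = sum(min(a, b) for a, b in zip(t1, t2)) / n

delta_mc = 1 - 2 * mean_min / mean_T       # m = 1 case of Eq. (25)
delta_exact = 1 - 2**(1 - 1 / eps)         # Eq. (23): equals 0.5 here
print(delta_mc, delta_exact)
```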

4.4. Duration Maxima and Minima

Having addressed and exemplified the first-order quantity $\Delta(T_{\mathrm{res}}\|T)=\alpha_1(T_{\mathrm{res}}\|T)=\beta_1(T_{\mathrm{res}}\|T)$ in the previous subsections, we now elevate from the first order $m=1$, and address the higher-order quantities $\alpha_m(T_{\mathrm{res}}\|T)$ and $\beta_m(T_{\mathrm{res}}\|T)$ ($m=2,3,4,\dots$). In this subsection, $T_1,\dots,T_{m+1}$ denote $m+1$ IID copies of the random variable $T$.

A derivation detailed in Appendix A asserts that Equation (4) yields the quantity

$$\alpha_m(T_{\mathrm{res}}\|T)=(m+1)\,\frac{\mathbb{E}[\max(T_1,\dots,T_{m+1})]-\mathbb{E}[\max(T_1,\dots,T_m)]}{\mathbb{E}[T]}-1. \tag{24}$$

The quantity $\alpha_m(T_{\mathrm{res}}\|T)$ is based on the difference between the mean of the maximal random variable $\max(T_1,\dots,T_{m+1})$ and the mean of the maximal random variable $\max(T_1,\dots,T_m)$. The quantity $\alpha_m(T_{\mathrm{res}}\|T)$, via these maximal random variables, sets its focus on the occurrence of large values of the duration $T$. As noted in Section 2.2, the quantity $\alpha_m(T_{\mathrm{res}}\|T)$ takes values in the range $[-1,m]$.

A derivation detailed in Appendix A asserts that Equation (5) yields the quantity

$$\beta_m(T_{\mathrm{res}}\|T)=1-(m+1)\,\frac{\mathbb{E}[\min(T_1,\dots,T_{m+1})]}{\mathbb{E}[T]}. \tag{25}$$

The quantity $\beta_m(T_{\mathrm{res}}\|T)$ is based on the mean of the minimal random variable $\min(T_1,\dots,T_{m+1})$. The quantity $\beta_m(T_{\mathrm{res}}\|T)$, via the minimal random variable $\min(T_1,\dots,T_{m+1})$, sets its focus on the occurrence of small values of the duration $T$. As noted in Section 2.2, the quantity $\beta_m(T_{\mathrm{res}}\|T)$ takes values in the range $[-m,1]$.

4.5. Duration-Wealth Linkages

This section addresses the statistical divergence of the random variable $T_{\mathrm{res}}$ from the random variable $T$, where $T_{\mathrm{res}}$ is a random variable whose statistical distribution is governed by the density function $f_{\mathrm{res}}(t)=\frac{1}{\mu}\bar F(t)$ ($0<t<\infty$). The previous section addressed the statistical divergence of the random variable $T_{\$}$ from the random variable $T$, where $T_{\$}$ is a random variable whose statistical distribution is governed by the density function $f_{\$}(t)=\frac{1}{\mu}\,t\,f(t)$ ($0<t<\infty$).

In this subsection we will show that the quantities $\alpha_m(T_{\mathrm{res}}\|T)$ and $\beta_m(T_{\mathrm{res}}\|T)$ of this section are intimately linked to the quantities $\alpha_m(T_{\$}\|T)$ and $\beta_m(T_{\$}\|T)$ of the previous section. Indeed, observing Equation (24) on the one hand, and Equation (15) on the other hand, it follows that

$$\alpha_m(T_{\mathrm{res}}\|T)=(m+1)\left[\alpha_m(T_{\$}\|T)-\alpha_{m-1}(T_{\$}\|T)\right]-1. \tag{26}$$

Moreover, observing Equation (25) on the one hand, and Equation (16) on the other hand, it follows that

$$\beta_m(T_{\mathrm{res}}\|T)=(m+1)\,\beta_m(T_{\$}\|T)-m. \tag{27}$$

In particular, setting m=1 in Equations (26) and (27) yields

$$\Delta(T_{\mathrm{res}}\|T)=2\,\Delta(T_{\$}\|T)-1. \tag{28}$$

Specifically, Equation (28) follows from Equations (26) and (27) by noting that $\alpha_0(T_{\$}\|T)=0$ (indeed, setting $m=0$ in Equation (15) yields $\alpha_0(T_{\$}\|T)=0$), and by using Equation (6) (with regard to the divergence of $T_{\mathrm{res}}$ from $T$, and with regard to the divergence of $T_{\$}$ from $T$). Recall that the quantity $\Delta(T_{\$}\|T)$ that appears on the right-hand side of Equation (28) is the Gini index of the random variable $T$.

As described in the opening of this section, the random variable $T_{\mathrm{res}}$ has a renewal interpretation. And, as described in the opening of the previous section, the random variable $T_{\$}$ has a wealth interpretation. In fact, the random variable $T_{\$}$ also has a renewal interpretation, which is described as follows.

Consider a renewal process that is generated from the random variable $T$ (as detailed in the opening of this section). Standing at a fixed positive time point $t$, denote by $C_t$ the temporal duration of the renewal interval that ‘covers’ the time point $t$. Namely, $C_t$ is the temporal duration between the following renewal epochs: the last renewal epoch before the time point $t$, and the first renewal epoch after the time point $t$. A key result of the theory of renewal processes asserts that [51]: the random variable $C_t$ converges in law, in the limit $t\to\infty$, to the random variable $T_{\$}$. In this renewal context, the random variable $T_{\$}$ is termed the “total lifetime” of the random variable $T$.

The random variables $T$ and $T_{\$}$ manifest two different observations of the renewal process. To illustrate these different observations, consider (as in the opening of this section) the renewal process as manifesting the time epochs at which buses (of a certain bus line) arrive at a given bus station. The random variable $T$ manifests the waiting time between consecutive bus arrivals—as observed by a passenger that reaches the bus station just after a bus has left the station. The random variable $T_{\$}$ manifests the waiting time between consecutive bus arrivals—as observed by a passenger that reaches the bus station at an arbitrary time point.

4.6. Inequality Indices Revisited

Equations (26) and (27) established linkages between the quantities $\alpha_m(T_{\mathrm{res}}\|T)$ and $\beta_m(T_{\mathrm{res}}\|T)$ of this section, and the quantities $\alpha_m(T_{\$}\|T)$ and $\beta_m(T_{\$}\|T)$ of the previous section. In the previous section, it was shown that transformations of the quantities $\alpha_m(T_{\$}\|T)$ and $\beta_m(T_{\$}\|T)$ are inequality indices of the random variable $T$. Thus, the following question arises naturally: are there transformations of the quantities $\alpha_m(T_{\mathrm{res}}\|T)$ and $\beta_m(T_{\mathrm{res}}\|T)$ that are also inequality indices? This subsection shall answer the question affirmatively.

The following transformation of the quantity $\alpha_m(T_{\mathrm{res}}\|T)$ is an inequality index:

$$\frac{\alpha_m(T_{\mathrm{res}}\|T)+1}{m+1}=\frac{\mathbb{E}[\max(T_1,\dots,T_{m+1})]-\mathbb{E}[\max(T_1,\dots,T_m)]}{\mathbb{E}[T]}. \tag{29}$$

Indeed, setting $X=T$ and $Y=T_{\mathrm{res}}$ in Equation (4), it follows that the term appearing in Equation (29) is the probability $\Pr(X_1,\dots,X_m\le Y)$—which, of course, takes values in the range $[0,1]$. The right-hand side of Equation (29) attains the lower bound 0 if and only if the random variable $T$ is constant with probability one. In addition, the right-hand side of Equation (29) is invariant with respect to changes of scale of the random variable $T$.

The following transformation of the quantity $\beta_m(T_{\mathrm{res}}\|T)$ is an inequality index:

$$\frac{\beta_m(T_{\mathrm{res}}\|T)+m}{m+1}=\frac{\mathbb{E}[T]-\mathbb{E}[\min(T_1,\dots,T_{m+1})]}{\mathbb{E}[T]}. \tag{30}$$

Indeed, setting $X=T$ and $Y=T_{\mathrm{res}}$ in Equation (5), it follows that the term appearing in Equation (30) is the probability $\Pr(X_1,\dots,X_m>Y)$, which, of course, takes values in the range $[0,1]$. The right-hand side of Equation (30) attains the lower bound 0 if and only if the random variable $T$ is constant with probability one. Moreover, the right-hand side of Equation (30) is invariant with respect to changes of scale of the random variable $T$.

Equation (27) implies that the left-hand side of Equation (30) is equal to the quantity $\beta_m(T_{\$}\|T)$. In the previous section, we saw that the quantity $\beta_m(T_{\$}\|T)$ is an inequality index, and hence Equation (30) does not yield a ‘new’ inequality index. On the other hand, Equation (29) does yield a ‘new’ inequality index, i.e., an inequality index that was not encountered in the previous section.

4.7. Summary and Implementation

This section has presented a renewal application of the general framework that was established in Section 2. Underlying the renewal application is the following setting: a renewal process whose inter-renewal durations—i.e., the temporal durations between the process’ consecutive renewal epochs—are IID copies of the random variable T.

The waiting time till the next renewal epoch was observed via two different temporal perspectives. On the one hand, an observer was placed right after a renewal epoch; the waiting time of this observer was the random variable $T$. On the other hand, an observer was placed at an arbitrary time point; the waiting time of this observer was the random variable $T_{\mathrm{res}}$—the “residual lifetime” of the random variable $T$.

The renewal application focused on measuring the statistical divergence of the random variable $T_{\mathrm{res}}$ from the random variable $T$. In effect, this statistical divergence quantifies the extent by which the distribution of the random variable $T$ deviates from the exponential distribution. Indeed, the statistical divergence of $T_{\mathrm{res}}$ from $T$ was shown to be zero if and only if the random variables $T$ and $T_{\mathrm{res}}$ share a common exponential distribution.

The renewal process whose inter-renewal durations are exponentially distributed is the Poisson process. Hence, from a renewal perspective, the statistical divergence of $T_{\mathrm{res}}$ from $T$ quantifies the deviation of renewal processes from the ‘Poisson benchmark’. In statistical physics, the exponential distribution is the paradigmatic model of ‘regular’ relaxation. Hence, from a statistical-physics perspective, the statistical divergence of $T_{\mathrm{res}}$ from $T$ quantifies the deviation of anomalous relaxation from the ‘regular-relaxation benchmark’.

The statistical divergence of the random variable Tres from the random variable T was measured using two quantities, α_m(Tres||T) and β_m(Tres||T). As shown above, these two quantities can be mapped, via one-to-one transformations, to inequality indices of the random variable T. The measurement of the statistical divergence of the random variable Tres from the random variable T by the quantity γ_m(Tres||T) is detailed in Appendix A. Also detailed in Appendix A is the measurement of the statistical divergence of the random variable T from the random variable Tres via the quantity γ_m(T||Tres).

The implementation of the aforementioned quantities, based on empirical data (real-world, experimental, simulated, etc.), is practiced as follows. Firstly, given $n$ samples of the random variable T, order these samples increasingly: $t_1\le t_2\le\cdots\le t_{n-1}\le t_n$. Secondly, set $\pi_0=0$ and calculate the following proportions: $\pi_i=t_i/(t_1+\cdots+t_n)$ ($i=1,2,\dots,n$). Thirdly, calculate the quantities via the following approximation formulae.

  • $\alpha_m(T_{res}\|T)\simeq(m+1)\sum_{i=1}^{n}\left(\frac{i-1/2}{n}\right)^{m}(n-i+1)(\pi_i-\pi_{i-1})-1$.

  • $\beta_m(T_{res}\|T)\simeq1-(m+1)\sum_{i=1}^{n}\left(1-\frac{i-1/2}{n}\right)^{m}(n-i+1)(\pi_i-\pi_{i-1})$.

  • $\gamma_m(T_{res}\|T)\simeq n^{m-1}\sum_{i=1}^{n}\left[(n-i+1)(\pi_i-\pi_{i-1})\right]^{m}-1$.

Indeed, based on the empirical data, it is shown in Appendix A that the estimate of the curve $C(u)$ is the linear interpolation of the unit-square points $C(0)=0$ and $C(i/n)=\pi_1+\cdots+\pi_i+(n-i)\pi_i$ ($i=1,2,\dots,n$). In turn, the slopes of the piecewise-linear estimate of the curve $C(u)$ are $C'(u)=n(n-i+1)(\pi_i-\pi_{i-1})$ for $(i-1)/n<u<i/n$ ($i=1,2,\dots,n$). Consequently, the estimates of the aforementioned quantities are given by the above approximation formulae. An alternative way to implement the quantities $\alpha_m(T_{res}\|T)$ and $\beta_m(T_{res}\|T)$ is the following: use Equations (26) and (27) to represent these quantities in terms of the corresponding quantities of Section 3, and then use the approximation formulae of Section 3.7 with regard to the latter quantities.
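The implementation recipe above can be sketched in code. The following Python function is an illustrative sketch (its name and structure are assumptions of this sketch, not the paper's): it evaluates the three approximation formulae from $n$ samples of T, using the piecewise-linear estimate of the curve $C(u)$.

```python
import random

def renewal_divergences(samples, m):
    """Approximate alpha_m(Tres||T), beta_m(Tres||T), gamma_m(Tres||T)
    from samples of T, via the approximation formulae above."""
    t = sorted(samples)                        # t_1 <= ... <= t_n
    n, total = len(t), sum(t)
    pi = [0.0] + [ti / total for ti in t]      # pi_0 = 0, pi_i = t_i / sum
    alpha = beta = gamma = 0.0
    for i in range(1, n + 1):
        inc = (n - i + 1) * (pi[i] - pi[i - 1])   # = C'(u) / n on ((i-1)/n, i/n)
        u = (i - 0.5) / n                          # midpoint of the i-th cell
        alpha += (u ** m) * inc
        beta += ((1.0 - u) ** m) * inc
        gamma += inc ** m
    return ((m + 1) * alpha - 1.0,
            1.0 - (m + 1) * beta,
            n ** (m - 1) * gamma - 1.0)

rng = random.Random(7)
samples = [rng.expovariate(1.0) for _ in range(4000)]   # Poisson benchmark
a1, b1, g1 = renewal_divergences(samples, 1)
```

For exponential samples (the Poisson benchmark) the α and β estimates should be close to zero; and for $m=1$ the γ formula vanishes identically, since the sum $\sum_i(n-i+1)(\pi_i-\pi_{i-1})$ telescopes to 1.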

5. Conclusions

This paper addressed the topic of statistical divergence. To that end, a pair of random variables, which take values in a common real range, were considered: a ‘benchmark’ random variable X, and a random variable Y of interest. The focus was put on gauging the distance of the statistical distribution of Y from the statistical distribution of X.

A general framework of statistical divergence was established in Section 2, and was summarized in Section 2.6. The general framework was constructed in a simple and transparent fashion, and the gauges it yielded included the Hellinger divergence, the Renyi divergence, the Kullback–Leibler divergence, and the f-divergence. Two applications of the general framework were then presented.

The first application was to the topic of socioeconomic inequality. This application was detailed in Section 3, and was summarized in Section 3.7. The fruits that this application yielded included the Lorenz curve, socioeconomic inequality indices, the Gini index, and generalizations of the Gini index.

The second application was to the topic of renewal processes. This application was detailed in Section 4, and was summarized in Section 4.7. The fruits that this application yielded included gauges that quantify the divergence of renewal processes from the Poisson process, gauges that quantify the divergence of anomalous relaxation from regular relaxation, and further generalizations of the Gini index.

Empirical applications of the general framework are beyond the scope of this paper. Each of the aforementioned summary subsections provided ‘implementation formulae’. Namely, with regard to given empirical data—be it real-world, experimental, simulated, etc.—the implementation formulae specify how to calculate the gauges that were established and presented here.

Theoretically, this paper offers its readers a transparent and rather general path to statistical divergence, paths thereof to further topics, and deep linkages between the different (and seemingly unrelated) topics addressed. Practically, this paper offers its readers potent gauges of statistical divergence, and explicit formulae that specify how to implement the gauges. Theoretically and practically alike, this paper provides a wide and multi-disciplinary perspective on statistical divergence.

Appendix A

This appendix provides detailed derivations of various results that were stated throughout the paper.

Appendix A.1. Derivations: Section 2

Appendix A.1.1. The Area under the Curve C(u)

Using the change of variables $u=A(r)$, Equation (1) implies that

$$\int_0^1 C(u)\,du=\int_0^1 B\left(A^{-1}(u)\right)du=\int_{\mathcal{R}}B(r)a(r)\,dr. \tag{A1}$$

For X and Y that are mutually independent, conditioning on the value of X implies that

$$\Pr(Y\le X)=\int_{\mathcal{R}}\Pr(Y\le r\,|\,X=r)\,a(r)\,dr=\int_{\mathcal{R}}\Pr(Y\le r)\,a(r)\,dr=\int_{\mathcal{R}}B(r)a(r)\,dr. \tag{A2}$$

Combined together, Equations (A1) and (A2) imply that

$$\int_0^1 C(u)\,du=\Pr(Y\le X). \tag{A3}$$
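Equation (A3) can be checked numerically on a concrete pair (an assumed example added here for illustration): take X uniform on (0,1), so that $A(r)=r$, and $Y=V^2$ with V uniform on (0,1), so that $B(r)=\sqrt{r}$. Then $C(u)=\sqrt{u}$, the area under C equals 2/3, and this should match Pr(Y ≤ X).

```python
import random

rng = random.Random(1)
n = 200000

# Monte Carlo estimate of Pr(Y <= X) for independent X ~ U(0,1), Y = V^2.
hits = sum(1 for _ in range(n) if rng.random() ** 2 <= rng.random())
p_mc = hits / n

# Area under C(u) = sqrt(u) over (0,1): the integral equals 2/3 exactly.
area = 2.0 / 3.0
```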

Appendix A.1.2. The Moment E[U^m]

The density function of Equation (2), and the change of variables $u=A(r)$, imply that

$$E[U^m]=\int_0^1 u^m C'(u)\,du=\int_0^1 u^m\frac{b\left(A^{-1}(u)\right)}{a\left(A^{-1}(u)\right)}\,du=\int_{\mathcal{R}}A(r)^m\frac{b(r)}{a(r)}a(r)\,dr=\int_{\mathcal{R}}A(r)^m b(r)\,dr. \tag{A4}$$

For $X_1,\dots,X_m$ that are IID copies of the random variable X, which are independent of the random variable Y, conditioning on the value of Y implies that

$$\Pr(X_1,\dots,X_m\le Y)=\int_{\mathcal{R}}\Pr(X_1,\dots,X_m\le r\,|\,Y=r)\,b(r)\,dr=\int_{\mathcal{R}}\Pr(X_1,\dots,X_m\le r)\,b(r)\,dr=\int_{\mathcal{R}}\Pr(X_1\le r)\cdots\Pr(X_m\le r)\,b(r)\,dr=\int_{\mathcal{R}}A(r)^m b(r)\,dr. \tag{A5}$$

Combined together, Equations (A4) and (A5) imply that

$$E[U^m]=\Pr(X_1,\dots,X_m\le Y). \tag{A6}$$

Appendix A.1.3. The Moment E[(1 − U)^m]

Denote by $\bar A(r)=\Pr(X>r)$ ($r\in\mathcal{R}$) the survival function of the random variable X, and note that $\bar A(r)=1-A(r)$. The density function of Equation (2), and the change of variables $u=A(r)$, imply that

$$E[(1-U)^m]=\int_0^1(1-u)^m C'(u)\,du=\int_0^1(1-u)^m\frac{b\left(A^{-1}(u)\right)}{a\left(A^{-1}(u)\right)}\,du=\int_{\mathcal{R}}\left[1-A(r)\right]^m\frac{b(r)}{a(r)}a(r)\,dr=\int_{\mathcal{R}}\bar A(r)^m b(r)\,dr. \tag{A7}$$

For $X_1,\dots,X_m$ that are IID copies of the random variable X, and that are independent of the random variable Y, conditioning on the value of Y implies that

$$\Pr(X_1,\dots,X_m>Y)=\int_{\mathcal{R}}\Pr(X_1,\dots,X_m>r\,|\,Y=r)\,b(r)\,dr=\int_{\mathcal{R}}\Pr(X_1,\dots,X_m>r)\,b(r)\,dr=\int_{\mathcal{R}}\Pr(X_1>r)\cdots\Pr(X_m>r)\,b(r)\,dr=\int_{\mathcal{R}}\bar A(r)^m b(r)\,dr. \tag{A8}$$

Combined together, Equations (A7) and (A8) imply that

$$E[(1-U)^m]=\Pr(X_1,\dots,X_m>Y). \tag{A9}$$

Appendix A.1.4. Derivation of Equations (7) and (8)

Consider a smooth and bounded function ψ(u) that is defined over the unit interval (0<u<1). As $C'(u)$ is the density function of the random variable U, and as $C^{*\prime}(u)=1$ is the density function of the random variable $U^*$, we have

$$E[\psi(U)]-E[\psi(U^*)]=\int_0^1\psi(u)C'(u)\,du-\int_0^1\psi(u)C^{*\prime}(u)\,du=\int_0^1\left[C'(u)-1\right]\psi(u)\,du \tag{A10}$$

(using integration by parts)

$$=\left[C(u)-u\right]\psi(u)\Big|_0^1-\int_0^1\left[C(u)-u\right]\psi'(u)\,du \tag{A11}$$

(using the boundedness of the function ψ(u), and the fact that C(u) is a distribution function over the unit interval)

$$=\int_0^1\left[u-C(u)\right]\psi'(u)\,du \tag{A12}$$

(using Equation (1) and the change of variables $u=A(r)$)

$$=\int_0^1\left[u-B\left(A^{-1}(u)\right)\right]\psi'(u)\,du=\int_{\mathcal{R}}\left[A(r)-B(r)\right]\psi'\left(A(r)\right)a(r)\,dr. \tag{A13}$$

Hence, Equations (A10)–(A13) yield

$$E[\psi(U)]-E[\psi(U^*)]=\int_{\mathcal{R}}\left[A(r)-B(r)\right]\psi'\left(A(r)\right)a(r)\,dr. \tag{A14}$$

Set $\psi(u)=u^m$, where m is a positive integer. The definition of the quantity $\alpha_m(Y\|X)$ and Equation (A14) imply that

$$\alpha_m(Y\|X)=(m+1)\left\{E[U^m]-E[U^{*m}]\right\}=(m+1)\int_{\mathcal{R}}\left[A(r)-B(r)\right]mA(r)^{m-1}a(r)\,dr. \tag{A15}$$

Set $\max(X_1,\dots,X_m)$ to be the maximum of m IID copies of the random variable X. The distribution function of the random variable $\max(X_1,\dots,X_m)$ is

$$\Pr\left(\max(X_1,\dots,X_m)\le r\right)=\Pr(X_1,\dots,X_m\le r)=\Pr(X_1\le r)\cdots\Pr(X_m\le r)=\left[\Pr(X\le r)\right]^m=A(r)^m. \tag{A16}$$

Differentiating Equation (A16) implies that the density function of the random variable $\max(X_1,\dots,X_m)$ is

$$\check{a}_m(r)=mA(r)^{m-1}a(r). \tag{A17}$$

Substituting Equation (A17) into Equation (A15) yields

$$\alpha_m(Y\|X)=(m+1)\int_{\mathcal{R}}\left[A(r)-B(r)\right]\check{a}_m(r)\,dr. \tag{A18}$$

Set $\psi(u)=(1-u)^m$, where m is a positive integer. The definition of the quantity $\beta_m(Y\|X)$ and Equation (A14) imply that

$$\beta_m(Y\|X)=(m+1)\left\{E[(1-U^*)^m]-E[(1-U)^m]\right\}=(m+1)\int_{\mathcal{R}}\left[A(r)-B(r)\right]m\left[1-A(r)\right]^{m-1}a(r)\,dr. \tag{A19}$$

Set $\min(X_1,\dots,X_m)$ to be the minimum of m IID copies of the random variable X. The survival function of the random variable $\min(X_1,\dots,X_m)$ is

$$\Pr\left(\min(X_1,\dots,X_m)>r\right)=\Pr(X_1,\dots,X_m>r)=\Pr(X_1>r)\cdots\Pr(X_m>r)=\left[\Pr(X>r)\right]^m=\left[1-A(r)\right]^m. \tag{A20}$$

Differentiating Equation (A20) implies that the density function of the random variable $\min(X_1,\dots,X_m)$ is

$$\hat{a}_m(r)=m\left[1-A(r)\right]^{m-1}a(r). \tag{A21}$$

Substituting Equation (A21) into Equation (A19) yields

$$\beta_m(Y\|X)=(m+1)\int_{\mathcal{R}}\left[A(r)-B(r)\right]\hat{a}_m(r)\,dr. \tag{A22}$$

Appendix A.1.5. The Coincidence Likelihood L_m[U] and the Kullback–Leibler Limit

The density function of Equation (2), and the change of variables $u=A(r)$, imply that

$$L_m[U]=\int_0^1 C'(u)^m\,du=\int_0^1\left[\frac{b\left(A^{-1}(u)\right)}{a\left(A^{-1}(u)\right)}\right]^m du=\int_{\mathcal{R}}\left[\frac{b(r)}{a(r)}\right]^m a(r)\,dr. \tag{A23}$$

In turn, as $L_m[U^*]=1$, and as a(r) is a density function, Equation (A23) implies that

$$L_m[U]-L_m[U^*]=\int_{\mathcal{R}}\left[\frac{b(r)}{a(r)}\right]^m a(r)\,dr-1=\int_{\mathcal{R}}\left[\frac{b(r)}{a(r)}\right]^m a(r)\,dr-\int_{\mathcal{R}}a(r)\,dr=\int_{\mathcal{R}}\left\{\left[\frac{b(r)}{a(r)}\right]^m-1\right\}a(r)\,dr. \tag{A24}$$

In addition, Equation (A24) and L'Hôpital's rule imply that

$$\lim_{m\to1}\frac{L_m[U]-L_m[U^*]}{m-1}=\lim_{m\to1}\frac{\int_{\mathcal{R}}\left\{\left[\frac{b(r)}{a(r)}\right]^m-1\right\}a(r)\,dr}{m-1}=\lim_{m\to1}\int_{\mathcal{R}}\left[\frac{b(r)}{a(r)}\right]^m\ln\left[\frac{b(r)}{a(r)}\right]a(r)\,dr=\int_{\mathcal{R}}\frac{b(r)}{a(r)}\ln\left[\frac{b(r)}{a(r)}\right]a(r)\,dr=\int_{\mathcal{R}}\ln\left[\frac{b(r)}{a(r)}\right]b(r)\,dr. \tag{A25}$$

The quantity appearing on the right-hand side of Equation (A25) is the Kullback–Leibler divergence of the random variable Y (whose density function is b(r)) from the random variable X (whose density function is a(r)).
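The Kullback–Leibler limit of Equation (A25) can be verified in closed form for an assumed example added here for illustration: take $a(r)=e^{-r}$ (unit exponential) and $b(r)=\lambda e^{-\lambda r}$. Then $\int(b/a)^m a\,dr=\lambda^m/[m(\lambda-1)+1]$ whenever $m(\lambda-1)+1>0$, and the ratio $\{L_m[U]-L_m[U^*]\}/(m-1)$ should approach $\mathrm{KL}=\ln\lambda+1/\lambda-1$ as $m\to1$.

```python
import math

lam = 2.0

def lm_gap(m, lam):
    """L_m[U] - L_m[U*] = integral of (b/a)^m a(r) dr - 1, computed in
    closed form for the pair a(r) = e^{-r}, b(r) = lam * e^{-lam * r}."""
    return lam ** m / (m * (lam - 1.0) + 1.0) - 1.0

# Kullback-Leibler divergence of b from a, in closed form.
kl = math.log(lam) + 1.0 / lam - 1.0

m = 1.001
ratio = lm_gap(m, lam) / (m - 1.0)   # difference quotient near m = 1
```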

Appendix A.1.6. Convexity Argument

Consider a function φ(t) (t≥0) that is convex and that satisfies the condition φ(1)=0. Convexity implies that there is a line that coincides with the function φ(t) at the point t=1, and that lies below the function φ(t) at all other points t≠1. Specifically, denoting by s the line's slope:

$$\varphi(t)\ge\varphi(1)+s\cdot(t-1), \tag{A26}$$

and the inequality in Equation (A26) is strict for all t≠1. As φ(1)=0, Equation (A26) implies that

$$\varphi\!\left(\frac{b(r)}{a(r)}\right)\ge s\cdot\left[\frac{b(r)}{a(r)}-1\right], \tag{A27}$$

and the inequality in Equation (A27) is strict for all r such that a(r)≠b(r). In turn, multiplying Equation (A27) by a(r), and then integrating over the range $\mathcal{R}$, implies that

$$\int_{\mathcal{R}}\varphi\!\left(\frac{b(r)}{a(r)}\right)a(r)\,dr\ge\int_{\mathcal{R}}s\cdot\left[\frac{b(r)}{a(r)}-1\right]a(r)\,dr=s\cdot\int_{\mathcal{R}}\left[b(r)-a(r)\right]dr=s\cdot\left[\int_{\mathcal{R}}b(r)\,dr-\int_{\mathcal{R}}a(r)\,dr\right]=s\cdot[1-1]=0. \tag{A28}$$

So,

$$\int_{\mathcal{R}}\varphi\!\left(\frac{b(r)}{a(r)}\right)a(r)\,dr\ge0, \tag{A29}$$

and as the density function a(r) is positive over the range $\mathcal{R}$, the left-hand side of Equation (A29) is zero if and only if a(r)=b(r) holds (up to a set of zero Lebesgue measure). In other words, the left-hand side of Equation (A29) is zero if and only if the random variables X and Y are equal in law.
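The convexity argument above is the standard non-negativity proof for f-divergences. As a discrete illustration (an added sketch; the paper works with densities, and here probability vectors stand in for them), take φ(t) = t ln t, which is convex with φ(1) = 0; the resulting quantity is the Kullback–Leibler divergence, non-negative and vanishing when the two distributions coincide.

```python
import math

def f_divergence(a, b, phi):
    """Discrete analogue of the integral in (A29): sum of phi(b_i / a_i) * a_i
    over the common range, assuming a_i > 0 for all i."""
    return sum(phi(bi / ai) * ai for ai, bi in zip(a, b))

# phi(t) = t ln t: convex, with phi(1) = 0; yields the KL divergence of b from a.
phi = lambda t: t * math.log(t) if t > 0 else 0.0

a = [0.2, 0.3, 0.5]
b = [0.4, 0.4, 0.2]
d_ab = f_divergence(a, b, phi)   # strictly positive, since a != b
d_aa = f_divergence(a, a, phi)   # zero, since b/a = 1 everywhere
```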

Appendix A.2. Derivations: Section 3

Appendix A.2.1. Derivation of Equation (15)

In what follows, F(w) = Pr(W ≤ w) (0<w<∞) is the distribution function of the random variable W, and $f^{\$}(w)=\frac{1}{\mu}wf(w)$ (0<w<∞) is the density function of the random variable $W^{\$}$. Setting X=W and Y=$W^{\$}$ (the equalities being in law) implies that A(r)=F(r) and b(r)=$f^{\$}(r)$, where $\mathcal{R}=(0,\infty)$. In turn, Equation (A5) implies that

$$\Pr(X_1,\dots,X_m\le Y)=\int_{\mathcal{R}}A(r)^m b(r)\,dr=\int_0^\infty F(w)^m f^{\$}(w)\,dw=\int_0^\infty F(w)^m\frac{1}{\mu}wf(w)\,dw=\frac{1}{(m+1)\mu}\int_0^\infty w\,(m+1)F(w)^m f(w)\,dw. \tag{A30}$$

Set $\check{W}_{m+1}=\max(W_1,\dots,W_{m+1})$, the maximum of m+1 IID copies of the random variable W. The distribution function of the random variable $\check{W}_{m+1}$ is

$$\Pr\left(\check{W}_{m+1}\le w\right)=\Pr(W_1,\dots,W_{m+1}\le w)=\Pr(W_1\le w)\cdots\Pr(W_{m+1}\le w)=F(w)^{m+1}. \tag{A31}$$

Differentiating Equation (A31) implies that the density function of the random variable $\check{W}_{m+1}$ is

$$\frac{d}{dw}\Pr\left(\check{W}_{m+1}\le w\right)=(m+1)F(w)^m f(w). \tag{A32}$$

In turn, Equation (A32) implies that the mean of the random variable $\check{W}_{m+1}$ is

$$E[\check{W}_{m+1}]=\int_0^\infty w\,(m+1)F(w)^m f(w)\,dw. \tag{A33}$$

Combined together, Equations (A30) and (A33) imply that

$$\Pr(X_1,\dots,X_m\le Y)=\frac{E[\check{W}_{m+1}]}{(m+1)\mu}. \tag{A34}$$

Substituting Equation (A34) into Equation (4) yields

$$\alpha_m(Y\|X)=(m+1)\Pr(X_1,\dots,X_m\le Y)-1=\frac{E[\check{W}_{m+1}]}{\mu}-1=\frac{E[\check{W}_{m+1}]-\mu}{\mu}, \tag{A35}$$

and thus we obtain that

$$\alpha_m(W^{\$}\|W)=\frac{E[\max(W_1,\dots,W_{m+1})]-E[W]}{E[W]}. \tag{A36}$$

Note that $W_1\le\max(W_1,\dots,W_{m+1})\le W_1+\cdots+W_{m+1}$, and hence

$$E[W]=E[W_1]\le E[\max(W_1,\dots,W_{m+1})]\le E[W_1+\cdots+W_{m+1}]=E[W_1]+\cdots+E[W_{m+1}]=(m+1)E[W]. \tag{A37}$$

In turn, Equation (A37) implies that

$$0\le E[\max(W_1,\dots,W_{m+1})]-E[W]\le mE[W]. \tag{A38}$$

Combined together, Equations (A36) and (A38) imply that

$$0\le\alpha_m(W^{\$}\|W)\le m. \tag{A39}$$
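Equation (A36) admits a quick Monte Carlo sanity check for an assumed example added here for illustration: for W exponential with unit mean, $E[\max(W_1,W_2)]=3/2$, so $\alpha_1(W^{\$}\|W)=1/2$, the familiar Gini index of the exponential law.

```python
import random

rng = random.Random(3)
n = 100000

# alpha_1(W^$ || W) = (E[max(W1, W2)] - E[W]) / E[W] for W ~ Exp(1):
# E[max(W1, W2)] = 3/2 and E[W] = 1, so alpha_1 = 1/2 (the Gini index).
mean_max = sum(max(rng.expovariate(1.0), rng.expovariate(1.0))
               for _ in range(n)) / n
alpha_1 = (mean_max - 1.0) / 1.0
```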

Appendix A.2.2. Derivation of Equation (16)

In what follows, $\bar F(w)$ = Pr(W > w) (0<w<∞) is the survival function of the random variable W, and $f^{\$}(w)=\frac{1}{\mu}wf(w)$ (0<w<∞) is the density function of the random variable $W^{\$}$. Setting X=W and Y=$W^{\$}$ (the equalities being in law) implies that $\bar A(r)=\bar F(r)$ and b(r)=$f^{\$}(r)$, where $\mathcal{R}=(0,\infty)$. In turn, Equation (A8) implies that

$$\Pr(X_1,\dots,X_m>Y)=\int_{\mathcal{R}}\bar A(r)^m b(r)\,dr=\int_0^\infty\bar F(w)^m f^{\$}(w)\,dw=\int_0^\infty\bar F(w)^m\frac{1}{\mu}wf(w)\,dw=\frac{1}{(m+1)\mu}\int_0^\infty w\,(m+1)\bar F(w)^m f(w)\,dw. \tag{A40}$$

Set $\hat{W}_{m+1}=\min(W_1,\dots,W_{m+1})$, the minimum of m+1 IID copies of the random variable W. The survival function of the random variable $\hat{W}_{m+1}$ is

$$\Pr\left(\hat{W}_{m+1}>w\right)=\Pr(W_1,\dots,W_{m+1}>w)=\Pr(W_1>w)\cdots\Pr(W_{m+1}>w)=\bar F(w)^{m+1}. \tag{A41}$$

Differentiating Equation (A41) implies that the density function of the random variable $\hat{W}_{m+1}$ is

$$-\frac{d}{dw}\Pr\left(\hat{W}_{m+1}>w\right)=(m+1)\bar F(w)^m f(w). \tag{A42}$$

In turn, Equation (A42) implies that the mean of the random variable $\hat{W}_{m+1}$ is

$$E[\hat{W}_{m+1}]=\int_0^\infty w\,(m+1)\bar F(w)^m f(w)\,dw. \tag{A43}$$

Combined together, Equations (A40) and (A43) imply that

$$\Pr(X_1,\dots,X_m>Y)=\frac{E[\hat{W}_{m+1}]}{(m+1)\mu}. \tag{A44}$$

Substituting Equation (A44) into Equation (5) yields

$$\beta_m(Y\|X)=1-(m+1)\Pr(X_1,\dots,X_m>Y)=1-\frac{E[\hat{W}_{m+1}]}{\mu}=\frac{\mu-E[\hat{W}_{m+1}]}{\mu}, \tag{A45}$$

and thus we obtain that

$$\beta_m(W^{\$}\|W)=\frac{E[W]-E[\min(W_1,\dots,W_{m+1})]}{E[W]}. \tag{A46}$$

Note that $0\le\min(W_1,\dots,W_{m+1})\le W_1$, and hence

$$0\le E[\min(W_1,\dots,W_{m+1})]\le E[W_1]=E[W]. \tag{A47}$$

In turn, Equation (A47) implies that

$$0\le E[W]-E[\min(W_1,\dots,W_{m+1})]\le E[W]. \tag{A48}$$

Combined together, Equations (A46) and (A48) imply that

$$0\le\beta_m(W^{\$}\|W)\le1. \tag{A49}$$

Appendix A.2.3. Derivation of Equations (19) and (20)

In what follows, f(w) (0<w<∞) is the density function of the random variable W, and $f^{\$}(w)=\frac{1}{\mu}wf(w)$ (0<w<∞) is the density function of the random variable $W^{\$}$. As noted above, the ratio of the density function of $W^{\$}$ to the density function of W is $f^{\$}(w)/f(w)=w/\mu$.

Setting X=W and Y=$W^{\$}$ (the equalities being in law) implies that a(r)=f(r) and b(r)=$f^{\$}(r)$, where $\mathcal{R}=(0,\infty)$. In turn, Equation (10) implies that

$$\gamma_m(Y\|X)=\int_{\mathcal{R}}\left\{\left[\frac{b(r)}{a(r)}\right]^m-1\right\}a(r)\,dr=\int_0^\infty\left\{\left[\frac{f^{\$}(w)}{f(w)}\right]^m-1\right\}f(w)\,dw=\int_0^\infty\left[\left(\frac{w}{\mu}\right)^m-1\right]f(w)\,dw=\frac{1}{\mu^m}\int_0^\infty w^m f(w)\,dw-\int_0^\infty f(w)\,dw=\frac{E[W^m]}{E[W]^m}-1, \tag{A50}$$

and thus we obtain that

$$\gamma_m(W^{\$}\|W)=\frac{E[W^m]-E[W]^m}{E[W]^m}. \tag{A51}$$

Setting X=$W^{\$}$ and Y=W (the equalities being in law) implies that a(r)=$f^{\$}(r)$ and b(r)=f(r), where $\mathcal{R}=(0,\infty)$. In turn, Equation (10) implies that

$$\gamma_m(Y\|X)=\int_{\mathcal{R}}\left\{\left[\frac{b(r)}{a(r)}\right]^m-1\right\}a(r)\,dr=\int_0^\infty\left\{\left[\frac{f(w)}{f^{\$}(w)}\right]^m-1\right\}f^{\$}(w)\,dw=\int_0^\infty\left[\left(\frac{\mu}{w}\right)^m-1\right]\frac{1}{\mu}wf(w)\,dw=\mu^{m-1}\int_0^\infty w^{1-m}f(w)\,dw-\frac{1}{\mu}\int_0^\infty wf(w)\,dw=\frac{E[W^{1-m}]}{E[W]^{1-m}}-1, \tag{A52}$$

and thus we obtain that

$$\gamma_m(W\|W^{\$})=\frac{E[W^{1-m}]-E[W]^{1-m}}{E[W]^{1-m}}. \tag{A53}$$

Consider a smooth function φ(t) that is defined over the positive half-line (t>0), and that is convex: φ''(t)>0. Jensen's inequality [59] asserts that

$$\varphi(E[W])\le E[\varphi(W)]. \tag{A54}$$

Jensen's inequality further asserts that equality holds in Equation (A54) if and only if the random variable W is constant with probability one: $\varphi(E[W])=E[\varphi(W)]$ if and only if $\Pr(W=c)=1$, where c is a positive constant. For m>1 the function $\varphi(t)=t^m$ is convex, and hence Equation (A54) implies that $E[W]^m\le E[W^m]$; thus, Equation (A51) implies that $\gamma_m(W^{\$}\|W)\ge0$. For m>1 the function $\varphi(t)=t^{1-m}$ is convex, and hence Equation (A54) implies that $E[W]^{1-m}\le E[W^{1-m}]$; thus, Equation (A53) implies that $\gamma_m(W\|W^{\$})\ge0$. Moreover, the quantities $\gamma_m(W^{\$}\|W)$ and $\gamma_m(W\|W^{\$})$ are zero if and only if the random variable W is constant with probability one.

Appendix A.3. Derivations: Section 4

Appendix A.3.1. Derivation of Equation (24)

In what follows, F(t) = Pr(T ≤ t) (0<t<∞) is the distribution function of the random variable T, and $\bar F(t)$ = Pr(T > t) = 1 − F(t) (0<t<∞) is the corresponding survival function. In addition, $f_{res}(t)=\frac{1}{\mu}\bar F(t)$ (0<t<∞) is the density function of the random variable Tres. Setting X=T and Y=Tres (the equalities being in law) implies that A(r)=F(r) and b(r)=$f_{res}(r)$, where $\mathcal{R}=(0,\infty)$. In turn, Equation (A5) implies that

$$\Pr(X_1,\dots,X_m\le Y)=\int_{\mathcal{R}}A(r)^m b(r)\,dr=\int_0^\infty F(t)^m f_{res}(t)\,dt=\int_0^\infty F(t)^m\frac{1}{\mu}\bar F(t)\,dt=\frac{1}{\mu}\int_0^\infty F(t)^m\left[1-F(t)\right]dt=\frac{1}{\mu}\int_0^\infty\left[F(t)^m-F(t)^{m+1}\right]dt=\frac{1}{\mu}\int_0^\infty\left\{\left[1-F(t)^{m+1}\right]-\left[1-F(t)^m\right]\right\}dt=\frac{1}{\mu}\left\{\int_0^\infty\left[1-F(t)^{m+1}\right]dt-\int_0^\infty\left[1-F(t)^m\right]dt\right\}. \tag{A55}$$

Set $\check{T}_k=\max(T_1,\dots,T_k)$, the maximum of k IID copies of the random variable T. The distribution function of the random variable $\check{T}_k$ is

$$\Pr\left(\check{T}_k\le t\right)=\Pr(T_1,\dots,T_k\le t)=\Pr(T_1\le t)\cdots\Pr(T_k\le t)=F(t)^k. \tag{A56}$$

In turn, using Equation (A56), the mean of the random variable $\check{T}_k$ is

$$E[\check{T}_k]=\int_0^\infty\Pr\left(\check{T}_k>t\right)dt=\int_0^\infty\left[1-\Pr\left(\check{T}_k\le t\right)\right]dt=\int_0^\infty\left[1-F(t)^k\right]dt. \tag{A57}$$

Combined together, Equations (A55) and (A57) imply that

$$\Pr(X_1,\dots,X_m\le Y)=\frac{1}{\mu}\left\{E[\check{T}_{m+1}]-E[\check{T}_m]\right\}, \tag{A58}$$

where $T_1,\dots,T_{m+1}$ are m+1 IID copies of the random variable T; and $\check{T}_m=\max(T_1,\dots,T_m)$, and $\check{T}_{m+1}=\max(T_1,\dots,T_{m+1})$.

Substituting Equation (A58) into Equation (4) yields

$$\alpha_m(Y\|X)=(m+1)\Pr(X_1,\dots,X_m\le Y)-1=(m+1)\frac{1}{\mu}\left\{E[\check{T}_{m+1}]-E[\check{T}_m]\right\}-1, \tag{A59}$$

and thus we obtain that

$$\alpha_m(T_{res}\|T)=(m+1)\frac{E[\max(T_1,\dots,T_{m+1})]-E[\max(T_1,\dots,T_m)]}{E[T]}-1. \tag{A60}$$

Appendix A.3.2. Derivation of Equation (25)

In what follows, $\bar F(t)$ = Pr(T > t) (0<t<∞) is the survival function of the random variable T, and $f_{res}(t)=\frac{1}{\mu}\bar F(t)$ (0<t<∞) is the density function of the random variable Tres. Setting X=T and Y=Tres (the equalities being in law) implies that $\bar A(r)=\bar F(r)$ and b(r)=$f_{res}(r)$, where $\mathcal{R}=(0,\infty)$. In turn, Equation (A8) implies that

$$\Pr(X_1,\dots,X_m>Y)=\int_{\mathcal{R}}\bar A(r)^m b(r)\,dr=\int_0^\infty\bar F(t)^m f_{res}(t)\,dt=\int_0^\infty\bar F(t)^m\frac{1}{\mu}\bar F(t)\,dt=\frac{1}{\mu}\int_0^\infty\bar F(t)^{m+1}\,dt. \tag{A61}$$

Set $\hat{T}_{m+1}=\min(T_1,\dots,T_{m+1})$, the minimum of m+1 IID copies of the random variable T. The survival function of the random variable $\hat{T}_{m+1}$ is

$$\Pr\left(\hat{T}_{m+1}>t\right)=\Pr(T_1,\dots,T_{m+1}>t)=\Pr(T_1>t)\cdots\Pr(T_{m+1}>t)=\bar F(t)^{m+1}. \tag{A62}$$

In turn, using Equation (A62), the mean of the random variable $\hat{T}_{m+1}$ is

$$E[\hat{T}_{m+1}]=\int_0^\infty\Pr\left(\hat{T}_{m+1}>t\right)dt=\int_0^\infty\bar F(t)^{m+1}\,dt. \tag{A63}$$

Substituting Equation (A63) into Equation (A61) implies that

$$\Pr(X_1,\dots,X_m>Y)=\frac{1}{\mu}E[\hat{T}_{m+1}]. \tag{A64}$$

In turn, substituting Equation (A64) into Equation (5) yields

$$\beta_m(Y\|X)=1-(m+1)\Pr(X_1,\dots,X_m>Y)=1-(m+1)\frac{1}{\mu}E[\hat{T}_{m+1}], \tag{A65}$$

and thus we obtain that

$$\beta_m(T_{res}\|T)=1-(m+1)\frac{E[\min(T_1,\dots,T_{m+1})]}{E[T]}. \tag{A66}$$
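Equation (A66) makes the 'Poisson benchmark' of Section 4 concrete: for T exponential with rate λ, the minimum $\min(T_1,\dots,T_{m+1})$ is exponential with rate $(m+1)\lambda$, so $E[\min]=E[T]/(m+1)$ and $\beta_m(T_{res}\|T)=0$. A small Monte Carlo sketch (an assumed example added here for illustration) confirms this.

```python
import random

rng = random.Random(11)
n, m = 100000, 2

# For T ~ Exp(1): E[min(T1, ..., T_{m+1})] = 1 / (m + 1) and E[T] = 1,
# hence beta_m(Tres || T) = 1 - (m + 1) * E[min] / E[T] = 0.
mean_t = sum(rng.expovariate(1.0) for _ in range(n)) / n
mean_min = sum(min(rng.expovariate(1.0) for _ in range(m + 1))
               for _ in range(n)) / n
beta_m = 1.0 - (m + 1) * mean_min / mean_t
```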

Appendix A.3.3. The Quantities γ_m(Tres||T) and γ_m(T||Tres)

In what follows, f(t) (0<t<∞) is the density function of the random variable T, and $f_{res}(t)=\frac{1}{\mu}\bar F(t)$ (0<t<∞) is the density function of the random variable Tres. As noted above, the ratio of the density function of Tres to the density function of T is $f_{res}(t)/f(t)=1/[\mu h(t)]$, where h(t) is the hazard rate of the random variable T.

Setting X=T and Y=Tres (the equalities being in law) implies that a(r)=f(r) and b(r)=$f_{res}(r)$, where $\mathcal{R}=(0,\infty)$. In turn, Equation (10) implies that

$$\gamma_m(T_{res}\|T)=\int_{\mathcal{R}}\left\{\left[\frac{b(r)}{a(r)}\right]^m-1\right\}a(r)\,dr=\int_0^\infty\left\{\left[\frac{f_{res}(t)}{f(t)}\right]^m-1\right\}f(t)\,dt=\int_0^\infty\left\{\left[\frac{1}{\mu h(t)}\right]^m-1\right\}f(t)\,dt=\frac{1}{\mu^m}\int_0^\infty h(t)^{-m}f(t)\,dt-\int_0^\infty f(t)\,dt=\frac{E\left[h(T)^{-m}\right]}{E[T]^m}-1. \tag{A67}$$

Setting X=Tres and Y=T (the equalities being in law) implies that a(r)=$f_{res}(r)$ and b(r)=f(r), where $\mathcal{R}=(0,\infty)$. In turn, Equation (10) implies that

$$\gamma_m(T\|T_{res})=\int_{\mathcal{R}}\left\{\left[\frac{b(r)}{a(r)}\right]^m-1\right\}a(r)\,dr=\int_0^\infty\left\{\left[\frac{f(t)}{f_{res}(t)}\right]^m-1\right\}f_{res}(t)\,dt=\int_0^\infty\left\{\left[\mu h(t)\right]^m-1\right\}f_{res}(t)\,dt=\mu^m\int_0^\infty h(t)^m f_{res}(t)\,dt-\int_0^\infty f_{res}(t)\,dt=E[T]^m\,E\left[h(T_{res})^m\right]-1. \tag{A68}$$

Appendix A.3.4. Estimation of the Curve C(u) for the Renewal Application

In the opening of Section 4, we described how the random variable Tres (the "residual lifetime" of the random variable T) is constructed via a renewal process that is generated from the random variable T. We now describe an alternative construction of the random variable Tres; this construction is also based on a renewal process that is generated from the random variable T. In what follows, $T_1,T_2,\dots,T_n$ are IID copies of the random variable T.

The first n renewal epochs of the renewal process are the temporal points $\tau_i=T_1+\cdots+T_i$ ($i=1,\dots,n$). Now, sample a time point uniformly at random over the interval $(0,\tau_n)$, and set $D_n$ to be the distance from the sampled time point to the first renewal epoch after it. In other words, $D_n$ is the distance from the sampled time point to the nearest renewal epoch to its right (on the temporal axis). In the limit $n\to\infty$, the distance $D_n$ converges in law to the random variable Tres.

Given n samples of the random variable T, order these samples increasingly: $t_1\le t_2\le\cdots\le t_{n-1}\le t_n$. Based on these samples, set $A^{-1}(u)$ to be the inverse of the empirical distribution function of the random variable T. This inverse function passes through the following points:

$$A^{-1}\left(\frac{i}{n}\right)=t_i \tag{A69}$$

($i=1,\dots,n$).

Based on the above samples, set B(t) to be the empirical distribution function of the random variable $D_n$. This distribution function is given by

$$B(t)=\frac{\min(t,t_1)+\min(t,t_2)+\cdots+\min(t,t_n)}{t_1+t_2+\cdots+t_n}. \tag{A70}$$

In particular, note that

$$B(t_i)=\frac{t_1+\cdots+t_{i-1}+(n-i+1)t_i}{t_1+t_2+\cdots+t_n}=\frac{t_1+\cdots+t_i+(n-i)t_i}{t_1+t_2+\cdots+t_n}=\pi_1+\cdots+\pi_i+(n-i)\pi_i, \tag{A71}$$

where $\pi_i=t_i/(t_1+\cdots+t_n)$ ($i=1,2,\dots,n$).

Equations (1) and (A69) imply that

$$C\left(\frac{i}{n}\right)=B\left(A^{-1}\left(\frac{i}{n}\right)\right)=B(t_i), \tag{A72}$$

and hence Equation (A71) yields

$$C\left(\frac{i}{n}\right)=\pi_1+\cdots+\pi_i+(n-i)\pi_i. \tag{A73}$$

In turn, from Equation (A73) it follows that

$$\frac{C\left(\frac{i}{n}\right)-C\left(\frac{i-1}{n}\right)}{1/n}=n(n-i+1)(\pi_i-\pi_{i-1}), \tag{A74}$$

where $\pi_0=0$.
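The chain (A70)–(A74) is purely algebraic, so it can be verified exactly on arbitrary samples. The sketch below (added here for illustration) checks that B(t_i) from Equation (A70) matches Equation (A71), and that the increments of Equation (A73) match Equation (A74), up to floating-point error.

```python
t = sorted([0.7, 2.1, 0.3, 1.4, 0.9])   # arbitrary samples, ordered increasingly
n = len(t)
total = sum(t)
pi = [0.0] + [ti / total for ti in t]    # pi_0 = 0, pi_i = t_i / sum

def B(x):
    """Empirical distribution function of D_n, Equation (A70)."""
    return sum(min(x, ti) for ti in t) / total

# Equation (A71): B(t_i) = pi_1 + ... + pi_i + (n - i) * pi_i.
for i in range(1, n + 1):
    rhs = sum(pi[1:i + 1]) + (n - i) * pi[i]
    assert abs(B(t[i - 1]) - rhs) < 1e-12

# Equation (A73): C(i/n) = pi_1 + ... + pi_i + (n - i) * pi_i; and
# Equation (A74): the slope on ((i-1)/n, i/n) is n * (n-i+1) * (pi_i - pi_{i-1}).
C = [0.0] + [sum(pi[1:i + 1]) + (n - i) * pi[i] for i in range(1, n + 1)]
slopes = [n * (C[i] - C[i - 1]) for i in range(1, n + 1)]
```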

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Hao L., Naiman D.Q. Assessing Inequality. Sage Publications; London, UK: 2010. [Google Scholar]
  • 2.Cowell F. Measuring Inequality. Oxford University Press; Oxford, UK: 2011. [Google Scholar]
  • 3.Coulter P.B. Measuring Inequality: A Methodological Handbook. Routledge; New York, NY, USA: 2019. [Google Scholar]
  • 4.Lorenz M.O. Methods of measuring the concentration of wealth. Publ. Am. Stat. Assoc. 1905;9:209–219. doi: 10.1080/15225437.1905.10503443. [DOI] [Google Scholar]
  • 5.Gastwirth J.L. A general definition of the Lorenz curve. Econom. J. Econom. Soc. 1971;39:1037–1039. doi: 10.2307/1909675. [DOI] [Google Scholar]
  • 6.Chotikapanich D. (Ed.) Modeling Income Distributions and Lorenz Curves. Springer Science & Business Media; New York, NY, USA: 2008. [Google Scholar]
  • 7.Arnold B.C., Sarabia J.M. Majorization and the Lorenz Order with Applications in Applied Mathematics and Economics. Springer International Publishing; Berlin, Germany: 2018. [Google Scholar]
  • 8.Eliazar I. Harnessing inequality. Phys. Rep. 2016;649:1–29. doi: 10.1016/j.physrep.2016.07.005. [DOI] [Google Scholar]
  • 9.Eliazar I. A tour of inequality. Ann. Phys. 2018;389:306–332. doi: 10.1016/j.aop.2017.12.010. [DOI] [Google Scholar]
  • 10.Liese F., Vajda I. Convex Statistical Distances. Teubner; Leipzig, Germany: 1987. [Google Scholar]
  • 11.Gibbs A.L., Su F.E. On choosing and bounding probability metrics. Int. Stat. Rev. 2002;70:419–435. doi: 10.1111/j.1751-5823.2002.tb00178.x. [DOI] [Google Scholar]
  • 12.Liese F., Vajda I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory. 2006;52:4394–4412. doi: 10.1109/TIT.2006.881731. [DOI] [Google Scholar]
  • 13.Kullback S., Leibler R.A. On information and sufficiency. Ann. Math. Stat. 1951;22:79–86. doi: 10.1214/aoms/1177729694. [DOI] [Google Scholar]
  • 14.Kullback S. Information Theory and Statistics. Courier Corporation; Chelmsford, MA, USA: 1997. [Google Scholar]
  • 15.Cover T.M. Elements of Information Theory. John Wiley & Sons; New York, NY, USA: 1999. [Google Scholar]
  • 16.Perez-Cruz F. Kullback-Leibler divergence estimation of continuous distributions; Proceedings of the 2008 IEEE International Symposium on Information Theory; Toronto, ON, Canada. 6–11 July 2008; pp. 1666–1670. [Google Scholar]
  • 17.Renyi A. Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Statistical Laboratory University of California, 20 June–30 July 1960. Volume 1. University of California Press; Berkeley, CA, USA: 1961. On measures of information and entropy; pp. 547–561. [Google Scholar]
  • 18.Aczel J., Daroczy Z. On Measures of Information and Their Characterizations. Academic Press; New York, NY, USA: 1975. [Google Scholar]
  • 19.Van Erven T., Harremoes P. Renyi divergence and majorization; Proceedings of the 2010 IEEE International Symposium on Information Theory; Austin, TX, USA. 13–18 June 2010; pp. 1335–1339. [Google Scholar]
  • 20.Van Erven T., Harremoes P. Renyi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory. 2014;60:3797–3820. doi: 10.1109/TIT.2014.2320500. [DOI] [Google Scholar]
  • 21.Morimoto T. Markov processes and the H-theorem. J. Phys. Soc. Jpn. 1963;18:328–331. doi: 10.1143/JPSJ.18.328. [DOI] [Google Scholar]
  • 22.Ali S.M., Silvey S.D. A general class of coefficients of divergence of one distribution from another. J. R. Stat. Soc. Ser. (Methodol.) 1966;28:131–142. doi: 10.1111/j.2517-6161.1966.tb00626.x. [DOI] [Google Scholar]
  • 23.Csiszar I. On information-type measure of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967;2:299–318. [Google Scholar]
  • 24.Vos P.W. Geometry of f-divergence. Ann. Inst. Stat. Math. 1991;43:515–537. doi: 10.1007/BF00053370. [DOI] [Google Scholar]
  • 25.Sason I., Verdu S. f-divergence Inequalities. IEEE Trans. Inf. Theory. 2016;62:5973–6006. doi: 10.1109/TIT.2016.2603151. [DOI] [Google Scholar]
  • 26.Sason I. On f-divergences: Integral representations, local behavior, and inequalities. Entropy. 2018;20:383. doi: 10.3390/e20050383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Sason I. On the Renyi divergence, joint range of relative entropies, and a channel coding theorem. IEEE Trans. Inf. Theory. 2015;62:23–34. doi: 10.1109/TIT.2015.2504100. [DOI] [Google Scholar]
  • 28.Prest T. Advances in Cryptology–ASIACRYPT 2017, Proceedings of the 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017. Springer International Publishing; Berlin/Heidelberg, Germany: 2017. Sharper bounds in lattice-based cryptography using the Renyi divergence; pp. 347–374. Proceedings, Part I 23. [Google Scholar]
  • 29.Sason I., Verdu S. Improved bounds on lossless source coding and guessing moments via Renyi measures. IEEE Trans. Inf. Theory. 2018;64:4323–4346. doi: 10.1109/TIT.2018.2803162. [DOI] [Google Scholar]
  • 30.Nishiyama T., Sason I. On relations between the relative entropy and chi-squared divergence, generalizations and applications. Entropy. 2020;22:563. doi: 10.3390/e22050563. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Megias A., Santos A. Kullback-Leibler divergence of a freely cooling granular gas. Entropy. 2020;22:1308. doi: 10.3390/e22111308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Ganesh A., Talwar K. Faster differentially private samplers via Renyi divergence analysis of discretized Langevin MCMC. Adv. Neural Inf. Process. Syst. 2020;33:7222–7233. [Google Scholar]
  • 33.Claici S., Yurochkin M., Ghosh S., Solomon J. Model fusion with Kullback-Leibler divergence; Proceedings of the International Conference on Machine Learning; Virtual Event. 13–18 July 2020; pp. 2038–2047. [Google Scholar]
  • 34.Bleuler C., Lapidoth A., Pfister C. Conditional Renyi divergences and horse betting. Entropy. 2020;22:316. doi: 10.3390/e22030316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Grivel E., Diversi R., Merchan F. Kullback-Leibler and Renyi divergence rate for Gaussian stationary ARMA processes comparison. Digit. Signal Process. 2021;116:103089. doi: 10.1016/j.dsp.2021.103089. [DOI] [Google Scholar]
  • 36.Birrell J., Dupuis P., Katsoulakis M.A., Rey-Bellet L., Wang J. Variational representations and neural network estimation of Renyi divergences. SIAM J. Math. Data Sci. 2021;3:1093–1116. doi: 10.1137/20M1368926. [DOI] [Google Scholar]
  • 37.Hien L.T.K., Gillis N. Algorithms for nonnegative matrix factorization with the Kullback–Leibler divergence. J. Sci. Comput. 2021;87:93. doi: 10.1007/s10915-021-01504-0. [DOI] [Google Scholar]
  • 38.Mosonyi M., Hiai F. Test-measured Renyi divergences. IEEE Trans. Inf. Theory. 2022;69:1074–1092. doi: 10.1109/TIT.2022.3209892. [DOI] [Google Scholar]
  • 39.Nielsen F. The Kullback-Leibler divergence between lattice Gaussian distributions. J. Indian Inst. Sci. 2022;102:1177–1188. doi: 10.1007/s41745-021-00279-5. [DOI] [Google Scholar]
  • 40.Zhu C., Xiao F., Cao Z. A generalized Renyi divergence for multi-source information fusion with its application in EEG data analysis. Inf. Sci. 2022;605:225–243. doi: 10.1016/j.ins.2022.05.012. [DOI] [Google Scholar]
  • 41.Bouhlel N., Rousseau D. Exact Renyi and Kullback-Leibler Divergences between Multivariate t-Distributions. IEEE Signal Process. Lett. 2023;30:1672–1676. doi: 10.1109/LSP.2023.3324594. [DOI] [Google Scholar]
  • 42.Huang Y., Xiao F., Cao Z., Lin C.-T. Higher order fractal belief Renyi divergence with its applications in pattern classification. IEEE Trans. Pattern Anal. Mach. 2023;45:14709–14726. doi: 10.1109/TPAMI.2023.3310594. [DOI] [PubMed] [Google Scholar]
  • 43.Gini C. Sulla misura della concentrazione e della variabilita dei caratteri. Atti Del R. Ist. Veneto Di Sci. Lett. Ed Arti. 1914;73:1203–1248. [Google Scholar]
  • 44.Gini C. Measurement of inequality of incomes. Econ. J. 1921;31:124–126. doi: 10.2307/2223319. [DOI] [Google Scholar]
  • 45.Yitzhaki S., Schechtman E. The Gini Methodology: A Primer on a Statistical Methodology. Springer Science & Business Media; New York, NY, USA: 2012. [Google Scholar]
  • 46.Giorgi G.M., Gigliarano C. The Gini concentration index: A review of the inference literature. J. Econ. Surv. 2017;31:1130–1148. doi: 10.1111/joes.12185. [DOI] [Google Scholar]
  • 47.Giorgi G.M. Gini Coefficient. In: Atkinson P., Delamont S., Cernat A., Sakshaug J.W., Williams R.A., editors. SAGE Research Methods Foundations. SAGE Publications; London, UK: 2020. [DOI] [Google Scholar]
  • 48.Eliazar I. Beautiful Gini. Metron. 2024. doi: 10.1007/s40300-024-00271-w. [DOI] [Google Scholar]
  • 49.Smith W.L. Renewal theory and its ramifications. J. R. Stat. Soc. Ser. (Methodol.) 1958;20:243–284. doi: 10.1111/j.2517-6161.1958.tb00294.x. [DOI] [Google Scholar]
  • 50.Cox D.R. Renewal Theory. Methuen; London, UK: 1962. [Google Scholar]
  • 51.Ross S.M. Applied Probability Models with Optimization Applications. Dover Publications; Mineola, NY, USA: 2013. [Google Scholar]
  • 52.Kingman J.F.C. Poisson Processes. Oxford University Press; Oxford, UK: 1993. [Google Scholar]
  • 53.Streit R.L. Poisson Point Processes. Springer; New York, NY, USA: 2010. [Google Scholar]
  • 54.Last G., Penrose M. Lectures on the Poisson Process. Cambridge University Press; New York, NY, USA: 2017. [Google Scholar]
  • 55.Williams G., Watts D.C. Non-symmetrical dielectric relaxation behaviour arising from a simple empirical decay function. Trans. Faraday Soc. 1970;66:80–85. doi: 10.1039/tf9706600080. [DOI] [Google Scholar]
  • 56.Phillips J.C. Stretched exponential relaxation in molecular and electronic glasses. Rep. Prog. Phys. 1996;59:1133. doi: 10.1088/0034-4885/59/9/003. [DOI] [Google Scholar]
  • 57.Kalmykov Y.P., Coffey W.T., Rice S.A., editors. Fractals, Diffusion, and Relaxation in Disordered Complex Systems. John Wiley & Sons; Hoboken, NJ, USA: 2006. [Google Scholar]
  • 58.Bouchaud J.-P. Anomalous Transport: Foundations and Applications. Wiley; Weinheim, Germany: 2008. Anomalous relaxation in complex systems: From stretched to compressed exponentials; pp. 327–345. [Google Scholar]
  • 59.Feller W. An Introduction to Probability Theory and Its Applications. Volume II John Wiley & Sons; New York, NY, USA: 1971. [Google Scholar]
  • 60.Eliazar I. Inequality spectra. Phys. Stat. Its Appl. 2017;469:824–884. doi: 10.1016/j.physa.2016.11.079. [DOI] [Google Scholar]
  • 61.Eliazar I. Investigating equality: The Renyi spectrum. Phys. Stat. Mech. Its Appl. 2017;481:90–118. doi: 10.1016/j.physa.2017.04.003. [DOI] [Google Scholar]
  • 62.Kalbfleisch J.D., Prentice R.L. The Statistical Analysis of Failure Time Data. John Wiley & Sons; New York, NY, USA: 2011. [Google Scholar]
  • 63.Kleinbaum D.G., Klein M. Survival Analysis. Springer; New York, NY, USA: 2011. [Google Scholar]
  • 64.Collett D. Modelling Survival Data in Medical Research. CRC Press; Boca Raton, FL, USA: 2015. [Google Scholar]
  • 65.Barlow R.E., Proschan F. Mathematical Theory of Reliability. Society for Industrial and Applied Mathematics; Philadelphia, PA, USA: 1996. [Google Scholar]
  • 66.Finkelstein M. Failure Rate Modelling for Reliability and Risk. Springer Science & Business Media; London, UK: 2008. [Google Scholar]
  • 67.Dhillon B.S. Engineering Systems Reliability, Safety, and Maintenance: An Integrated Approach. CRC Press; Boca Raton, FL, USA: 2017. [Google Scholar]
  • 68.Eliazar I. Lindy’s law. Phys. Stat. Mech. Its Appl. 2017;486:797–805. doi: 10.1016/j.physa.2017.05.077. [DOI] [Google Scholar]
  • 69.Murthy D.N.P., Xie M., Jiang R. Weibull Models. John Wiley & Sons; New York, NY, USA: 2004. [Google Scholar]
  • 70.Rinne H. The Weibull Distribution: A Handbook. CRC Press; Boca Raton, FL, USA: 2008. [Google Scholar]
  • 71.McCool J.I. Using the Weibull Distribution: Reliability, Modeling, and Inference. John Wiley & Sons; New York, NY, USA: 2012. [Google Scholar]



Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
