Abstract
The median of a standard gamma distribution, as a function of its shape parameter k, has no known representation in terms of elementary functions. In this work we prove the tightest upper and lower bounds of the form $2^{-1/k}(A + k)$: an upper bound with $A = e^{-\gamma}$ (with γ being the Euler–Mascheroni constant) and a lower bound with $A = \log(2) - \frac{1}{3}$. These bounds are valid over the entire domain of k > 0, staying between the 48th and 55th percentiles. We derive and prove several other new tight bounds in support of the proofs.
Introduction
We prove some of the bounds conjectured by Lyon [1] for the median of a gamma distribution, relying on previously known and new bounds that are tighter in some regions of the domain, and also relying on numerically transparent evaluations of the CDF at a few points, using a convergent series.
The gamma distribution’s probability density function (PDF) is $x^{k-1} e^{-x/\theta} / (\Gamma(k)\,\theta^{k})$, but we’ll use θ = 1 because both the mean and median simply scale with this parameter. Thus we use this “standard gamma distribution” PDF with just the shape parameter k, with k > 0 and x > 0:
$$p_k(x) = \frac{x^{k-1} e^{-x}}{\Gamma(k)}$$
The mean μ of this distribution is well known to be μ(k) = k, which is easy to verify since the first moment of the PDF evaluates to Γ(k + 1)/Γ(k) = k. The median ν(k) is the value of x at which the cumulative distribution function (CDF) equals one-half:
$$P_k(\nu(k)) = \frac{1}{\Gamma(k)} \int_0^{\nu(k)} x^{k-1} e^{-x}\,dx = \frac{1}{2}$$
We seek to prove closed-form tight bounds for ν(k) that achieve 50th percentile in the limit for k → 0 and for k → ∞. By “tight” we mean a bound is equal to the true value at a point, or in the limit at 0 or ∞. Informally, there are degrees of tightness, based on how many derivatives are also matched; so one tight bound might be tighter than another.
See Fig 1 for some known linear and piecewise-linear bounds, illustrating the lack of good known bounds for 0 < k < 1. Many prior publications have specifically only considered k ≥ 1 [2, 3], or only positive integer values [4], or positive half-integers for the Chi-square distribution [5]; the full range k > 0 has also been considered [6, 7], but the bounds found leave room for improvement, particularly at low k.
Fig 1. Linear and piecewise-linear bounds.
The upper (blue) and lower (red) bounds and ν(k) > 0 (solid lines) are shown along with the true value (solid curve), the piecewise-linear bound that combines the recent linear bound for k ≥ 1 [3] with a chord segment (dashed lines), and the linear lower bound that is tangent at k = 1 (dash-dot line). The region k < 1 is not very usefully bounded.
Theorems to prove
We will show that, for all 0 < k < ∞, the median of the gamma distribution is bounded above and below by:
$$2^{-1/k}(A_L + k) < \nu(k) < 2^{-1/k}(A_U + k)$$
with closed-form scalar constants $A_L = \log(2) - \frac{1}{3}$ and $A_U = e^{-\gamma}$ (with γ ≈ 0.5772157 being the Euler–Mascheroni constant), and that these bounds are asymptotically tight for k → ∞ and k → 0, respectively. Equivalently, we define the function A(k) and tightly bound it with these constants:
$$A(k) = 2^{1/k}\,\nu(k) - k, \qquad A_L < A(k) < A_U$$
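The statement can be spot-checked numerically. The following is a minimal sketch, not the paper's verification procedure: it assumes the alternating-series form of the CDF integral and a plain bisection on (0, k), and the names `gamma_cdf`, `gamma_median`, and `A` are illustrative choices. It is intended only for moderate k (roughly 0.05 to 10), where the series is numerically well behaved in double precision.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant (not in the math module)
A_L = math.log(2) - 1.0 / 3.0     # bottom constant, about 0.3598
A_U = math.exp(-EULER_GAMMA)      # top constant, about 0.5615

def gamma_cdf(k, x, terms=200):
    """Standard gamma CDF from the series sum_n (-1)^n x^(k+n) / (n! (k+n)) / Gamma(k)."""
    term = x ** k / k                  # n = 0 term
    total = term
    for n in range(1, terms):
        # ratio of consecutive terms: -x/n * (k+n-1)/(k+n)
        term *= -x * (k + n - 1) / (n * (k + n))
        total += term
    return total / math.gamma(k)

def gamma_median(k, iterations=200):
    """Bisection for CDF = 1/2 on (0, k), using the known bound nu(k) < k."""
    lo, hi = 0.0, k
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(k, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def A(k):
    """A(k) = 2^(1/k) nu(k) - k, the quantity to be kept between A_L and A_U."""
    return 2.0 ** (1.0 / k) * gamma_median(k) - k

for k in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"k={k:5.1f}  median={gamma_median(k):.6f}  A(k)={A(k):.6f}  in box: {A_L < A(k) < A_U}")
```

At k = 1 the computed median should agree with log(2), which gives a quick sanity check on the series and the bisection.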
In addition, we’d like to prove that A(k) is monotonically decreasing between its low-k and high-k limits, as suggested by asymptotic and graphical numerical observations. If we could prove monotonicity, the other proof would be easier, but we don’t see how yet. So, the strategy here is to show that over various subsets of the k domain, with their union covering 0 < k < ∞, there are other bounds that we can prove are between our new closed-form bounds and the true median. Therefore, along the way, we derive several other new upper and lower bounds that are tighter over portions of the k domain, and some asymptotic values and slopes.
We prove the new theorems exhibited in Table 1, culminating in Theorems U6 and L8.
Table 1. Bounds table.
Summary comparison of several upper and lower bounds. Gr&M refers to Groeneveld and Meeden 1977 [11], C&R refers to Chen and Rubin 1986 [4], B&P refers to Berg and Pedersen 2006 [6], and Ga&M refers to Gaunt and Merkle 2021 [3]. The * refers to bounds presented with informal proofs or derivations, and ** for bounds presented as conjectures without proof, in Lyon 2021 [1]; in the present paper they are treated as new theorems, with proofs.
| Upper bounds, with names and formulae | Domain | Tight at | Notes |
| $\nu(k) < U_0(k) = k$ | k > 0 | k → 0 | Gr&M, C&R |
| $\nu(k) < U_1(k) = k e^{-1/(3k)}$ | k > 0 | k → ∞ | B&P |
| $\nu(k) \le U_2(k) = \log(2) + (k - 1)$ | k ≥ 1 | k = 1 | Ga&M |
| $\nu(k) \le U_3(k) = \log(2)\,k$ | k ≤ 1 | k = 1 | Theorem U3* |
| $\nu(k) < U_4(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{k^2 e^{-1/(3k)}}{k+1}\right)^{-1/k}$ | 0 < k ≤ 1 | k → 0 | Theorem U4 |
| $\nu(k) < U_5(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{k^2 \log(2)}{k+1}\right)^{-1/k}$ | 0 < k ≤ 1 | k → 0 | Theorem U5 |
| $\nu(k) < U_6(k) = 2^{-1/k}(e^{-\gamma} + k)$ | k > 0 | k → 0 | Theorem U6** |
| Lower bounds, with names and formulae | Domain | Tight at | Notes |
| $\nu(k) > L_0(k) = 0$ | k > 0 | k → 0 | trivial lower bound |
| $\nu(k) > L_1(k) = k - \frac{1}{3}$ | k > 0 | k → ∞ | Doodson 1917 [12], C&R |
| $\nu(k) > L_2(k) = 2^{-1/k}\,k$ | k > 0 | k → 0 | B&P |
| $\nu(k) \ge L_3(k) = \log(2) + (k - 1)\,\nu'(1)$ | k > 0 | k = 1 | Theorem L3* |
| $\nu(k) > L_4(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}$ | k > 0 | k → 0 | Theorem L4* |
| $\nu(k) > L_5(k) = 2^{-1/k}\,e^{-\gamma}$ | k > 0 | k → 0 | B&P asymptote, Theorem L5* |
| $\nu(k) > L_6(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{(1 - e^{-k})\,L_3(k)}{k+1}\right)^{-1/k}$ | k > 0 | k → 0 | Theorem L6 |
| $\nu(k) > L_7(k) = \nu_{Li} + k - k_i$ | 0 < k ≤ $k_i$ | none | Theorem L7 (for $\nu_{Li} < \nu(k_i)$) |
| $\nu(k) > L_8(k) = 2^{-1/k}(\log(2) - \frac{1}{3} + k)$ | k > 0 | k → ∞ | Theorem L8** |
Recent related work
In addition to the works mentioned above, there have been several more recent works on asymptotic properties and bounds for medians and other quantiles of gamma distributions and of the closely related Poisson and negative binomial (or Polya or Pascal) distributions, with a variety of interesting approaches [8–10].
Theorems and proofs
Chords and tangents
The convexity of the median (i.e., nonnegative second derivative) proved by Berg and Pedersen [7] implies that any tangent line is a lower bound, tight at the point of tangency, and that any chord, the straight line segment defined by two points of intersection, is an upper bound over the k interval delimited by the points of intersection.
The point k = 1 with ν(1) = log(2) is a point for which we have a known value, so is a good place to make a tangent lower bound. We can also use it for a chord with the other point of intersection at k → 0 or k → ∞. The Gaunt and Merkle upper bound [3] U2(k) = log(2) + (k − 1) can be viewed as the limit of chords with a point of intersection at k = 1 and the other at k → ∞ where the slope approaches 1. At the zero end is our U3(k), a rather trivial but apparently new observation.
Theorem U3
The median of the standard gamma distribution is bounded above by the chord between k = 0 and k = 1:
$$\nu(k) \le U_3(k) = k\,\log(2), \qquad 0 < k \le 1$$
Proof: The convexity of the median implies that a chord is an upper bound, between its points of intersection, tight at those points. The point k = 1, ν(1) = log(2) (from the median of the exponential distribution, a well-known result and an easy computation), and the limiting point at k = 0, ν(0) = 0, are the only places we have definite known expressions for the value of the median, so we can provide a formula for that chord. The straight line between (0, 0) and (1, log(2)) is the formula given.
Theorem L3
The median of the standard gamma distribution is bounded below by the tangent line at k = 1:
$$\nu(k) \ge L_3(k) = \log(2) + (k - 1)\,\nu'(1), \qquad k > 0$$
Proof: The convexity of the median [7] implies that a tangent line is a lower bound, tight at the point of tangency. At k = 1 we know ν(1) = log(2) and can compute the slope to form the equation for the tangent line, log(2) + (k − 1)ν′(1).
The slope ν′(k) is not generally tractable, but is at the special point k = 1, where the CDF Pk(x) (the lower incomplete gamma function) and PDF pk(x) are both exponential functions.
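At the point where $P_k(x) = \frac{1}{2}$, where x = ν(k), the slope is:
$$\nu'(k) = \frac{dx}{dk} = -\frac{\partial P_k(x)/\partial k}{\partial P_k(x)/\partial x}$$
The derivative with respect to x is easy,
$$\frac{\partial P_k(x)}{\partial x} = p_k(x) = \frac{x^{k-1} e^{-x}}{\Gamma(k)},$$
except that we only have a closed-form relation between x and k at k = 1, where we know x = ν(1) = log(2) and $e^{-x} = \frac{1}{2}$, so the derivative is $\frac{1}{2}$ there. The derivative with respect to k is messier:
$$\frac{\partial P_k(x)}{\partial k} = \frac{1}{\Gamma(k)} \int_0^x \log(t)\, t^{k-1} e^{-t}\,dt - \frac{1}{\Gamma(k)^2}\frac{d\Gamma(k)}{dk} \int_0^x t^{k-1} e^{-t}\,dt$$
At k = 1, using Γ(k) = 1 and dΓ(k)/dk = −γ, this derivative evaluates to
$$\left.\frac{\partial P_k(x)}{\partial k}\right|_{k=1,\;x=\log(2)} = \mathrm{Ei}(-\log(2)) - \frac{\gamma}{2} - \frac{\log(\log(2))}{2} \approx -0.484022$$
where Ei(−log(2)) ≈ −0.3786710 is the exponential integral (integration and evaluation assisted by Wolfram Alpha and independently verified). Putting these results together we get the slope of the median at 1, and hence the slope of the tangent-line lower bound there:
$$\nu'(1) = \gamma + \log(\log(2)) - 2\,\mathrm{Ei}(-\log(2)) \approx 0.9680448$$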
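The slope at k = 1 can be re-derived numerically without a special-function library. This is a minimal sketch under stated assumptions: it uses the standard convergent series Ei(−x) = γ + log(x) + Σ (−x)^n/(n·n!) for x > 0, and it assumes that the partial derivative with respect to k at k = 1, x = log(2) reduces to Ei(−log 2) − γ/2 − log(log 2)/2, as obtained by integrating by parts. The helper name `ei_of_negative` is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def ei_of_negative(x, terms=40):
    """Ei(-x) for x > 0, from gamma + log(x) + sum_{n>=1} (-x)^n / (n * n!)."""
    total = EULER_GAMMA + math.log(x)
    power_over_factorial = 1.0
    for n in range(1, terms + 1):
        power_over_factorial *= -x / n       # now equals (-x)^n / n!
        total += power_over_factorial / n
    return total

x = math.log(2)                # nu(1), the median of the exponential distribution
ei = ei_of_negative(x)
dP_dx = math.exp(-x)           # PDF at k = 1 and x = log(2), equal to 1/2
dP_dk = ei - EULER_GAMMA / 2 - math.log(x) / 2   # assumed closed form at k = 1
slope = -dP_dk / dP_dx         # implicit-function slope of the median

print(f"Ei(-log 2) = {ei:.7f}")
print(f"nu'(1)     = {slope:.7f}")
```

The computed exponential integral should match the value −0.3786710 quoted in the text to the digits shown, and the slope comes out slightly below 1, consistent with the tangent line being a lower bound with slope less than that of the Gaunt and Merkle chord.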
Using bounds for the exponential
To find new bounds for the median, we can bound the exponential in the integrand. Since $e^{-x}$ is a decreasing function of x (derivative is $-e^{-x} < 0$), we can upper bound it for x > 0 by its starting value: $e^{-x} < 1$. Since it is convex (second derivative is $e^{-x} > 0$), we can lower bound it by a tangent line: $1 - x < e^{-x}$, and can upper bound it over the interval 0 < x < k by a chord: $e^{-x} < 1 - x(1 - e^{-k})/k$.
In the proofs below, the superscripts a, b, and c are just names, not exponents.
Theorem L4
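The median of the standard gamma distribution is bounded below by:
$$\nu(k) > L_4(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}$$
Proof: Use the constant upper bound to the exponential, $e^{-x} < 1$ for x > 0, in the CDF integrand, to notice this inequality:
$$\frac{1}{\Gamma(k)} \int_0^{\nu(k)} x^{k-1}\,dx > \frac{1}{\Gamma(k)} \int_0^{\nu(k)} x^{k-1} e^{-x}\,dx = \frac{1}{2}$$
which integrates to:
$$\frac{\nu(k)^k}{k\,\Gamma(k)} > \frac{1}{2}$$
Since the denominator and the exponent are positive, this expression can be decreased to achieve equality to one-half by substituting for ν(k) a positive function of sufficiently lower value, which we’ll call L4(k):
$$\frac{L_4(k)^k}{k\,\Gamma(k)} = \frac{1}{2}$$
Solving, and using kΓ(k) = Γ(k + 1), we find the expression given: $L_4(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}$.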
We choose to write the result factored this way, rather than a single fraction with exponent 1/k or −1/k, because the two parts emphasize the shape of the median function in two regions: $2^{-1/k}$, which is just a small fraction bigger than Berg and Pedersen’s asymptote L5(k) [6] at low k, and $\Gamma(k+1)^{1/k}$, which is just a small offset below the Chen and Rubin straight-line bound [4] at high k. Most of our other new bounds keep the $2^{-1/k}$ factor, which characterizes the “hockey stick” shape at low k.
Theorem L5
The median of the standard gamma distribution, and its lower bound L4(k), are bounded below by Berg and Pedersen’s asymptote L5(k):
$$\nu(k) > L_4(k) > L_5(k) = 2^{-1/k}\,e^{-\gamma}$$
Proof: Comparing to Theorem L4, $L_5(k) < L_4(k)$ is implied if $e^{-\gamma} < \Gamma(k+1)^{1/k}$ for all k > 0. We prove this by showing that $\Gamma(k+1)^{1/k}$ monotonically increases from $e^{-\gamma}$, its limit at 0.
That $\lim_{k \to 0} \Gamma(k+1)^{1/k} = e^{-\gamma}$ follows from the Taylor series about 0 of Γ(k + 1), which is $1 - \gamma k + O(k^2)$. That $\Gamma(k+1)^{1/k}$ increases monotonically from there, even though Γ(k + 1) is initially decreasing, is proved by showing that its derivative is everywhere positive. Differentiating, in terms of the digamma function $\psi^{(0)}$, the logarithmic derivative of the gamma function, we have:
$$\frac{d}{dk}\,\Gamma(k+1)^{1/k} = \Gamma(k+1)^{1/k}\,\frac{k\,\psi^{(0)}(k+1) - \log(\Gamma(k+1))}{k^2}$$
The only factor here that is not obviously positive for k > 0 is $k\,\psi^{(0)}(k+1) - \log(\Gamma(k+1))$, which at k = 0 is equal to 0, and which has a surprisingly simple derivative:
$$\frac{d}{dk}\left[k\,\psi^{(0)}(k+1) - \log(\Gamma(k+1))\right] = k\,\psi^{(1)}(k+1)$$
Here $\psi^{(1)}$ is the trigamma function, the derivative of the digamma function. This derivative is positive since the trigamma function, a special case of the Hurwitz zeta function, is positive for positive real arguments, because it has a series expansion with all positive terms:
$$\psi^{(1)}(z) = \sum_{n=0}^{\infty} \frac{1}{(z+n)^2}$$
Since it starts at zero and has a positive derivative everywhere, the factor in question is positive for k > 0, so $\Gamma(k+1)^{1/k}$ is monotonically increasing.
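The monotonic rise of Γ(k + 1)^(1/k) from its limit can be illustrated on a coarse grid. This is only a numerical illustration of the claim, not a substitute for the proof; the grid of k values and the helper name `gamma_root` are arbitrary choices, and the log-gamma function is used so that large k does not overflow.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def gamma_root(k):
    """Gamma(k+1)^(1/k), computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(k + 1.0) / k)

limit_at_zero = math.exp(-EULER_GAMMA)            # about 0.5615
ks = [10.0 ** e for e in (-4, -3, -2, -1, 0, 1, 2)]
values = [gamma_root(k) for k in ks]

print(f"limit e^-gamma = {limit_at_zero:.6f}")
for k, v in zip(ks, values):
    print(f"k={k:8.4f}  Gamma(k+1)^(1/k) = {v:.6f}")
```

Every printed value should exceed the limit, and the sequence should increase with k, which is what makes L5(k) lie below L4(k).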
Theorem U4
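The median of the standard gamma distribution is bounded above by this expression, which is asymptotic at low k to the lower bound L4(k):
$$\nu(k) < U_4(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{k^2 e^{-1/(3k)}}{k+1}\right)^{-1/k}, \qquad 0 < k \le 1$$
Proof: Use the tangent-line-at-0 lower bound to the exponential, $1 - x < e^{-x}$, in the integrand, to notice this inequality with easy integrals:
$$\frac{1}{\Gamma(k)} \int_0^{\nu(k)} x^{k-1}(1 - x)\,dx = \frac{1}{\Gamma(k)}\left[\frac{\nu(k)^k}{k} - \frac{\nu(k)^{k+1}}{k+1}\right] < \frac{1}{2}$$
In this difference of terms, there exists a $U^a(k)$ such that the first term can be increased to achieve equality to one-half by substituting $U^a(k)$ for ν(k):
$$\frac{1}{\Gamma(k)}\left[\frac{\left(U^a(k)\right)^k}{k} - \frac{\nu(k)^{k+1}}{k+1}\right] = \frac{1}{2}$$
Now, picking any known upper bound $U^b(k) \ge \nu(k)$, we have $\left(U^a(k)\right)^k U^b(k) > \nu(k)^{k+1}$, so we can write this inequality where the subtracted term has been increased:
$$\frac{\left(U^a(k)\right)^k}{\Gamma(k)}\left[\frac{1}{k} - \frac{U^b(k)}{k+1}\right] < \frac{1}{2}$$
As long as both factors are positive, there exists a $U^c(k)$ such that we can increase the first factor by substituting $U^c(k)$ for $U^a(k)$ to achieve equality:
$$\frac{\left(U^c(k)\right)^k}{\Gamma(k)}\left[\frac{1}{k} - \frac{U^b(k)}{k+1}\right] = \frac{1}{2}$$
So we can solve for this new bound $U^c(k)$ in terms of the known bound $U^b(k)$:
$$U^c(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{k\,U^b(k)}{k+1}\right)^{-1/k}$$
Using $U_1(k) = k e^{-1/(3k)}$ as the known bound $U^b$, and verifying the positivity constraint $k\,U^b(k)/(k+1) < 1$ by noting that U1(k) < 1 for k ≤ 1, completes the proof.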
By this method, we’ve taken an upper bound that’s relatively loose at low k and converted it to one that’s asymptotically tight, approaching the lower bound L4(k), at low k. Let’s do another like that.
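The conversion of a loose upper bound into one that hugs L4(k) at low k can be sketched as a small function. This sketch assumes the closed form obtained by solving the final equality of the proof for the new bound, namely 2^(−1/k) Γ(k+1)^(1/k) (1 − k·Ub/(k+1))^(−1/k); the names `tighten_upper_bound`, `U1`, `U4`, and `L4` are illustrative labels.

```python
import math

def L4(k):
    """Lower bound 2^(-1/k) Gamma(k+1)^(1/k), computed in the log domain."""
    return math.exp((math.lgamma(k + 1.0) - math.log(2.0)) / k)

def tighten_upper_bound(k, u_b):
    """New upper bound built from a known upper bound u_b >= nu(k).

    Requires the positivity constraint k*u_b/(k+1) < 1.
    """
    factor = 1.0 - k * u_b / (k + 1.0)
    if factor <= 0.0:
        raise ValueError("positivity constraint k*u_b/(k+1) < 1 is violated")
    return math.exp((math.lgamma(k + 1.0) - math.log(2.0) - math.log(factor)) / k)

def U1(k):
    """Berg and Pedersen upper bound k exp(-1/(3k))."""
    return k * math.exp(-1.0 / (3.0 * k))

def U4(k):
    """Upper bound from the conversion above, using U1 as the known bound (0 < k <= 1)."""
    return tighten_upper_bound(k, U1(k))

for k in (0.05, 0.1, 0.5, 1.0):
    print(f"k={k:4.2f}  L4={L4(k):.6e}  U4={U4(k):.6e}  U4/L4={U4(k) / L4(k):.6f}")
```

The ratio U4/L4 should approach 1 as k decreases, which is the sense in which the converted bound becomes asymptotically tight at low k.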
Theorem U5
The median of the standard gamma distribution is bounded above by:
$$\nu(k) < U_5(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{k^2 \log(2)}{k+1}\right)^{-1/k}, \qquad 0 < k \le 1$$
Proof: Same as Theorem U4, but use $U_3(k) = k\log(2)$ as the known upper bound $U^b(k)$. Note that U3(k) is a bound only for k ≤ 1, and that the positivity constraint holds through k ≤ 1, since both factors $U^b(k)$ and $k/(k+1)$ are less than 1 in that domain.
Theorem L6
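The median of the standard gamma distribution is bounded below by:
$$\nu(k) > L_6(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{(1 - e^{-k})\left(\log(2) + (k-1)\,\nu'(1)\right)}{k+1}\right)^{-1/k}$$
Proof: Like Theorem U4, but with an upper, as opposed to lower, bound on the exponential, and changing all the directions of the inequalities. That is, use the chord from x = 0 to x = k upper bound to the exponential, $e^{-x} < 1 - x(1 - e^{-k})/k$, in the integrand (valid over the whole range of integration because ν(k) < k), to notice this inequality with easy integrals:
$$\frac{1}{\Gamma(k)}\left[\frac{\nu(k)^k}{k} - \frac{1 - e^{-k}}{k}\,\frac{\nu(k)^{k+1}}{k+1}\right] > \frac{1}{2}$$
In this difference of terms, the first term can be decreased to achieve equality to one-half by substituting for ν(k) a sufficiently smaller (but positive) $L^a(k)$:
$$\frac{1}{\Gamma(k)}\left[\frac{\left(L^a(k)\right)^k}{k} - \frac{1 - e^{-k}}{k}\,\frac{\nu(k)^{k+1}}{k+1}\right] = \frac{1}{2}$$
Now, picking any other lower bound $L^b(k) \le \nu(k)$, we have $\left(L^a(k)\right)^k L^b(k) < \nu(k)^{k+1}$ (even if $L^b(k) < 0$), so we can write this inequality where the subtracted term has been decreased:
$$\frac{\left(L^a(k)\right)^k}{k\,\Gamma(k)}\left[1 - \frac{(1 - e^{-k})\,L^b(k)}{k+1}\right] > \frac{1}{2}$$
As long as both factors are positive we can decrease the first factor by substituting a sufficiently smaller $L^c(k)$ to achieve equality:
$$\frac{\left(L^c(k)\right)^k}{k\,\Gamma(k)}\left[1 - \frac{(1 - e^{-k})\,L^b(k)}{k+1}\right] = \frac{1}{2}$$
So we can solve for a new lower bound $L^c(k)$ in terms of a known lower bound $L^b(k)$:
$$L^c(k) = 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{(1 - e^{-k})\,L^b(k)}{k+1}\right)^{-1/k}$$
The constraint $(1 - e^{-k})\,L^b(k)/(k+1) < 1$ is met for all positive k, with any lower bound $L^b(k)$, since $L^b(k) < \nu(k) < k \Rightarrow L^b(k)/(k+1) < 1$.
Using $L_3(k) = \log(2) + (k - 1)\,\nu'(1)$, the tangent at 1, as the known bound $L^b$ completes the proof.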
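The lower-bound counterpart of the conversion can be sketched the same way. This is a minimal sketch under two assumptions: the closed form 2^(−1/k) Γ(k+1)^(1/k) (1 − (1 − e^(−k))·Lb/(k+1))^(−1/k) obtained by solving the final equality of the proof, and the slope ν′(1) = γ + log(log 2) − 2 Ei(−log 2), evaluated here with the value of Ei(−log 2) quoted earlier in the text. Function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler–Mascheroni constant
EI_NEG_LOG2 = -0.3786710           # Ei(-log 2), value quoted in the text
# Assumed closed form for the slope of the median at k = 1 (about 0.968)
NU_PRIME_1 = EULER_GAMMA + math.log(math.log(2)) - 2.0 * EI_NEG_LOG2

def L3(k):
    """Tangent line to the median at k = 1."""
    return math.log(2) + (k - 1.0) * NU_PRIME_1

def L4(k):
    """Lower bound 2^(-1/k) Gamma(k+1)^(1/k)."""
    return math.exp((math.lgamma(k + 1.0) - math.log(2.0)) / k)

def tighten_lower_bound(k, l_b):
    """New lower bound built from a known lower bound l_b <= nu(k) (l_b may be negative)."""
    factor = 1.0 - (1.0 - math.exp(-k)) * l_b / (k + 1.0)
    return math.exp((math.lgamma(k + 1.0) - math.log(2.0) - math.log(factor)) / k)

def L6(k):
    """Lower bound obtained by feeding the tangent line L3 into the conversion."""
    return tighten_lower_bound(k, L3(k))

for k in (0.1, 0.5, 1.0, 2.0):
    print(f"k={k:4.2f}  L3={L3(k):+.6f}  L4={L4(k):.6f}  L6={L6(k):.6f}")
```

Where L3(k) is negative (small k), the new bound falls slightly below L4(k); where L3(k) is positive it improves on L4(k), for example at k = 1 it lies between 0.5 and log(2).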
Corollary 1
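The median of the standard gamma distribution is also bounded below, for 0 < k ≤ 1, by the simpler expression:
$$\nu(k) > 2^{-1/k}\,\Gamma(k+1)^{1/k}\left(1 - \frac{(1 - e^{-k})\left(\log(2) + k - 1\right)}{k+1}\right)^{-1/k}$$
Proof: Use $L^b(k) = \log(2) + k - 1 \le L_3(k)$, a lower bound to the tangent at k = 1 for k ≤ 1, since the slope 1 exceeds the tangent-line slope ν′(1).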
We could obviously write some more corollaries, replacing negative lower bounds by zero, i.e. using max(0, L3(k)) or max(0, log(2) + k − 1) as Lb(k). Where the negative values are clipped to zero, the bound will turn to follow the tighter L4(k).
Bounds in a box
Our theorems U6 and L8 follow if we can prove that
$$A_L = \log(2) - \frac{1}{3} < A(k) < A_U = e^{-\gamma}$$
These bounds $A_L = \log(2) - \frac{1}{3}$ and $A_U = e^{-\gamma}$ define the bottom and top edges of a “box” to which we need A(k) to be constrained. We can visualize that by mapping other bounds through the same function, plotting them, and showing that at least one is inside the box for every k. See Fig 2.
Fig 2. Bounds in a box.
Upper (blue) and lower bounds (red) mapped for comparison to A(k) (black dotted). Bounds U6(k) and L8(k) define the top and bottom of the box via AU6(k) = AU = e−γ and AL8(k) = AL = log(2) − 1/3, while the left and right are defined by the limits of arctan(k) for 0 < k < ∞. To prove that the top and bottom are bounds of A(k), our approach is to find other upper and lower bounds “inside the box” over domains covering all k > 0. In this figure, we have no lower bound in the box around 1.7 < k < 3.0.
The function that maps the median and its bounds is $f(k, x(k)) = 2^{1/k}\,x(k) - k$, where x(k) is any real-valued function of positive k. We map bounds into the same space as A(k) with this, and identify them with subscripts. In particular, consider these upper bounds (plotted in Fig 2):
$$A_{U1}(k) = 2^{1/k}\,k\,e^{-1/(3k)} - k$$
$$A_{U2}(k) = 2^{1/k}\left(\log(2) + k - 1\right) - k$$
$$A_{U4}(k) = \Gamma(k+1)^{1/k}\left(1 - \frac{k^2 e^{-1/(3k)}}{k+1}\right)^{-1/k} - k$$
$$A_{U5}(k) = \Gamma(k+1)^{1/k}\left(1 - \frac{k^2 \log(2)}{k+1}\right)^{-1/k} - k$$
It’s easy to see, graphically, that $A_{U1}(k)$ and $A_{U2}(k)$ are “in the box” for k ≥ 1, and that $A_{U4}(k)$ and $A_{U5}(k)$ are “in the box” for k ≤ 1. Whether it’s easy to prove is another matter. At least we’re working with well-defined expressions and functions with known properties, not with the implicitly defined ν(k) itself.
We can do the same for some lower bounds, and see where they end up relative to the box:
$$A_{L1}(k) = 2^{1/k}\left(k - \tfrac{1}{3}\right) - k$$
$$A_{L3}(k) = 2^{1/k}\left(\log(2) + (k - 1)\,\nu'(1)\right) - k$$
$$A_{L4}(k) = \Gamma(k+1)^{1/k} - k$$
The lower bounds L1(k), L3(k), and L4(k) are well inside the box (greater than $A_L = \log(2) - \frac{1}{3}$) for k near enough to ∞, 1, and 0, respectively, but they leave significant holes between them, where they don’t constrain A(k) against sagging out of the bottom of the box. That’s why we resort to the additional complexity of L6(k) and L7(k), to fill in the holes.
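The mapping into the box is simple enough to tabulate directly. The following is a minimal sketch that maps a few of the closed-form bounds through f(k, x) = 2^(1/k) x − k and reports whether the result lies between log(2) − 1/3 and e^(−γ); the dictionary of bounds and the names `to_box` and `in_box` are illustrative, and only bounds with formulas stated in the text are included.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler–Mascheroni constant
A_L = math.log(2) - 1.0 / 3.0      # bottom of the box
A_U = math.exp(-EULER_GAMMA)       # top of the box

def to_box(k, x):
    """f(k, x) = 2^(1/k) x - k, the map applied to the median and its bounds."""
    return 2.0 ** (1.0 / k) * x - k

def in_box(k, x):
    """True when the mapped value lies strictly between A_L and A_U."""
    return A_L < to_box(k, x) < A_U

bounds = {
    "U1": lambda k: k * math.exp(-1.0 / (3.0 * k)),
    "U2": lambda k: math.log(2) + k - 1.0,          # valid as a bound for k >= 1
    "L1": lambda k: k - 1.0 / 3.0,
    "L4": lambda k: math.exp((math.lgamma(k + 1.0) - math.log(2.0)) / k),
}

for k in (0.2, 1.0, 2.0, 4.0):
    row = ", ".join(f"{name}: {to_box(k, b(k)):+.4f} ({'in' if in_box(k, b(k)) else 'out'})"
                    for name, b in bounds.items())
    print(f"k={k:3.1f}  {row}")
```

The table makes the holes visible: for instance, the mapped L1 is inside the box at k = 4 but has sagged below the bottom edge at k = 2, while the mapped L4 is inside at k = 0.2.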
Line bounds from point bounds
It is straightforward to find or verify bounds for ν(k) at a finite set of discrete values of k, using numerical evaluation of the CDF integral via a rapidly converging series based on the Taylor series for the exponential function. For Theorem L7, we prove slope-1 line lower bounds based on point bounds at eight k points, each valid for k values no greater than its point $k_i$ (for higher values of k, the line will eventually cross above ν(k)). We use these line bounds to supplement our other lower bounds, in service of proving Theorem L8.
Theorem L7
The median of the standard gamma distribution is bounded below by lines of slope 1 below point lower bounds:
$$\nu(k) > L_7(k) = \nu_{Li} + k - k_i, \qquad 0 < k \le k_i$$
where the values ki and νLi represent eight point lower bounds νLi < ν(ki) per this array of values:
Proof: Since the slope of ν(k) is everywhere less than 1, these lines of slope 1 below point lower bounds are below the tangent line lower bounds at those points. To demonstrate that each νLi is indeed a lower bound at each of the ki points, we use a transparent technique not relying on anybody’s implementation of incomplete gamma functions. The point lower bounds are verified to lead to percentiles (100 times the CDF estimate) below 50%, by more than the corresponding estimation error bounds. In each case, we have chosen the νLi to be the largest multiple of 0.001 that can be verified to be a lower bound.
These calculations are shown in the S1 Appendix, with intermediate steps printed out from the algorithm given there. The formulae used for the CDF include approximations for the incomplete gamma function integral. The incomplete gamma function integral in the CDF can be written as a convergent series, using the Taylor series of the exponential function:
$$\int_0^{\nu} x^{k-1} e^{-x}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n\,\nu^{k+n}}{n!\,(k+n)}$$
Since the magnitudes of the terms are decreasing by half or better after enough terms that n > 2ν, the error in truncating the series is bounded by the magnitude of the last term added (and due to the alternation, the error is actually quite a bit lower than that). So to verify a point lower bound ν < ν(k), we accumulate terms until that condition is met and the sum plus the bound on the residue, divided by a lower bound of Γ(k), is less than 0.5.
We lower-bound Γ(k) at these points by truncating values from a standard math package to 5 decimal digits. We also computed these bounds from scratch, using Euler’s product formula with 80 or more factors, and using upper and lower bounds on the truncated tail residual factor computed by using integrals to bound the sum of logarithms of the factors. That analysis is too long and complicated to include, but we list the values we used so that others can independently verify them.
See the S1 Appendix for the numerical verifications of these values by this method.
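The verification procedure described above can be sketched in a few lines. This is an illustration of the method only, not the S1 Appendix algorithm: the function names are hypothetical, and the test point k = 1 with candidate 0.693 is a made-up example chosen because Γ(1) = 1 exactly and the median log(2) ≈ 0.693147 is known; it is not one of the eight points used in Theorem L7.

```python
def cdf_upper_estimate(k, x, gamma_lower, tol=1e-15):
    """Upper estimate of the CDF at x: truncated alternating series plus a residue
    bound (magnitude of the last term added), divided by a lower bound of Gamma(k)."""
    n = 0
    term = x ** k / k              # n = 0 term of sum (-1)^n x^(k+n) / (n! (k+n))
    total = term
    # Continue until n > 2x (terms then shrink by half or better per step)
    # and the last term added is negligible.
    while n <= 2 * x or abs(term) > tol:
        n += 1
        term *= -x * (k + n - 1) / (n * (k + n))
        total += term
    return (total + abs(term)) / gamma_lower

def is_point_lower_bound(k, x, gamma_lower):
    """True when x is verified to lie below the median nu(k)."""
    return cdf_upper_estimate(k, x, gamma_lower) < 0.5

# Hypothetical check at k = 1, where Gamma(1) = 1 and nu(1) = log(2).
print(is_point_lower_bound(1.0, 0.693, 1.0))   # candidate just below log(2)
print(is_point_lower_bound(1.0, 0.694, 1.0))   # candidate just above log(2)
```

Because the residue bound and the lower bound on Γ(k) both push the estimate upward, a result below 0.5 is a conservative confirmation that the candidate point is below the median.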
The reason for the particular set of ki selected for this theorem will become clear later (or may be obvious from looking at Fig 3).
Fig 3. More bounds in the box.
Here we show some approximate-tangent lower bounds from Theorem L7 mapped to the box, and remove some of the others. The points $\{k_i, \nu_{Li}\}$ of Theorem L7 are circled. The solid blue upper bound is in the box per Lemma 1 and Lemma 2. The solid red lower bound is in the box per Lemmas 3, 4, and 5. These bounds being in the box will prove our main theorems U6 and L8.
Piecing the bounds together
This is where it gets complicated. We have to find regions where various of the previous bounds are tighter than the main upper and lower bounds that we set out to prove. To this end we will prove the lemmas in Table 2.
Table 2. Lemmas table.
Lemmas to prove in support of Theorems U6 and L8.
| Upper bounds in box | Domain | Lemma |
| $A_{U1}(k) < e^{-\gamma}$ | k ≥ 0.5 | Lemma 1 |
| $A_{U5}(k) < e^{-\gamma}$ | k ≤ 1 | Lemma 2 |
| Lower bounds in box | Domain | Lemma |
| $A_{L1}(k) > \log(2) - \frac{1}{3}$ | k ≥ 3.1 | Lemma 3 |
| $A_{L4}(k) > \log(2) - \frac{1}{3}$ | 0 < k ≤ 0.36 | Lemma 4 |
| $A_{L7i}(k) > \log(2) - \frac{1}{3}$ | $k_{i-1} \le k \le k_i$, $k_0 = 0.36$ | Lemma 5 |
Lemma 1
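$$A_{U1}(k) < e^{-\gamma}, \qquad k \ge 0.5$$
Proof: $U_1(k)$ maps to $A_{U1}(k) = 2^{1/k}\,k\,e^{-1/(3k)} - k = k\left(e^{c/k} - 1\right)$ with $c = \log(2) - \frac{1}{3} \approx 0.3598$, whose slope is
$$\frac{d A_{U1}(k)}{dk} = e^{c/k}\left(1 - \frac{c}{k}\right) - 1$$
The exponential can be upper-bounded using $e^x < 1/(1 - x)$ for 0 < x < 1, applied with x = c/k, which is less than 1 when k > c:
$$e^{c/k}\left(1 - \frac{c}{k}\right) < 1$$
Concluding that the slope of $A_{U1}(k)$ is negative for k > 0.360, and evaluating $A_{U1}(0.5) < 0.527$ and $e^{-\gamma} > 0.561$, we can conclude that $A_{U1}(k) < e^{-\gamma}$ for k ≥ 0.5.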
Lemma 2
$$A_{U5}(k) < e^{-\gamma}, \qquad 0 < k \le 1$$
Proof: $U_5(k)$ maps to
$$A_{U5}(k) = \Gamma(k+1)^{1/k}\left(1 - \frac{k^2 \log(2)}{k+1}\right)^{-1/k} - k$$
The low-k limit is again $e^{-\gamma}$; the slope is initially negative, but $A_{U5}(k)$ eventually crosses above the limit and diverges to infinity where the denominator expression goes to 0. To show that it doesn’t cross above the limit until somewhere after k = 0.5, we’ll use several steps of bounding of functions and derivatives, starting by invoking an upper bound on the gamma function, Theorem 1.4 of Batir [13]:
The condition AU5(k) ≤ e−γ is satisfied if substituting this upper bound satisfies the constraint:
or
(since the denominator is positive in the domain we care about, 0 < k < 1). Dividing by the positive right-hand side gives the equivalent condition
which simplifies to
or
This function has a Taylor series at 0 starting with a negative $k^2$ term and a positive $k^3$ term, so it is negative in some region of low enough k, as needed. By showing that the third derivative stays positive in 0 < k < 1, we can conclude that it crosses its starting value not more than once, and we can bound where that happens with a few evaluations. So first we need the derivatives; with help from Wolfram Alpha:
To show g‴(k) > 0 over the domain of interest (say 0 < k < 0.5), we can throw away the leading obviously-positive factors, and the condition becomes $2e^{\gamma}(3k^2 + k + 1) - e^{2\gamma}(k - 2)k^3 - 3 > 0$, a simple fourth-degree polynomial-in-k condition:
$$-e^{2\gamma}k^4 + 2e^{2\gamma}k^3 + 6e^{\gamma}k^2 + 2e^{\gamma}k + \left(2e^{\gamma} - 3\right) > 0$$
All coefficients except the highest-order one are positive. Without that fourth-order term, this polynomial would be positive for all k ≥ 0, with one real root below zero; with that term, it has a positive real root, below which it is positive, including about 0.562 at k = 0 and about 17.98 at k = 1, so it’s positive for at least 0 ≤ k ≤ 1. This is all we need; let’s unwind.
The positive g‴(k) means that the curvature, g″(k), is increasing between k = 0 and k = 1. The slope g′(k) starts out negative at k = 0, keeping g(k) < 0 until after the slope eventually increases enough due to the eventually positive curvature. After the g(k) becomes positive, it will stay that way until some time after g‴(k) turns negative, making g″(k) turn negative, etc., and this is well above the region of interest. Evaluating a few points, g(0.5) = −0.0258, g(1) = −0.00025, g(1.01) = 0.0018. So the original condition holds, AU5(k) is “in the box”, through k = 1, the end of its domain of validity.
Lemma 3
Proof: $L_1(k)$ maps to $A_{L1}(k) = 2^{1/k}\left(k - \frac{1}{3}\right) - k$. The Laurent series at infinity of this function is easily found (with the help of Wolfram Alpha) to be:
$$A_{L1}(k) = \log(2) - \frac{1}{3} + \frac{\log(2)}{2}\left(\log(2) - \frac{2}{3}\right)k^{-1} + \sum_{n=2}^{\infty} \frac{\log(2)^n}{(n+1)!}\left(\log(2) - \frac{n+1}{3}\right)k^{-n}$$
where the coefficient of $k^{-1}$ is positive and all of the coefficients in the $O(k^{-2})$ term are negative (where n > 3 log(2) − 1). Therefore, for sufficiently large k, $A_{L1}(k)$ exceeds $A_L = \log(2) - \frac{1}{3}$, and for smaller k, the value can only cross below log(2) − 1/3 ≈ 0.35981 once due to all the higher-order coefficients being negative. Evaluating at k = 3 we find $A_{L1}(3) \approx 0.35979$, outside the box, and evaluating at k = 3.1 we find $A_{L1}(3.1) \approx 0.35990$, inside the box. Thus we can conclude that $A_{L1}(k)$ is inside the box for k ≥ 3.1.
Lemma 4
Proof: $L_4(k)$ maps to $A_{L4}(k) = \Gamma(k+1)^{1/k} - k$, which approaches $e^{-\gamma}$ (at the top of the box) as k → 0, and exits the box somewhere in 0.36 < k < 0.37 since $A_{L4}(0.36) \approx 0.3638 > 0.3598$ and $A_{L4}(0.37) \approx 0.3583 < 0.3598$. We just need to show it can only exit the box once in 0 < k < 0.37 to conclude $A_{L4}(k) > \log(2) - \frac{1}{3}$ for all 0 < k ≤ 0.36. $\Gamma(k+1)^{1/k}$ has a positive derivative, as we showed in the proof of Theorem L5. Now, if we can prove that derivative is less than 1, then $A_{L4}(k)$ is monotonically decreasing, so it can only go out of the box once.
$$\frac{d}{dk}\,\Gamma(k+1)^{1/k} = \Gamma(k+1)^{1/k}\,\frac{k\,\psi^{(0)}(k+1) - \log(\Gamma(k+1))}{k^2} < 1$$
because both factors are less than 1: Γ(x) < 1 in 1 < x < 2 (a well-known property of the gamma function), so $\Gamma(k+1)^{1/k} < 1$ for 0 < k < 1; to show $\left(k\,\psi^{(0)}(k+1) - \log(\Gamma(k+1))\right)/k^2 < 1$, we need to do a bit more work. First, define the function h(k) that this factor is the derivative of:
$$h(k) = \frac{\log(\Gamma(k+1))}{k}$$
Then rewrite the log of the gamma function in terms of this identity derived again from the Weierstrass product formula and the zeta function:
$$\log(\Gamma(k+1)) = -\gamma k + \int_0^{\infty} \frac{e^{-kt} - 1 + kt}{t\,(e^{t} - 1)}\,dt$$
so that
$$h(k) = -\gamma + \int_0^{\infty} \frac{e^{-kt} - 1 + kt}{k\,t\,(e^{t} - 1)}\,dt$$
The factor $1/(e^{t} - 1)$ in the integrand makes this integral absolutely convergent, so we can differentiate twice, getting:
$$h''(k) = \int_0^{\infty} \frac{\left[(kt + 1)^2 + 1\right]e^{-kt} - 2}{k^3\,t\,(e^{t} - 1)}\,dt$$
in which the integrand has a positive denominator and a negative numerator, since for any k we can define y = kt and write the numerator as $\left[(y + 1)^2 + 1\right]e^{-y} - 2$, which starts at 0 and has derivative $-y^2 e^{-y} < 0$ for all y > 0. Since h″(k) < 0, h′(k) only decreases from its starting value, which is less than 1. This completes the proof that $A_{L4}(k)$ decreases monotonically, so it is “in the box” for 0 < k ≤ 0.36.
Lemma 5
Proof: The lower bounds $L_{7i}(k) = \nu_{Li} + k - k_i$, lines of slope 1 that are above L1(k), are of the form k − a for $a = k_i - \nu_{Li}$, and map under f to a function with series similar to the one we saw for Lemma 3 where a = 1/3, but with a values in the range $k_1 - \nu_{L1} = 0.255 \le a \le k_8 - \nu_{L8} = 0.328$.
In Lemma 3 we saw that the coefficient of $k^{-1}$ was positive, as it is here, since a < log(2)/2 ≈ 0.346, and that all the higher-order coefficients are negative, since for n ≥ 2, (1 + n)a > log(2) for a > log(2)/3 ≈ 0.231. Thus, as with $A_{L1}(k)$ in Lemma 3, lines of slope 1 are not monotonic when mapped into the box, but are in the box at some k and only leave the bottom of the box once if a ≤ 1/3; that is, if $L_{7i}(k) \ge L_1(k)$. Therefore, to verify that they are in the box over the domains specified, all that is needed is to evaluate them at both ends of their respective domains, as in this array:
The ALi(ki) and ALi(ki−1) columns in this array being greater than log(2) − 1/3 ≈ 0.3598 proves that the lower bounds are “in the box” over the respective domains ki−1 ≤ k ≤ ki.
Theorem U6
The median of the standard gamma distribution is bounded above by U6(k):
$$\nu(k) < U_6(k) = 2^{-1/k}\left(e^{-\gamma} + k\right)$$
Proof: Relative to the top of the box $A_{U6}(k) = e^{-\gamma} \approx 0.5615$, from Lemma 1 and Lemma 2, $A_{U1}(k) < e^{-\gamma}$ for k ≥ 0.5 and $A_{U5}(k) < e^{-\gamma}$ for k ≤ 1, which imply U6(k) > U1(k) > ν(k) and U6(k) > U5(k) > ν(k) over the respective same domains of k, so we conclude U6(k) > ν(k) for all k > 0.
It is left as an exercise for the reader to prove using U2(k) and/or U4(k), leading to U6(k) > U2(k) ≥ ν(k) for k ≥ 1 and/or U6(k) > U4(k) > ν(k) for 0 < k ≤ 1; or some other combination of bounds in the box.
Theorem L8
The median of the standard gamma distribution is bounded below by L8(k):
$$\nu(k) > L_8(k) = 2^{-1/k}\left(\log(2) - \frac{1}{3} + k\right)$$
Proof: Relative to the bottom of the box AL8(k) = log(2)−1/3 ≈ 0.3598, Lemmas 3, 4, and 5 show other bounds “in the box” over regions covering all k > 0: AL4(k) > log(2) − 1/3 for 0 < k ≤ 0.36, eight versions of AL7i(k) > log(2) − 1/3 for eight adjacent domains spanning 0.36 ≤ k ≤ 3.5, and AL1(k) > log(2)−1/3 for k > 3.1, which imply L8(k) < L4(k) < ν(k), L8(k) < L7i(k) < ν(k), and L8(k) < L1(k) < ν(k) over the respective same k domains, so we conclude L8(k) < ν(k) for all k > 0.
It is left as an exercise for the reader to prove via some other combination of lower bounds in the box. The Theorem L6, or its simpler corollary, provides a bound in the box until L3 takes over. The Theorem L7 bound with just the one segment at i = 8 can fill in between L3 and L1.
Better bounds
From the previous figures it is clear that there is room for smooth functions bounding A(k) above and below that are tighter (in some regions, especially in the middle) than the bounds we constructed to prove theorems U6 and L8. Lyon [1] explored rational-function and arctan interpolators to approximate or bound A(k), writing A(k) as a monotonic interpolation between its upper and lower limits via a function g(k):
The simplest rational-function interpolator used was
Based on slopes at 0 and infinity, Lyon showed that the b0 value for a lower bound could not exceed and the value for an upper bound could not be less than , but that these values lead to actual lower and upper bounds, respectively, has not been proved.
See Fig 4 for the relation between these interpolated approximations to A(k) and the actual (numerical) A(k). Conjectured bounds from arctan interpolators are also shown, using this formula to approximate or bound g(k):
where for an upper bound the b can not be less than , and for a lower bound not more than bL ≈ 0.205282 (no closed form is known for this limit). The plots also show that the CDF, near 0.5, is surprisingly insensitive to the value of A(k) near both k → 0 and k → ∞.
Fig 4. Tighter bounds.
Conjectured better upper (blue) and lower (red) bounds mapped for comparison to A(k) (black dotted). Solid curves are from first-order rational-function interpolators; dashed curves are from arctan interpolators. For a sense of how close these bounds come to the median (50th percentile), curves of selected nearby percentiles are included (thin black curves, computed with Matlab’s gammaincinv function).
Again, that these interpolated functions produce actual bounds has not been proved. Presumably the method of line bounds from point bounds could be extended to construct proofs by finding tighter bounds numerically, though it might take thousands of points. These conjectures are left for others to consider.
Conclusions
We give upper and lower bounds for the median of the gamma distribution that are tighter at low k than previously known bounds, and are proved valid over all k > 0: U6(k) and L8(k). They are simple and in closed form. Though their validity seemed obvious from the numerical and asymptotic methods by which they were discovered, they were originally presented as conjectures [1] because they were not easy to prove analytically; the proofs here follow the outline of the proof the “hard way” as proposed there.
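In summary, the median ν(k) of the standard gamma distribution satisfies
$$L_8(k) < \nu(k) < U_6(k)$$
or
$$2^{-1/k}\left(\log(2) - \frac{1}{3} + k\right) < \nu(k) < 2^{-1/k}\left(e^{-\gamma} + k\right)$$
This formulation for the median
$$\nu(k) = 2^{-1/k}\left(A(k) + k\right)$$
defines the function $A(k) = 2^{1/k}\,\nu(k) - k$ with the remaining conjecture that A(k) is monotonically decreasing between the limits $\lim_{k \to 0} A(k) = e^{-\gamma}$ and $\lim_{k \to \infty} A(k) = \log(2) - \frac{1}{3}$.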
As we showed before [1], monotonic approximations to A(k) that interpolate between these limits can make excellent approximations to the median, in closed form, with controlled good properties such as being exact at k = 1.
Supporting information
S1 Appendix. (PDF)
Acknowledgments
Help and ideas from discussions with my mathematically talented colleagues at Google are gratefully acknowledged: Pascal Getreuer, Srinivas Vasudevan, Dan Piponi, Michael Keselman, Yuan Li, Thomas Fischbacher, Daniel Parry, Lizao Li, Fred Akalin, and John Vogler.
Data Availability
All relevant data are within the paper.
Funding Statement
The author(s) received no specific funding for this work. Author RFL is employed by Google. The funder provided support in the form of salary for RFL and the publication fee for this article, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study. The specific roles of these authors are articulated in the ‘author contributions’ section.
References
- 1. Lyon RF. On closed-form tight bounds and approximations for the median of a gamma distribution. PLOS One. 2021;16(5):e0251626. doi: 10.1371/journal.pone.0251626
- 2. Choi KP. On the medians of gamma distributions and an equation of Ramanujan. Proceedings of the American Mathematical Society. 1994;121(1):245–251. doi: 10.1090/S0002-9939-1994-1195477-8
- 3. Gaunt RE, Merkle M. On bounds for the mode and median of the generalized hyperbolic and related distributions. Journal of Mathematical Analysis and Applications. 2021;493(1):124508. doi: 10.1016/j.jmaa.2020.124508
- 4. Chen J, Rubin H. Bounds for the difference between median and mean of gamma and Poisson distributions. Statistics & Probability Letters. 1986;4(6):281–283. doi: 10.1016/0167-7152(86)90044-1
- 5. Wilson EB, Hilferty MM. The distribution of chi-square. Proceedings of the National Academy of Sciences. 1931;17(12):684–688. doi: 10.1073/pnas.17.12.684
- 6. Berg C, Pedersen HL. The Chen–Rubin conjecture in a continuous setting. Methods and Applications of Analysis. 2006;13(1):63–88. doi: 10.4310/MAA.2006.v13.n1.a4
- 7. Berg C, Pedersen HL. Convexity of the median in the gamma distribution. Arkiv för Matematik. 2008;46(1):1–6. doi: 10.1007/s11512-006-0037-2
- 8. Priore S, Petersen C, Oishi M. Approximate stochastic optimal control for linear time-invariant systems with heavy-tailed disturbances. arXiv preprint arXiv:2210.09479. 2022.
- 9. Ouimet F. A refined continuity correction for the negative binomial distribution and asymptotics of the median. Metrika. 2023; p. 1–23.
- 10. Pinelis I. Monotonicity properties of the gamma family of distributions. Statistics & Probability Letters. 2021;171:109027. doi: 10.1016/j.spl.2020.109027
- 11. Groeneveld RA, Meeden G. The mode, median, and mean inequality. The American Statistician. 1977;31(3):120–121. doi: 10.1080/00031305.1977.10479215
- 12. Doodson AT. Relation of the mode, median and mean in frequency curves. Biometrika. 1917;11(4):425–429. doi: 10.1093/biomet/11.4.425
- 13. Batir N. Inequalities for the gamma function. Archiv der Mathematik. 2008;91(6):554–563. doi: 10.1007/s00013-008-2856-9




