Significance
The harmonic mean (HM) filter is better at removing positive outliers than the arithmetic mean (AM) filter. Difficulties arise, however, when an accurate evaluation of the expected HM is needed, for example in image denoising and marginal likelihood evaluation. A major challenge is to develop a higher-order approximation of the expected HM when the central limit theorem is not applicable. A two-term approximation of the expected HM is derived in this paper. This approximation enables us to develop a new filtering procedure that denoises a noisy image with improved performance, and to construct a truncated HM estimator with a faster convergence rate in marginal likelihood evaluation.
Keywords: harmonic mean, second-order approximation, arithmetic mean, image denoising, marginal likelihood
Abstract
Although the harmonic mean (HM) is mentioned in textbooks along with the arithmetic mean (AM) and the geometric mean (GM) as one of three possible ways of summarizing the information in a set of observations, its appropriateness in some statistical applications is not discussed there. During the last 10 years a number of papers have been published giving statistical applications where the HM is appropriate and performs better than the AM. In the present paper some additional applications of the HM are considered. The key step is to find a good approximation to $E(H_n)$, the expectation of the harmonic mean of n observations from a probability distribution. In this paper a second-order approximation to $E(H_n)$ is derived and applied to a number of problems.
The harmonic mean $H_n$ of n observations $X_1, \ldots, X_n$ drawn from a population is defined by
$$H_n = \frac{n}{\sum_{i=1}^{n} 1/X_i}. \qquad [1]$$
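As a minimal illustration of definition [1], the following Python sketch computes the harmonic and arithmetic means of a small sample and shows how a single large positive outlier affects each. The function names and the sample values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def harmonic_mean(x):
    """Harmonic mean n / sum(1/x_i) of strictly positive observations."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("harmonic mean requires strictly positive values")
    return x.size / np.sum(1.0 / x)

def arithmetic_mean(x):
    """Ordinary sample mean, for comparison."""
    return float(np.mean(np.asarray(x, dtype=float)))

# Illustrative sample (assumed values).
sample = [1.0, 2.0, 4.0]
hm = harmonic_mean(sample)      # 3 / (1 + 1/2 + 1/4) = 12/7
am = arithmetic_mean(sample)    # 7/3

# Adding one large positive outlier moves the AM far more than the HM.
with_outlier = sample + [1000.0]
hm_out = harmonic_mean(with_outlier)
am_out = arithmetic_mean(with_outlier)
print(f"HM: {hm:.4f} -> {hm_out:.4f}   AM: {am:.4f} -> {am_out:.4f}")
```

The harmonic mean stays of the same order as the bulk of the data, which is the robustness to positive outliers that motivates the HM filter.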
There have been a number of applications of the harmonic mean in recent papers. A more general version of with weights is
| [2] | 
where . The harmonic mean is used to provide the average rate in physics and to measure the price ratio in finance as well as the program execution rate in computer engineering. Some statistical applications of the harmonic mean are given in refs. 1–4, among others. has been used in evaluation of the portfolio price-to-earnings ratio value (ref. 5, p. 339) and the signal-to-interference-and-noise ratio (6) among others. The asymptotic properties of including the asymptotic expansion of are investigated in refs. 7 and 8 by either assuming that some moments of are finite or that s follow the Poisson distribution. It is noted that recent papers (9, 10) enable one to use saddle-point approximation to give the asymptotic expansion of to any given order of for some constants , i.e.,
| [3] | 
However, such methods are not applicable for obtaining the asymptotic expansion of when the first moment of is infinite. In ref. 3, s are assumed to follow a uniform distribution in the interval , i.e., , motivated by learning theory. Using the property that the inverse of converges to the stable law, ref. 3 showed that
| [4] | 
where the symbol “∼” means asymptotic equivalence as n → ∞. Our interest in this paper is to determine the second term in the asymptotic expansion of or the general version under more general assumptions on distributions of s. We show that under mild assumptions,
| [5] | 
where the constant will be given. In addition, we extend the approach used to obtain [5] to the case where the first moment of is finite, motivated by the evaluation of the marginal likelihood in ref. 11.
Approximations
We derive the asymptotic approximation of when the first moment of is not finite. Let be a sequence of independent and identically distributed (i.i.d.) random variables with possible infinite first moment. Suppose that there exist constants and , such that the distribution of
| [6] | 
converges weakly to a nondegenerate distribution such that
| [7] | 
| [8] | 
where α, , and are constants with , , and , respectively. The set of all distributions converging to is called the domain of attraction of . It is known that only stable laws with index α have the nonempty domains of attraction as shown by refs. 12 (chap. 7) and 13 (chap. 2).
Assume that there is a positive constant which does not depend on n such that
| [9] | 
We further assume a uniform rate of convergence of to such that
| [10] | 
for some positive constant . Our assumptions are mild. Ref. 14 showed that has the rate of under some assumptions.
We have the following asymptotic approximation of :
Theorem 1. Assume that conditions [7]–[10] are satisfied and , , , , and . Then we have the following first approximation:
| [11] | 
where .
The proof is given in Appendix: Proof of Theorem 1. Because in [10] is smaller than the remaining terms in [11], the coefficients of both and are independent of β in [11].
Remark 1: For an extension of Theorem 1 to the weighted harmonic mean in [2], we consider the following partial sum:
| [12] | 
where . Motivated by ref. 15, we may assume the following two conditions on the weights s:
| [13] | 
and the characteristic function of in [6] satisfies that
| [14] | 
Under the conditions [13] and [14], ref. 15 showed that the distribution of converges to a stable distribution with characteristic function . For example, if s follow uniform distribution , the condition [14] is satisfied when and . Following the proof of Theorem 1, it can be shown that
| [15] | 
where . It is noted that the weights in [2] do not have to be nonnegative, but must satisfy both conditions [9] and [13].
By Theorem 1, in [5] has the value −1. It is noted that Theorem 1 holds true if s follow a uniform distribution .
A higher-order approximation may be similarly obtained but extra conditions on in [7] and [8] may be needed. In view of the proof of Theorem 1 given in Appendix: Proof of Theorem 1, the higher-order term should be . Because it is difficult to obtain the coefficient of this term theoretically, it may be estimated empirically. As a demonstration, we consider the case where s follow a uniform distribution . We perform a Monte Carlo simulation with 1,000,000 replications of n independent observations from the standard uniform distribution for different values of . The coefficient of is estimated to be 0.5673 by fitting the simulated data to the following model by least squares:
Thus, we obtain the following approximation:
| [16] | 
As in ref. 3, suppose that s follow a uniform distribution . The distribution of is easily seen to be given by
where is an indicator function. It is well known that the mean of is infinite but for . By considering the limit stable distribution with index of the distribution of for and , ref. 3 obtained the result [4], which is
| [17] | 
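As a brief worked check of the tail behavior invoked here, assume that X follows the standard uniform distribution on (0, 1) and write Y = 1/X (the symbols X and Y are notation adopted for this sketch only). The distribution function, tail, and moments of Y then follow by direct integration:

```latex
\begin{align*}
P(Y \le y) &= P(X \ge 1/y) = \Bigl(1 - \frac{1}{y}\Bigr) I(y \ge 1), \\
P(Y > y)   &= \frac{1}{y}, \qquad y \ge 1, \\
E(Y)       &= \int_1^\infty y \cdot y^{-2}\, dy = \infty, \\
E(Y^s)     &= \int_1^\infty y^{s-2}\, dy = \frac{1}{1-s}, \qquad 0 < s < 1.
\end{align*}
```

The tail P(Y > y) = 1/y is regularly varying with index 1, which is consistent with a limiting stable law of index 1 for the suitably normalized partial sums.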
According to our Theorem 1 and the approximation [16],
| [18] | 
| [19] | 
Fig. 1 displays the approximations given in [17]–[19] compared with the sample mean of 1,000,000 replications of n independent observations from the uniform distribution that serves as a proxy for the exact value of . Here n takes values , and 200. From Fig. 1, it can be seen that the approximation [18] is better than the approximation [17]. Although the approximation [19] is purely empirical, this empirical exercise basically achieves the desired result as shown in Fig. 1; it clearly gives much better approximation of than its other two counterparts.
Fig. 1.
Comparisons of three approximations of with respect to the sample mean (denoted by ) of with 1,000,000 replications of n independent observations from for . (i) “L-M” denotes the approximations of by [17] less . (ii) “F-M” stands for the approximations of by [18] less . (iii) “S-M” represents the approximations of by [19] less .
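The simulated proxy described above can be reproduced in outline as follows. This is a sketch under stated assumptions: it uses far fewer replications than the 1,000,000 reported, and it compares the simulated mean only with 1/log n, the standard leading-order asymptotic for the standard uniform case. Treating 1/log n as the quantity in [17] is an assumption of this sketch, and the function name, seed, and values of n are illustrative.

```python
import numpy as np

def simulate_expected_hm(n, reps=20_000, seed=0):
    """Monte Carlo estimate of E(H_n) for n i.i.d. standard uniform observations."""
    rng = np.random.default_rng(seed)
    # 1 - U lies in (0, 1], which avoids division by zero.
    x = 1.0 - rng.random((reps, n))
    hm = n / np.sum(1.0 / x, axis=1)   # one harmonic mean per replication
    return hm.mean()

for n in (50, 100, 200):
    est = simulate_expected_hm(n)
    first_order = 1.0 / np.log(n)      # assumed leading-order term
    print(f"n={n:4d}  simulated={est:.4f}  1/log(n)={first_order:.4f}")
```

Because the leading-order term decays only logarithmically, the gap between it and the simulated mean remains visible at these sample sizes, which is the motivation for a second-order correction.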
We now consider the case that . In this case, and . Thus, we have
| [20] | 
In light of the proof of Theorem 1, we have the following asymptotic approximation of :
Theorem 2. Assume that conditions in [7]–[10] are satisfied and , , , , and ; then we have the following approximation:
| [21] | 
where .
Remark 2: A similar result as in Theorem 2 can be obtained for the weighted harmonic mean in [2] by assuming that conditions [13] and [14] are satisfied with and . It can be shown that
| [22] | 
where .
Some Applications
We present two applications which involve the use of the approximation of .
Image Denoising.
Image denoising is very important in image processing, and many denoising methods exist in the image processing literature. We are interested in local filters, such as the arithmetic and harmonic mean filters, that have been used in image denoising. The harmonic mean filter is better at removing positive outliers and preserving edge features than the arithmetic mean filter. However, both filters fail when the image is contaminated by uniform noise. Because the two means differ by different amounts on different segments, we use the ratio of the harmonic mean to the arithmetic mean (defined in [23]) as a local filter and select the corresponding threshold of the ratio using the improved approximation [16] plus a saddle-point approximation. This application shows how the local filter can improve the performance of image denoising. The details are given below.
For demonstration, we consider a test image with dimension 250 × 250 (Fig. 2A) including disk, hand, human body, ring, sunflower, and triangle as shown in figure 2 of ref. 16. We contaminate the image with uniform noise, which is displayed in Fig. 2B. The usual harmonic mean filter method in image denoising is to replace the value of each pixel with the harmonic mean of values of the pixels in a surrounding region. We consider a square containing 9 pixels for each pixel such that this pixel is located at the center. Here the variable represents the value of the pixel taking values 0 (black), 1/255,…,255/255 (white) in this 256 grayscale image and the sample size is 9. For the edge of an image with dimension 250 × 250 such as the first or last row and column, where the pixels are not surrounded by a square, we copy them to the neighbor areas in the original image and the new image becomes 252 × 252. Note that this handling is only for convenience of filtering and the added pixels will not be analyzed. From Fig. 2 C and D, it can be seen that even though the harmonic mean filter outperforms the arithmetic mean filter, both arithmetic mean filter and the harmonic mean filter fail to denoise the noisy image given in Fig. 2B. However, we can first use the ratio of the harmonic mean and the arithmetic mean jointly with a threshold θ to transform the pixel at the pixel location as follows:
| [23] | 
where and are, respectively, the harmonic mean and the arithmetic mean of 9 pixels centering at . We then apply the arithmetic or harmonic mean filter to the pixels to denoise the image of pixels . By Fig. 2 E and F, it can be seen that both images look much better than the images in Fig. 2 C and D. The image in Fig. 2F (by the harmonic mean filter) looks almost the same as the initial unnoisy image.
Fig. 2.
(A) Original noise-free image. (B) Noisy image obtained by adding noise to each pixel value of image A. (C) Image obtained by denoising the noisy image B using the arithmetic mean filter. (D) Image obtained by denoising the noisy image B using the harmonic mean filter. (E) The arithmetic mean filtered image of (see [23]). (F) The harmonic mean filtered image of .
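The filtering procedure described above can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements of the paper: the direction of the thresholding rule (a ratio of at least θ is mapped to 1, otherwise to 0), the small floor `eps` used so that zero-valued pixels have a finite reciprocal, and all function names. Border handling uses edge replication, matching the described copying of border pixels into a 252 × 252 working image.

```python
import numpy as np

def window_stack(img):
    """Return the nine shifted copies of img forming each pixel's 3 x 3 neighborhood."""
    padded = np.pad(img, 1, mode="edge")   # replicate border pixels
    h, w = img.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def am_filter(img):
    """Arithmetic mean of the 9 pixels centered at each location."""
    return window_stack(img).mean(axis=0)

def hm_filter(img, eps=1e-6):
    """Harmonic mean of the 9 pixels centered at each location."""
    # eps floors black (zero) pixels so that 1/x is defined; this is an assumption.
    stack = window_stack(np.maximum(img, eps))
    return stack.shape[0] / np.sum(1.0 / stack, axis=0)

def ratio_threshold(img, theta=0.85, eps=1e-6):
    """Binary image from the local HM/AM ratio and a threshold theta."""
    floored = np.maximum(img, eps)
    ratio = hm_filter(floored, eps) / am_filter(floored)
    # Assumed direction: ratio >= theta -> 1 (white), otherwise 0 (black).
    return (ratio >= theta).astype(float)

def denoise(img, theta=0.85):
    """Threshold the HM/AM ratio, then smooth the result with the HM filter."""
    return hm_filter(ratio_threshold(img, theta))
```

A call such as `denoise(noisy_image)` expects a two-dimensional array of grayscale values in [0, 1]; replacing the final `hm_filter` with `am_filter` gives the arithmetic mean variant.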
We note that only when using the ratio of the harmonic mean and the arithmetic mean, we assign 1 or 0 according to a threshold θ in [23], which is determined by the asymptotic behavior of the ratio of their expected values. How to select the threshold θ is important in practice. To demonstrate how to select θ, we consider two cases of uniform distributions with sample size n: (i) ; (ii) . Let and be, respectively, the harmonic mean and the arithmetic mean of this sample. An approximation to would be the ratio of their means, as in ref. 9. For case (i), can be approximated by [16], an improved approximation compared with the result of Theorem 1. For case (ii), has moment of any order. Hence the saddle-point approximation [3.12] in ref. 10 can be applied, and can thus be approximated by the three terms in that expansion. Fig. 3 displays the approximations of ratios of with for both cases. It can be seen that the approximation for case (ii) is larger than the one for case (i). By this figure, a practical recommendation of the threshold θ may be 0.85, which has been used for obtaining images displayed in Fig. 2 E and F.
Fig. 3.
Ratios with for both cases. “R1” denotes the ratio for case (i), whereas “R2” stands for the ratio for case (ii). The dotted line is 0.85.
Evaluating Marginal Likelihood.
It is of importance to calculate the marginal likelihood in the process of likelihood maximization. Let be the posterior density for prior , which implies that . Ref. 11 proposed the harmonic mean estimator for the marginal likelihood by letting in [1], where s are i.i.d. draws from the posterior distribution. Ref. 11 noted that can have infinite variance, in which case the central limit theorem is not applicable to the partial sums. Later, ref. 17 showed that in typical applications may lie in the domain of attraction of a one-sided α-stable law with index . If the sample information exceeds the prior information in an application, the limit law for a harmonic mean estimator is stable with index α close to 1, and the convergence is very slow at rate . In the following we demonstrate via one of their examples that if are properly right truncated, a good approximation can be constructed so that it converges to the expected harmonic mean of the right truncated , which converges to the marginal likelihood.
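The untruncated harmonic mean estimator of ref. 11 can be sketched as follows for a conjugate normal model, in which the exact marginal likelihood is available for comparison. The prior θ ~ N(0, τ²), the data size, the seed, and all function names are assumptions made for illustration and are not the settings of the example in this paper. The computation is carried out on the log scale to avoid overflow.

```python
import numpy as np

def log_likelihood(theta, y):
    """Log-likelihood of i.i.d. N(theta, 1) data y, vectorized over theta."""
    theta = np.atleast_1d(theta)
    resid = y[None, :] - theta[:, None]
    return -0.5 * y.size * np.log(2 * np.pi) - 0.5 * np.sum(resid ** 2, axis=1)

def log_marginal_exact(y, tau):
    """Exact log marginal likelihood under the assumed prior theta ~ N(0, tau^2)."""
    m, s = y.size, y.sum()
    quad = np.sum(y ** 2) - tau ** 2 * s ** 2 / (1 + m * tau ** 2)
    return -0.5 * m * np.log(2 * np.pi) - 0.5 * np.log(1 + m * tau ** 2) - 0.5 * quad

def log_harmonic_mean_estimate(y, tau, n_draws, rng):
    """Log of n / sum(1 / L(theta_i)) with theta_i drawn from the posterior."""
    m = y.size
    post_var = tau ** 2 / (1 + m * tau ** 2)
    post_mean = post_var * y.sum()
    theta = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    neg_ll = -log_likelihood(theta, y)
    # Stable log-sum-exp of the reciprocal likelihoods.
    shift = neg_ll.max()
    return np.log(n_draws) - (shift + np.log(np.sum(np.exp(neg_ll - shift))))

rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=5)   # illustrative data
tau = 1.0                          # illustrative prior standard deviation
log_hm = log_harmonic_mean_estimate(y, tau, 10_000, rng)
log_exact = log_marginal_exact(y, tau)
print(f"log HM estimate = {log_hm:.4f}, exact log marginal = {log_exact:.4f}")
```

Rerunning with different seeds or numbers of draws gives a sense of how slowly and erratically the untruncated estimator can settle, which is the behavior the truncation is meant to address.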
Suppose we want to evaluate the marginal likelihood based on independently normally distributed variables with mean θ and variance 1 for a sample of size with sample mean . Set the prior distribution . The exact marginal likelihood for is available analytically, . Our aim is to estimate the marginal likelihood , where . The harmonic mean estimate of the marginal likelihood is , where for independent and identical draws from the posterior distribution . Ref. 17 showed that the convergence rate of to the marginal likelihood is slow because of , and the harmonic mean estimator behaves badly (Fig. 4). As described above, in light of the truncation method used in refs. 18 and 19, we consider the right truncated variable , where is an indicator function and δ is a positive constant. Let
| [24] | 
By Theorem 2, it follows that
| [25] | 
Fig. 4.
Comparison of four approximations of the marginal likelihood with . (i) “M” denotes the sample mean of in [1] with 100,000 replications of n independent observations from the posterior distribution. (ii) “T” stands for the sample mean of in [24] ( is used) with 100,000 replications of n independent observations from the posterior distribution. (iii) “L” represents the sample mean of in [25]. (iv) “F” denotes the sample mean of in [25].
As displayed in Fig. 4, the convergence rate of is very slow as described in ref. 17. The main reason is that the value of α is close to 1. From Fig. 4, it can be seen that given in [24] has a faster convergence rate to the two-term approximation in [25]. It is noted that this two-term approximation converges to the marginal likelihood . Thus, may be used as its approximation.
Similar results are obtained for other values of δ, although the convergence rate increases at the cost of accuracy when δ is larger than 1.5 and decreases when δ is smaller than 1.5, e.g., or .
Appendix: Proof of Theorem 1
First we prove the case , which implies that , in [6], and the distribution of converges to the stable distribution with index satisfying [7] and [8] where and . Denote .
Integrating by parts, we have
By [7] and [10], . We now show that . By [9]
Because , by applying l’Hôpital’s rule,
then . For the other part, . So, .
Using Taylor expansion, we have
By applying l’Hôpital’s rule, we obtain
which implies that , and hence
Because , we have by [8] and [10]. In sum, we have
Acknowledgments
This work was partially supported by the Natural Sciences and Engineering Research Council of Canada.
Footnotes
The authors declare no conflict of interest.
References
- 1. Hamerly G, Elkan C. Alternatives to k-means algorithm that find better clusterings. Proceedings of the 11th International Conference on Information and Knowledge Management. ACM; New York: 2002. pp. 600–607.
- 2. Iman RL. Harmonic mean. In: Kotz S, Johnson NL, editors. Encyclopedia of Statistical Sciences. Vol 3. Wiley; New York: 1983. pp. 575–576.
- 3. Komarova NL, Rivin I. Harmonic mean, random polynomials and stochastic matrices. Adv Appl Math. 2003;31(2):501–526.
- 4. Zhang B, Hsu M, Dayal U. K-harmonic means - a data clustering algorithm. Technical Report HPL-1999-124. Hewlett-Packard Labs; 1999. Available at www.hpl.hp.com/techreports/1999/HPL-1999-124.html. Accessed October 10, 2014.
- 5. Pinto JE, Henry E, Robinson TR, Stowe JD. Equity Asset Valuation. 2nd Ed. John Wiley & Sons; Hoboken, NJ: 2010.
- 6. Lim MCH, McLernon DC, Ghogho M. Weighted harmonic mean SINR maximization for the MIMO downlink. IEEE International Conference on Acoustics, Speech, and Signal Processing. Institute of Electrical and Electronics Engineers; Taipei, Taiwan: 2009. pp. 2381–2384.
- 7. Jones CM. Approximating negative and harmonic mean moments for the Poisson distribution. Math Commun. 2003;8(2):157–172.
- 8. Pakes AG. On the convergence of moments of geometric and harmonic means. Stat Neerl. 1999;53(1):96–110.
- 9. Shi X, Reid N, Wu Y. Approximation to the moments of ratios of cumulative sums. Can J Stat. 2014;42(2):325–336.
- 10. Shi X, Wang X-S, Reid N. Saddlepoint approximation of nonlinear moments. Stat Sin. 2014.
- 11. Newton MA, Raftery AE. Approximate Bayesian inference with the weighted likelihood bootstrap. J R Stat Soc B. 1994;56(1):3–48.
- 12. Gnedenko BV, Kolmogorov AN. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley; Cambridge, MA: 1954.
- 13. Ibragimov IA, Linnik YV. Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff; Groningen, The Netherlands: 1971.
- 14. Hall P. On the rate of convergence to a stable law. J Lond Math Soc. 1981;23:179–192.
- 15. Berkes I, Tichy R. Lacunary series and stable distributions. 2014. Available at www.math.tugraz.at/discrete/publications/projects/files/berkes_tichy_lac.pdf. Accessed March 10, 2014.
- 16. Aue A, Lee TCM. On image segmentation using information theoretic criteria. Ann Stat. 2011;39(6):2912–2935.
- 17. Wolpert RL, Schmidler SC. α-Stable limit laws for harmonic mean estimators of marginal likelihoods. Stat Sin. 2012;22(3):1233–1251.
- 18. Shi X, Wu Y, Liu Y. A note on asymptotic approximations of inverse moments of nonnegative random variables. Stat Probab Lett. 2010;80(15-16):1260–1264.
- 19. Wu T-J, Shi X, Miao B. Asymptotic approximation of inverse moments of nonnegative random variables. Stat Probab Lett. 2009;79(11):1366–1371.




