Journal of Applied Statistics. 2023 Nov 7;51(11):2232–2257. doi: 10.1080/02664763.2023.2277125

Smoothing level selection for density estimators based on the moments

Rosa M García-Fernández 1, Federico Palacios-González 1
PMCID: PMC11328809  PMID: 39157270

Abstract

This paper introduces an approach to select the bandwidth or smoothing parameter in multiresolution (MR) density estimation and nonparametric density estimation. It is based on the evolution of the second, third and fourth central moments and the shape of the estimated densities for different bandwidths and resolution levels. The proposed method has been applied to density estimation by means of multiresolution densities as well as kernel density estimation (MRDE and KDE respectively). The results of the simulations and the empirical application show that, for multimodal densities, the level of resolution given by the moments method performs better than the Bayesian Information Criterion (BIC) for multiresolution density estimation and better than the plug-in method for kernel density estimation.

KEYWORDS: Multiresolution density estimation, kernel density estimation, bandwidth, moments and level of resolution

1. Introduction

This paper develops a novel and straightforward approach to select the bandwidth or smoothing parameter in the estimation of multiresolution models$^{1}$ and in nonparametric density estimation. It is based on the moments and the shape of the estimated densities for different resolution levels and bandwidths. The method is applied to density estimation by means of multiresolution densities [MRDE; see refs 12,13,14] as well as kernel functions [KDE; see, for instance, refs 18,5].

Our choice of the moments for selecting the smoothing parameter was guided by the changes observed in the shape of the estimated MRDE when the level of resolution, $j$, varies. If the level of resolution is too low, the density is smooth but the bias is large. Conversely, a large $j$ leads to a rougher density with a small bias. Similar results are observed$^{2}$ when a density is fitted by the kernel method. Suppose the bandwidth is $h=2^{-j}$ with $j\in\mathbb{Z}$. If $j$ is too small, then $h$ is too large and the result is a smooth but biased density. As $j$ increases, $h$ decreases and the bias tends to diminish, but beyond a certain value of $j$ (or of its corresponding $h$) an undesirable roughness appears. To solve the problem, we need to select a value of $j$, or its corresponding smoothing parameter $h$, such that the bias is reasonably small without incurring excessive roughness.

In both methodologies, MRDE and KDE, the bias of the estimated density shows clearly in its dispersion and shape, especially in the flattening of the fitted density or, equivalently, in an underestimation of the kurtosis that evolves toward more reasonable values as the resolution level $j$ increases. That is, the evolution of the bias is related to the central moments of order 2, 3 and 4, since they are used to compute dispersion, asymmetry and kurtosis. When the resolution level in the MRDE increases, or the smoothing parameter in the KDE decreases, the flattening and the dispersion tend to stabilize, indicating that the bias is small. Beyond a certain level of resolution the roughness begins to increase, which indicates the value of $j$ or $h$ that should be selected. This value leads to a sufficiently smooth density with small bias. This is clearly shown in the graphs of Sections 3, 4 and 5, which represent the evolution, as a function of $j$, of the expected value, the variance, and the skewness and kurtosis coefficients for MRDE and KDE.

The rest of the paper is organized as follows. Section 2 introduces the math expression to calculate the moments of a multiresolution density. Section 3 shows, by means of simulations, how to select the level of resolution using the moments of a MRDE. Section 4 extends the approach to the Kernel method. In both estimation methods, we use the Cubic Box Spline function defined in Section 2.1. For the MRDE method, this is the scaling function generating the multiresolution analysis structure that contains the MRDE and their estimates. In the KDE method this function is used as the kernel. Section 5 contains an application to real data and Section 6 concludes.

Finally, we want to point out that the MRDE is a technique devised for massive data. Therefore, everything that follows must be understood in a context of large sample sizes.

2. Moments calculation for a MRDE

This section shows the expressions of the central and non-central moments of a multiresolution density. The second, third and fourth central moments will be used to select the level of resolution of a MRDE. The math development is contained in the Appendices A1–A6.

2.1. Multiresolution densities

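Let $\theta(x)$ be a symmetric density with mean zero and compact support $[-2,2]$, known as the Cubic Box Spline. It is given by

$$\theta(x)=\begin{cases}0 & \text{if } x\le -2\\ p_1(2+x) & \text{if } -2\le x\le -1\\ p_2(2+x) & \text{if } -1\le x\le 0\\ p_2(2-x) & \text{if } 0\le x\le 1\\ p_1(2-x) & \text{if } 1\le x\le 2\\ 0 & \text{if } x\ge 2,\end{cases}$$

where:

$$p_1(x)=\frac{x^3}{6}\quad\text{and}\quad p_2(x)=-\frac{x^3}{2}+2x^2-2x+\frac{2}{3}. \tag{1}$$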

Applying dilations and translations to the density $\theta(x)$, the densities $\lambda_{j,k}(x)$ are built as follows [12]:

$$\lambda_{j,k}(x)=s\,\theta(sx-k), \tag{2}$$

where $s=2^{j}$. Note that $\lambda_{0,0}(x)\equiv\theta(x)$.
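As a minimal sketch of definitions (1) and (2), the cubic box spline and its dilated and translated copies can be coded directly. The function names `p1`, `p2`, `theta` and `lam` are illustrative choices and not part of the authors' supplemental macro; the symmetry of $\theta$ is used to evaluate only the two polynomial pieces.

```python
def p1(x):
    # p1(x) = x^3 / 6, from expression (1)
    return x ** 3 / 6.0


def p2(x):
    # p2(x) = -x^3/2 + 2x^2 - 2x + 2/3, from expression (1)
    return -x ** 3 / 2.0 + 2.0 * x ** 2 - 2.0 * x + 2.0 / 3.0


def theta(x):
    """Cubic box spline density with support [-2, 2]."""
    u = 2.0 - abs(x)  # theta(x) = theta(-x), so only 2 - |x| matters
    if u <= 0.0:
        return 0.0
    return p1(u) if u <= 1.0 else p2(u)


def lam(x, j, k):
    """lambda_{j,k}(x) = s * theta(s*x - k) with s = 2**j, expression (2)."""
    s = 2.0 ** j
    return s * theta(s * x - k)
```

With these definitions, `lam(x, 0, 0)` coincides with `theta(x)`, and a midpoint Riemann sum of `theta` over $[-2,2]$ returns a value very close to one.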

For each level of resolution $j$, the following MR densities are defined:

$$f(x)=\sum_{k\in\mathbb{Z}}a_k\lambda_{j,k}(x). \tag{3}$$

In expression (3), $a_k\ge 0\ \forall k\in\mathbb{Z}$ and $\sum_{k\in\mathbb{Z}}a_k=1$. By definition, all these functions belong to the $V_j$ space of the multiresolution analysis structure (MRA) defined by the scaling function $\theta(x)$ [7,23,11]. Any square-integrable density, that is, any density belonging to the Hilbert space $L^2(\mathbb{R})$, has an approximation in each $V_j$.

In the definition of a multiresolution structure, the dilations and translations of $\theta$ given by $\theta_{j,k}(x)=s^{1/2}\theta(sx-k)$ are used. It is obvious that $\lambda_{j,k}(x)=s^{1/2}\theta_{j,k}(x)$, and using this notation the MR density defined by (3) can be rewritten as follows:

$$f(x)=\sum_{k\in\mathbb{Z}}a_k\lambda_{j,k}(x)=s^{1/2}\sum_{k\in\mathbb{Z}}a_k\theta_{j,k}(x).$$

2.2. Moments of a random variable with MR density function

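The central moment of order $r$ for a multiresolution (MR) density is calculated as follows:

$$\mu_f^{(r)}=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}\mu_{0,0}^{(i)}\mu_a^{(r-i)},$$

where $\mu_a^{(r-i)}=\sum_{k\in\mathbb{Z}}a_k(k-\bar{k})^{r-i}$ and $\bar{k}=\sum_{k\in\mathbb{Z}}k\,a_k$. By definition, $\mu_{0,0}^{(r)}=\mu_\theta^{(r)}$ is the moment of order $r$ of the density $\lambda_{0,0}(x)\equiv\theta(x)$ (see Appendices A1 and A2). That is:

$$\mu_{0,0}^{(r)}=\mu_\theta^{(r)}=\int_{-2}^{2}(x-0)^{r}\theta(x)\,dx=\int_{-2}^{2}x^{r}\theta(x)\,dx\equiv m_\theta^{(r)}.$$

The calculation of $\mu_\theta^{(r)}$ is given in Proposition A2 of Appendix A1.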

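Given a MR density $f_j(x)$, the non-central moment of order $r$ is defined by:

$$m_j^{(r)}=\int_{-\infty}^{+\infty}x^{r}f_j(x)\,dx.$$

It can be obtained as follows (see Appendix A4):

$$m_j^{(r)}=\sum_{k\in\mathbb{Z}}a_k\,m_{\lambda_{j,k}}^{(r)},$$

where $m_{\lambda_{j,k}}^{(r)}=\int_{-\infty}^{+\infty}x^{r}\lambda_{j,k}(x)\,dx=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}m_\theta^{(i)}k^{r-i}$.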
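The central-moment formula of this section can be sketched in a few lines. The moments of $\theta$ used below ($1$, $0$, $1/3$, $0$, $3/10$ for orders 0 to 4) are an assumption of this sketch: they are the values that integrating $x^r\theta(x)$ over $[-2,2]$ gives, which is what Proposition A2 computes. The names `M_THETA` and `mr_central_moment`, and the representation of the coefficients as a dictionary, are illustrative.

```python
from math import comb

# Moments of the cubic box spline theta for orders 0..4 (assumed values,
# obtainable by integrating x**r * theta(x) over [-2, 2]).
M_THETA = {0: 1.0, 1: 0.0, 2: 1.0 / 3.0, 3: 0.0, 4: 0.3}


def mr_central_moment(a, j, r):
    """Central moment of order r (r <= 4) of f(x) = sum_k a_k lambda_{j,k}(x).

    a : dict mapping the integer position k to its weight a_k (weights sum to 1)
    j : resolution level, so that s = 2**j
    """
    s = 2.0 ** j
    k_bar = sum(k * w for k, w in a.items())

    def mu_a(q):
        # Central moment of order q of the discrete distribution {k: a_k}
        return sum(w * (k - k_bar) ** q for k, w in a.items())

    total = sum(comb(r, i) * M_THETA[i] * mu_a(r - i) for i in range(r + 1))
    return total / s ** r
```

For instance, with a single coefficient `a = {0: 1.0}` and `j = 0` the function returns the moments of $\theta$ itself, since the MR density reduces to $\lambda_{0,0}\equiv\theta$.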

2.3. Asymptotic properties of the moments of an estimated MR density

Proposition 2.1:

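Given a sample $\{x_i\}_{i=1,2,\ldots,n}$ and an estimated MR density for that sample:

$$\hat{f}_j(x)=\sum_{k\in\mathbb{Z}}\hat{a}_k\lambda_{j,k}(x),$$

for a value of $j$ large enough it is verified that:

$$\hat{f}_j(x)=\frac{1}{n}\sum_{i=1}^{n}\lambda_{j,k(x_i)}(x),$$

where

$$k(x_i)=\begin{cases}\max_{t\in\mathbb{Z}}\{s x_i\ge t\} & \text{if } s x_i-\max_{t\in\mathbb{Z}}\{s x_i\ge t\}\le 0.5\\ \max_{t\in\mathbb{Z}}\{s x_i\ge t\}+1 & \text{if } s x_i-\max_{t\in\mathbb{Z}}\{s x_i\ge t\}>0.5.\end{cases}$$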

Proof :

See Appendix A6.

Proposition 2.2:

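Let

$$\hat{m}_r(j)=\int_{-\infty}^{+\infty}x^{r}\hat{f}_j(x)\,dx$$

be the non-central moment of the estimated MR density from the sample $\{x_i\}_{i=1,2,\ldots,n}$.

If $j$ tends to infinity, then $\hat{m}_r(j)$ converges to its sample counterpart. That is:

$$\lim_{j\to\infty}\hat{m}_r(j)=\frac{1}{n}\sum_{i=1}^{n}x_i^{r}.$$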

Proof :

See Appendix A6

Since each central moment of order $r$ is a continuous (polynomial) function of the non-central moments of order less than or equal to $r$, we can state that, as $j$ tends to infinity, the central moments of a MR density estimated from a sample of size $n$ also converge to the central moments of the sample.

3. Moments method for selecting the resolution level to estimate a MR density

In this section we introduce an alternative to the Bayesian Information Criterion [17] for selecting the level of resolution in the estimation of a MR density. It is based on the central moments of orders two, three and four and on the skewness and kurtosis coefficients. When a MR density is estimated, if the resolution level is too low, the estimate is a smooth curve but it has excessive bias. Conversely, if the resolution level is too high, the bias of the estimate is small but the roughness is large. In practice, the bias shows mainly as excessive dispersion and a flat density. Since the flattening can be measured by the Fisher coefficient, the evolution of the bias as the resolution level increases should be reflected in the gradual decrease of the central moments of order 2 and order 4. Based on these moments, we will choose a resolution level for which the bias is acceptable and the roughness of the estimator is not excessive.

Since we are going to establish comparisons with the BIC criterion, let us introduce it briefly in the context of the MR densities. Any estimation using a MR density, for a finite sample of size $n$ and any resolution level $j$, can be considered a finite mixture of densities $\lambda_{j,k}(x)$ (see Section 2.1) with the form:

$$\hat{f}_j(x)=\sum_{i=1}^{p}\hat{a}_{k_i}\lambda_{j,k_i}(x), \tag{4}$$

where $\hat{a}_{k_i}$ is the proportion of data within the interval $\left(\frac{k_i-0.5}{2^{j}},\frac{k_i+0.5}{2^{j}}\right]$.

Note that expression (4), the estimator of (3), has a finite number of addends while (3) has infinitely many. This is explained as follows. Depending on the level of resolution, two extreme situations can arise. Firstly, $j$ can be so small that the entire sample lies within a single interval $\left(\frac{k-0.5}{2^{j}},\frac{k+0.5}{2^{j}}\right]$. In this case, there is only one coefficient $\hat{a}_k$ different from zero and mixture (4) degenerates into a single addend. Secondly, $j$ can be so large that each observed value lies in a different interval, so that there are as many $\hat{a}_{k_i}$ different from zero as different values are observed in the sample. That is, if $n$ is the sample size and $p$ is the number of different values observed in the sample, then the number of addends in the mixture (4) is $p$ and it is verified that $1\le p\le n$. Obviously, $p=n$ if all the sample values are different and each of them belongs to a single interval of the form$^{3}$ $\left(\frac{k_i-0.5}{2^{j}},\frac{k_i+0.5}{2^{j}}\right]$, $k_i\in\mathbb{Z}$.
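The coefficients of mixture (4) can be obtained by a frequency count. The sketch below assigns each observation to the integer $k$ whose interval $\left(\frac{k-0.5}{2^{j}},\frac{k+0.5}{2^{j}}\right]$ contains it, which amounts to $k=\lceil 2^{j}x-0.5\rceil$. The function names are illustrative and the code is only an assumption about how the count could be organized, not the authors' macro.

```python
import math
from collections import Counter


def nearest_index(x, j):
    # Integer k with k - 0.5 < 2**j * x <= k + 0.5
    return math.ceil((2.0 ** j) * x - 0.5)


def mr_coefficients(sample, j):
    """Return {k: a_hat_k}: the proportion of the sample assigned to each k."""
    n = len(sample)
    counts = Counter(nearest_index(x, j) for x in sample)
    return {k: c / n for k, c in sorted(counts.items())}
```

The number of keys in the returned dictionary is the number of addends $p$ in (4): it is one when $j$ is very small and it grows toward the number of distinct sample values as $j$ increases.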

Since mixture (4) contains $p$ position parameters $k_i$, $i=1,2,\ldots,p$, and $p$ mixture parameters $\hat{a}_{k_i}$, $i=1,2,\ldots,p$, there are $m=2p$ parameters. We can optimize the number of parameters, which depends on $j$, by using the BIC criterion [17]. That is, we will consider that the best value of $j$ is the one that minimizes the expression:

$$\mathrm{BIC}(j)=-2\,\mathrm{Log}(\hat{L}_j)+m\,\mathrm{Log}(n),$$

where

$$\hat{L}_j=\prod_{i=1}^{n}\hat{f}_j(x_i)$$

is the sample likelihood of $\hat{f}_j(x)$.
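A hedged sketch of how $\mathrm{BIC}(j)$ could be evaluated for a given sample and resolution level follows. It assumes natural logarithms, the parameter count $m=2p$ stated above, and the nearest-integer assignment of observations to positions $k$; the function names are hypothetical.

```python
import math
from collections import Counter


def theta(x):
    # Cubic box spline density on [-2, 2] (Section 2.1)
    u = 2.0 - abs(x)
    if u <= 0.0:
        return 0.0
    if u <= 1.0:
        return u ** 3 / 6.0
    return -u ** 3 / 2.0 + 2.0 * u ** 2 - 2.0 * u + 2.0 / 3.0


def bic(sample, j):
    """BIC(j) = -2 log(L_j) + m log(n) for the MR estimate at level j."""
    n = len(sample)
    s = 2.0 ** j
    # Frequency count of the integer positions k(x_i)
    counts = Counter(math.ceil(s * x - 0.5) for x in sample)
    coeffs = {k: c / n for k, c in counts.items()}

    def f_hat(x):
        return sum(a * s * theta(s * x - k) for k, a in coeffs.items())

    log_lik = sum(math.log(f_hat(x)) for x in sample)
    m = 2 * len(coeffs)  # p positions plus p mixture weights
    return -2.0 * log_lik + m * math.log(n)
```

Evaluating `bic(sample, j)` over a range of integer values of `j` and keeping the minimizer reproduces the selection rule described in the text, under the assumptions listed above.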

The proposed method based on the moments, is simpler and requires less process time than the BIC. However, both criteria complement and reinforce each other, as will be shown in the following simulations.

To proceed, we have simulated a sample of size 10,000 by using two generator models: a normal distribution and a mixture of double exponential distributions. We have fitted these generator models using MR densities for different levels of resolution in each case. Finally, we have calculated $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1=\frac{\mu_j^{(3)}}{\left(\mu_j^{(2)}\right)^{3/2}}$ (Fisher coefficient of skewness or asymmetry) and $\gamma_2=\frac{\mu_j^{(4)}}{\left(\mu_j^{(2)}\right)^{2}}-3$ (Fisher coefficient of kurtosis) to select an appropriate $j$. In the supplemental material, we provide more details about the calculations and a macro to apply the developed methodology to the data-generating models used in this paper.
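The two Fisher coefficients are simple functions of the central moments, and their sample counterparts are what the fitted values are divided by in the tables that follow. A minimal sketch, with illustrative function names:

```python
def central_moment(sample, r):
    """Sample central moment of order r (divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** r for x in sample) / n


def fisher_coefficients(mu2, mu3, mu4):
    """Return (gamma1, gamma2) from the central moments of order 2, 3 and 4."""
    gamma1 = mu3 / mu2 ** 1.5      # skewness (asymmetry)
    gamma2 = mu4 / mu2 ** 2 - 3.0  # kurtosis in excess of the normal value
    return gamma1, gamma2
```

The same `fisher_coefficients` function applies whether the three central moments come from the sample or from a fitted density at level $j$; the ratio of the two results is the quantity tracked across $j$.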

3.1. Normal distribution

Table 1 and Figure 1 show the values of $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1$ and $\gamma_2$ for a MR estimation of a $N(10,5)$ distribution. Each value is divided by its empirical or sample counterpart, which is computed from the sample data without fitting any density. These indicators stabilize for $j=0$ (see Figure 1). The BIC criterion provides the value $j=-1$. Figure 2 displays the fitted densities ($\hat{f}(x)$) for $j=-1$ and $j=0$, and the data generator model ($N(10,5)$; $f(x)$).

Table 1.

Ratios between MRDE moments and sample moments.

j E[X] Variance Asymmetry Kurtosis
−3 1.0014 2.0494 0.5553 1.987432
−2 1.0058 1.2919 −0.6275 −3.905073
−1 1.0011 1.0717 0.0905 −2.285609
0 1.0000 1.0160 1.0618 1.212460
1 0.9998 1.0033 1.0149 0.887280
2 1.0000 1.0010 1.0179 1.072510
3 1.0000 1.0006 0.9854 1.023095
4 1.0000 1.0001 0.9973 0.985362

Figure 1. Ratios between MRDE moments and sample moments.

Figure 2. Estimated densities $\hat{f}(x)$ for $j=-1$, $j=0$ and data generator model $f(x)$.

The estimate for $j=-1$ is smooth but the bias is noticeable. For $j=0$ the bias almost disappears, at the cost of a small roughness that may be acceptable. According to the BIC criterion, the optimum level of resolution is $j=-1$.
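The behaviour of the variance column can be imitated with a short simulation. For order two, the central-moment formula of Section 2.2 reduces to $\mu_f^{(2)}=\left(\mu_a^{(2)}+\tfrac{1}{3}\right)/s^{2}$, where $1/3$ is the variance of $\theta$. The sketch below assumes that the second parameter of $N(10,5)$ is a standard deviation, uses an arbitrary seed, and does not reproduce the authors' sample, so the figures it returns will differ slightly from those of Table 1.

```python
import math
import random
from collections import Counter


def variance_ratio(sample, j):
    """Variance of the MR estimate at level j divided by the sample variance."""
    n = len(sample)
    s = 2.0 ** j
    mean = sum(sample) / n
    sample_var = sum((x - mean) ** 2 for x in sample) / n
    # Frequency count of the integer positions k(x_i)
    counts = Counter(math.ceil(s * x - 0.5) for x in sample)
    k_bar = sum(k * c for k, c in counts.items()) / n
    mu_a2 = sum(c * (k - k_bar) ** 2 for k, c in counts.items()) / n
    mr_var = (mu_a2 + 1.0 / 3.0) / s ** 2  # 1/3 is the variance of theta
    return mr_var / sample_var


random.seed(0)  # arbitrary seed, only for reproducibility of the sketch
sample = [random.gauss(10.0, 5.0) for _ in range(10000)]
ratios = {j: variance_ratio(sample, j) for j in range(-3, 5)}
for j, ratio in ratios.items():
    print(j, round(ratio, 4))
```

The printed ratios start well above one for the lowest levels and approach one as $j$ grows, which is the stabilization pattern used to choose the resolution level.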

3.2. Mixture of double exponential distribution

The density function of a double exponential distribution with parameters μ and θ is given by:

$$f(x)=e^{-\frac{|x-\mu|}{\theta}}\quad\forall x\in\mathbb{R}.$$

In this illustration, the generator model will be a mixture of three densities of this type whose parameters are in Table 2.

Table 2.

Parameters of the double exponential distribution.

μ θ π
20 5 0.3
30 6 0.5
40 7 0.2

Table 3 and Figure 3 display the values of $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1$ and $\gamma_2$, divided by their empirical counterparts, using levels of resolution from $j=-3$ to $j=4$. As can be seen, stability is reached when either $j=-1$ or $j=0$. Figure 4 shows the estimates $\hat{f}(x)$ for $j=-2$, $j=-1$ and $j=0$, and the data generator model (mixture of double exponentials, $f(x)$).

Table 3.

Ratios between MRDE moments and sample moments.

j E[x] Variance Asymmetry Kurtosis
−3 1.00020842 1.23072002 0.64137168 0.56500524
−2 1.0000844 1.05642867 0.92675066 0.90709002
−1 1.00058047 1.0155786 0.97057506 0.97393645
0 0.99986737 1.00355414 0.99758049 0.9933315
1 0.99996728 1.000518 0.99559745 0.99654088
2 0.99999484 1.00019494 1.00076368 1.00119058
3 1.00000043 1.00001854 0.99983838 1.00028379
4 1.00000388 1.00003893 0.99975579 0.99954511

Figure 3. Ratios between MRDE moments and sample moments.

Figure 4. Estimated densities $\hat{f}(x)$ for $j=-2$, $j=-1$, $j=0$ and data generator model $f(x)$.

The mixture has three modes that are difficult for the estimates to capture.$^{4}$ This leads us to use higher resolution levels and rougher estimates to avoid the bias that such difficulty produces. Based on Figure 4, we would opt for $j=-1$, discarding $j=-2$ because of excessive bias and $j=0$ because of excessive roughness. In this example, the BIC criterion encounters certain difficulties. Because it is based on the principle of parsimony, it tends to give up the peaks and selects a smoother estimate. The level of resolution according to the BIC is $j=-2$.

4. Moments and selection of the bandwidth of a kernel density

Kernel density estimation (KDE) has become a common and useful tool for empirical studies. The discussion on the selection of the bandwidth has given rise to numerous publications on the subject. Nonetheless, part of the scientific community working in nonparametric statistics has accepted that there may not be a perfect procedure for selecting the optimal bandwidth. We will not give an overview of kernel estimation techniques, since our main aim is to extend the use of the moments to the choice of the bandwidth parameter. We refer readers to [15,2,9,6,24,4,10,21] for a review of bandwidth selection techniques.

4.1. Moments of a kernel density

The kernel estimator of a density for a sample $\{x_i\}_{i=1,2,\ldots,n}$ is given by:

$$\hat{f}_h(x)=\frac{1}{nh}\sum_{i=1}^{n}\theta\!\left(\frac{x-x_i}{h}\right),$$

where $h$ is the bandwidth or smoothing parameter and $\theta$ is the cubic box spline that we are going to use as the kernel function.

For this density, it is verified (see Appendix A5):

$$E_{\hat{f}_h}[X]=\frac{1}{n}\sum_{i=1}^{n}x_i,$$

and

$$\mu_{\hat{f}_h}^{(r)}=\sum_{t=0}^{r}\binom{r}{t}h^{t}m_\theta^{(t)}\mu^{(r-t)},$$

where $m_\theta^{(t)}$ is the non-central moment of order $t$ of the density $\theta$ (see Section 2 and Appendix A5) and $\mu^{(r-t)}$ is the central moment of order $r-t$ for the sample. That is:

$$\mu^{(r-t)}=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{r-t}.$$
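The moment formula for the kernel estimate only needs the sample central moments and the moments of $\theta$, so no density has to be evaluated. In the sketch below the values of $m_\theta^{(t)}$ for $t=0,\ldots,4$ are assumed to be $1$, $0$, $1/3$, $0$ and $3/10$ (what integrating $x^{t}\theta(x)$ over $[-2,2]$ gives), and the function names are illustrative.

```python
from math import comb

# Moments of the cubic box spline for orders 0..4 (assumed values)
M_THETA = {0: 1.0, 1: 0.0, 2: 1.0 / 3.0, 3: 0.0, 4: 0.3}


def sample_central_moment(sample, q):
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** q for x in sample) / n


def kde_central_moment(sample, h, r):
    """Central moment of order r (r <= 4) of the KDE with kernel theta and bandwidth h."""
    return sum(
        comb(r, t) * h ** t * M_THETA[t] * sample_central_moment(sample, r - t)
        for t in range(r + 1)
    )
```

For $r=2$ the formula collapses to the sample variance plus $h^{2}/3$, so the variance ratio of the kernel estimate is $1+h^{2}/(3\mu^{(2)})$ and shrinks toward one by a factor of four each time $j$ increases by one with $h=1/2^{j}$.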

4.2. Selection of the bandwidth of a kernel density based on the moments

The kernel estimator of a density for a sample $\{x_i\}_{i=1,2,\ldots,n}$ [5] is:

$$\hat{f}_h(x)=\frac{1}{nh}\sum_{i=1}^{n}K\!\left(\frac{x-x_i}{h}\right), \tag{5}$$

where $K$ is the kernel and $h$ is the so-called smoothing parameter.

Let us assume that $h=\frac{1}{2^{j}}$, where $j$ is an integer. The kernel function that we are going to use is $\theta$, that is, the cubic box spline introduced in Section 2.1. It is evident that (5) can be written as follows:

$$\hat{f}_h(x)=\frac{2^{j}}{n}\sum_{i=1}^{n}\theta(2^{j}x-2^{j}x_i)=\frac{1}{n}\sum_{i=1}^{n}2^{j}\theta(2^{j}x-2^{j}x_i). \tag{6}$$

Using the multiresolution analysis notation, expression (6) can be rewritten as:

$$\hat{f}_h(x)=\frac{1}{n}\sum_{i=1}^{n}\lambda_{j,2^{j}x_i}(x). \tag{7}$$

The estimator of a MRDE at the resolution level $j$ defined in (4) is:

$$\hat{f}_j(x)=\sum_{l=1}^{p}\hat{a}_{k_l}\lambda_{j,k_l}(x), \tag{8}$$

where $\hat{a}_{k_l}=\frac{n_l}{n}$ and $n_l$ is the number of sample values $x_i$ such that $k_l$ is the closest integer to $2^{j}x_i$. This allows us to write (8) as:

$$\hat{f}_j(x)=\frac{1}{n}\sum_{l=1}^{p}n_l\lambda_{j,k_l}(x). \tag{9}$$

Assuming that $k(x_i)$ is the closest integer to $2^{j}x_i$, it is easy to see that:

$$\hat{f}_j(x)=\frac{1}{n}\sum_{i=1}^{n}\lambda_{j,k(x_i)}(x). \tag{10}$$

Note that expressions (9) and (10) are equal. We can obtain (9) from (10) by a frequency count on the values $k(x_i)$, $i=1,2,\ldots,n$. In (9), $p$ represents the number of distinct values $k(x_i)$ found and $n_l$, $l=1,2,\ldots,p$, is the number of repetitions observed for each of them. Therefore, we can write:

$$\hat{f}_j(x)=\frac{1}{n}\sum_{i=1}^{n}2^{j}\theta(2^{j}x-k(x_i)),$$
$$\hat{f}_h(x)=\frac{1}{n}\sum_{i=1}^{n}2^{j}\theta(2^{j}x-2^{j}x_i).$$

Taking into account that, by definition, $|k(x_i)-2^{j}x_i|\le 0.5$, for a sufficiently high $j$ both expressions must give very close results.

To illustrate the selection of h based on the moments, we have simulated a sample of size 10,000 from a N (10, 5). Figure 5 shows the MRDE and the KDE for j=0.

Figure 5. MRDE and KDE for $j=0$.

As can be seen in Figure 5, both estimates are very similar and show a similar degree of roughness. This suggests that the moments method is suitable for selecting an appropriate $h$ when $h=1/2^{j}$ with $j$ an integer. Table 4 and Figure 6 show $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1$ and $\gamma_2$ divided by their empirical counterparts for $j=-3,-2,\ldots,5$. The moments have been calculated with the expression developed in this section, using the fitted densities for the values of $h$ that correspond to the above $j$. The values of $j$ are on the abscissa axis.

Table 4.

Ratios between KDE moments and sample moments.

j E[x] Variance Asymmetry Kurtosis
−3 1 1.83770525 0.40140818 4.41595072
−2 1 1.20942631 0.75184944 1.27816406
−1 1 1.05235658 0.92630844 0.95204746
0 1 1.01308914 0.98068268 0.97763648
1 1 1.00327229 0.99511157 0.99369833
2 1 1.00081807 0.99877415 0.99837911
3 1 1.00020452 0.9996933 0.99959192
4 1 1.00005113 0.99992331 0.9998978
5 1 1.00001278 0.99998083 0.99997444

Figure 6. Ratios between KDE moments and sample moments.

Note that the moments stabilize when $j=0$, which is equivalent to $h=1$. Figure 7 displays the MR and the kernel estimates for $j=-1$ and $j=0$.

Figure 7. Estimated densities $\hat{f}(x)$ for $j=-1$, $j=0$ and data generator model $f(x)$. Note that $j$ is conveniently the same for both estimates and it can be obtained from the MR or kernel estimated moments. KDE and MRDE estimates are similar since both are good approximations of the same unknown density.

In the following simulation, the generator model is a mixture of three double exponential distributions whose parameters are in Table 2. Table 5 and Figure 8 display the values of $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1$ and $\gamma_2$, divided by their empirical counterparts, using levels of resolution from $j=-3$ to $j=4$. As can be seen, stability is reached when either $j=-1$ or $j=0$. Figure 9 shows the estimates $\hat{f}(x)$ for $j=-1$ and $j=0$, and the data generator model (mixture of double exponential distributions, $f(x)$).

Table 5.

Ratios between KDE moments and sample moments.

j E[x] Variance Asymmetry Kurtosis
−3 1.00020842 1.23072002 0.64137168 0.56500524
−2 1.0000844 1.05642867 0.92675066 0.90709002
−1 1.00058047 1.0155786 0.97057506 0.97393645
0 0.99986737 1.00355414 0.99758049 0.9933315
1 0.99996728 1.000518 0.99559745 0.99654088
2 0.99999484 1.00019494 1.00076368 1.00119058
3 1.00000043 1.00001854 0.99983838 1.00028379
4 1.00000388 1.00003893 0.99975579 0.99954511

Figure 8. Ratios between KDE moments and sample moments.

Figure 9. Estimated densities $\hat{f}(x)$ for $j=-1$, $j=0$ and data generator model $f(x)$.

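An alternative way to compare KDE and MRDE is to set $2^{j}=1/h$ in (10). That is:

$$\hat{f}_j(x)=\frac{1}{n}\sum_{i=1}^{n}\lambda_{j,k(x_i)}(x)=\frac{1}{n}\sum_{i=1}^{n}2^{j}\theta(2^{j}x-k(x_i))=\frac{1}{nh}\sum_{i=1}^{n}\theta\!\left(\frac{x}{h}-k(x_i)\right)=\frac{1}{nh}\sum_{i=1}^{n}\theta\!\left(\frac{x-h\,k(x_i)}{h}\right). \tag{11}$$

Observe that (5) and (11) are quite similar. Taking into account that by definition:

$$k(x_i)-0.5<2^{j}x_i\le k(x_i)+0.5,$$

and multiplying the three terms of the above inequality by $h=2^{-j}$, we have:

$$h\,k(x_i)-0.5\,h<x_i\le h\,k(x_i)+0.5\,h.$$

The latter expression shows that $h\,k(x_i)$ approximates $x_i$ increasingly well as $h$ decreases, since the width of the previous interval is $h$:

$$\lim_{h\to 0}h\,k(x_i)=\lim_{j\to\infty}\frac{k(x_i)}{2^{j}}=x_i.$$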
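The limit above can be checked numerically for any single observation. The sketch below computes the point $h\,k(x_i)$ that replaces $x_i$ in the multiresolution estimate and the gap between the two for $j=0,1,\ldots,10$; the value of `x` and the function name are arbitrary illustrative choices.

```python
import math


def snapped_point(x, j):
    """Return h * k(x), with h = 2**(-j) and k(x) the nearest integer to x / h."""
    h = 2.0 ** (-j)
    k = math.ceil(x / h - 0.5)  # k - 0.5 < x / h <= k + 0.5
    return h * k


x = 3.14159  # arbitrary observation, for illustration only
gaps = [abs(x - snapped_point(x, j)) for j in range(0, 11)]
for j, gap in enumerate(gaps):
    print(j, gap, 0.5 * 2.0 ** (-j))
```

Each gap stays below the bound $0.5\,h$ printed next to it, so the two estimators are built on points that differ by at most half a bandwidth.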

The MRDE and the KDE were compared in ref. [12] in terms of the time needed to evaluate their density functions. Nonetheless, the previous simulations reveal some facts that are worth highlighting. The MR density is not a particular case of kernel density. On the one hand, when the multiresolution densities are estimated according to (4), the results are similar to a modified kernel in which each sample value $x_i$ is substituted in (5) by $h\,k(x_i)$ to obtain (6), with $h=2^{-j}$. On the other hand, we cannot state that a kernel estimator is a multiresolution kernel estimator. We could make the kernel and the scaling function identical. We can also equate both dilation factors by making $h=2^{-j}$. But the kernel estimate for $h=2^{-j}$ will be a density of the $V_j$ space of the multiresolution analysis structure only if the sample is of the form $x_i=k_i/2^{j}$, $i=1,2,\ldots,n$, with $k_i\in\mathbb{Z}\ \forall i=1,2,\ldots,n$, and where $j$ is a fixed integer determined by the sample. Any other estimate with a different $h$ will no longer be a function of the multiresolution structure.

Despite the above comment, we have to point out that there is a well-developed theory about the generalized kernel estimators, developed from the wavelets and the multiresolution analysis structures (see for instance [22]). Broadly speaking, this methodology requires the mother wavelet or scaling function of the multiresolution analysis structure to generate orthogonal bases for the Vj spaces of the MRA. This is not the case of the cubic box spline since it generates non orthogonal Riesz Bases. Going deeper into this aspect is an interesting question, but it is out of the scope of this work.

5. Real data application

In this section, we apply the proposed method to the gross income of Spanish households. The sample data comes from the Spanish Survey of Household Finances (EFF) for the year 2014, which was conducted by the Bank of Spain [1]. The EFF provides information on assets, debt, income and spending. The sample size is 6120 households. The household income is calculated as the sum of labor and non-labor incomes for all household members in 2013. It is expressed in hundred thousand euros.

Table 6 and Figure 10 show the evolution, according to $j$, of $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1$ and $\gamma_2$, divided by their empirical counterparts, for levels of resolution from $j=-13$ to $j=-6$.

Table 6.

Ratios between MRDE moments and sample moments.

j E[X] Variance Asymmetry Kurtosis
−13 0.99717924 1.01153181 0.98738403 0.98380212
−12 0.9995666 1.00258459 0.99586171 0.99430889
−11 0.99999127 1.00037759 0.9990044 0.99853096
−10 1.00016918 0.99997439 0.99958258 0.99926656
−9 0.99990519 1.00016536 0.99988219 0.99980877
−8 0.99989802 0.99998004 1.0000367 1.00010499
−7 0.99995971 0.99999305 1.00003152 1.0000789
−6 0.99992814 1.00000293 0.9999922 0.99998317

Figure 10. Ratios between MRDE moments and sample moments.

According to the BIC criterion the optimum is $j=-13$. However, the moments stabilize for $j=-9$ or $j=-8$ (Figure 10). Let us focus on this difference by comparing the density estimates for the two resolution levels plotted in Figure 11.

Figure 11. Estimated multiresolution density for $j=-8$ and $j=-13$.

The density has a peak between 8000 and 9000 euros that cannot be captured accurately by using $j=-13$. So, a higher level of resolution, $j=-8$ or $j=-9$, is needed. The bias for $j=-13$ is evident when the two densities are compared. The roughness for $j=-8$ is clearly appreciable. The BIC favors the smoothness of the curve, which leads to a very skewed estimated density around the mode. A similar fact has been shown in Section 3.2. We have observed empirically that roughness has only a slight effect on the cumulative distribution function. Nonetheless, the bias has a remarkable impact on the measurement of concentration, producing an underestimation of the Gini index and the Lorenz curve. This is an important issue if we study distributional aspects such as concentration or inequality through the fitted density. At this point, it should be noted that the kernel method is frequently applied to study income distribution (see for instance [8,16,3,20]). Figures 12 and 13 plot the cumulative distribution functions and the Lorenz curves, respectively. The cumulative distribution functions are similar except in the income interval [0, 10,000]. This difference leads to an underestimation of the Gini index$^{5}$: for $j=-13$ the index equals 0.4338 and for $j=-8$ it equals 0.5131. It also affects the Lorenz curve (see Figure 13), which is underestimated for $j=-13$. Therefore, the level of resolution $j=-8$ obtained by the method of moments is preferable to the value $j=-13$ selected by the BIC in the situations set out above.

Figure 12. Cumulative distribution functions.

Figure 13. Lorenz curves.

Next, we repeat the estimation for kernel densities and compare the results. Table 7 and Figure 14 show the evolution, according to $j$, of $E_j[X]$, $\mu_j^{(2)}$, $\gamma_1$ and $\gamma_2$, divided by their sample counterparts, for levels of resolution from $j=-13$ to $j=-5$.

Table 7.

Ratios between KDE moments and sample moments.

j h=1/2^j E[X] Variance Asymmetry Kurtosis
−13 8192 1 1.00759984 0.98870759 0.98497182
−12 4096 1 1.00189996 0.99715681 0.99621088
−11 2048 1 1.00047499 0.99928794 0.9990507
−10 1024 1 1.00011875 0.99982191 0.99976255
−9 512 1 1.00002969 0.99995547 0.99994063
−8 256 1 1.00000742 0.99998887 0.99998516
−7 128 1 1.00000186 0.99999722 0.99999629
−6 64 1 1.00000046 0.9999993 0.99999907
−5 32 1 1.00000012 0.99999983 0.99999977

Figure 14. Ratios between KDE moments and sample moments.

The appropriate level of resolution according to the moments is $j=-10$ or $j=-9$ (Figure 14).

The results are similar, which reinforces the idea of applying to the KDE the analysis performed on the MRDE. The level of resolution selected for the MRDE was $j=-9$. For the KDE we have opted for $h=\frac{1}{2^{-9}}=2^{9}=512$. The plug-in method to select the optimum $h$ [19] provides the result $h=459.277$. This value corresponds to $j=-\log_2 459.277=-8.8432$, which rounded to the nearest integer gives $j=-9$. If we use the MRDE for $j=-13$ as the plugged density, the plug-in method provides the result $h=3140.324$ (Figure 15). That is, $j=-\log_2 3140.324=-11.6167$, whose nearest integer is $j=-12$. This is less conservative than the BIC but still conservative. In any case, the resulting $j$ is the same as, or very close to, that used for the plugged MRDE.
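Since $h=1/2^{j}$, a bandwidth returned by any selector can be mapped back to a resolution level through $j=-\log_2 h$ and then rounded. A small sketch, using the two plug-in bandwidths quoted in the text (the function name is illustrative):

```python
import math


def resolution_from_bandwidth(h):
    """Return (-log2(h), nearest integer j) for a bandwidth h = 1 / 2**j."""
    j_real = -math.log2(h)
    return j_real, round(j_real)


for h in (512.0, 459.277, 3140.324):
    print(h, resolution_from_bandwidth(h))
```

The first value maps exactly to $j=-9$, while the two plug-in bandwidths give non-integer levels whose nearest integers are the ones discussed above.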

Figure 15. Estimated kernel density for $j=-9$ and $j=-13$.

Generalizing, if we use an estimated MR density for a given $j$ as the plugged density, the plug-in method provides a value of $h$ such that the nearest integer to $-\log_2 h$ is the value of $j$ used to estimate the MRDE. It is faster and easier to use the method of moments for the KDE and determine the $h=\frac{1}{2^{j}}$ that we will use in the estimation. If we want more conservative results regarding the smoothness of the fit, we can reduce the value of $j$ by one or two units, paying special attention to the increasing bias.

6. Conclusions

This paper introduces an approach to select the bandwidth or smoothing parameter in semiparametric and nonparametric density estimation. It is based on the evolution of the expected value, the variance, and the skewness and kurtosis coefficients of the estimated densities for different bandwidths. Using these values, divided by their empirical counterparts, we select a resolution level for which the bias is acceptable and the roughness of the estimator is not excessive.

This method has been applied to the density estimation by means of multiresolution densities as well as Kernel density estimation. In this way, we have expanded the available criteria to smoothing parameter selection.

The results of the simulations and the empirical application indicate that the level of resolution resulting from the moments method is more flexible to fit a multimodal distribution than those resulting from the BIC for MRDE and the plug-in for KDE. The BIC chooses the smoothness of the curve which leads to a skewed estimated density around the modes. The method of the moments attributes more importance to the use of higher resolution levels and hence rougher estimates to avoid the bias that the fitting produces. This procedure is recommended to analyze some distributional aspects such as the concentration of income. As it has been shown in the empirical application, the bias can produce an underestimation of the concentration of the distribution.

Supplementary Material

Supplemental Material

Appendices.

A1. Central and non-central moments of the cubic box spline density

The density function θ has an expected value equal to zero. Hence the central and non-central moments are equal. It is also symmetric and consequently, the odd order moments are null.

Proposition A1:

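Given $p_1(x)$ and $p_2(x)$ defined by (1), it is proved by polynomial integration that:

$$Q_1(x,r)=\int x^{r}p_1(2+x)\,dx=\frac{1}{6}\left[\frac{x^{r+4}}{r+4}+\frac{6x^{r+3}}{r+3}+\frac{12x^{r+2}}{r+2}+\frac{8x^{r+1}}{r+1}\right].$$

Analogously:

$$Q(x,r)=\int x^{r}p_2(x)\,dx=-\frac{x^{r+4}}{2(r+4)}+\frac{2x^{r+3}}{r+3}-\frac{2x^{r+2}}{r+2}+\frac{2x^{r+1}}{3(r+1)},$$

and

$$Q_2(x,r)=\int x^{r}p_2(2+x)\,dx=\sum_{i=0}^{r}\binom{r}{i}(-2)^{r-i}Q(2+x,i).$$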

Proposition A2:

The non-central and central moments of order $r$ for $\theta$ are$^{6}$ zero if $r$ is odd. If $r$ is even then:

$$\mu_\theta^{(r)}\equiv m_\theta^{(r)}=2\{Q_1(-1,r)-Q_1(-2,r)+Q_2(0,r)-Q_2(-1,r)\}.$$

Proof:

Let us assume that $r$ is even. In this case:

$$\mu_\theta^{(r)}=E_\theta[x^{r}]=\int_{-2}^{2}x^{r}\theta(x)\,dx=2\int_{-2}^{0}x^{r}\theta(x)\,dx=2\left\{\int_{-2}^{-1}x^{r}p_1(2+x)\,dx+\int_{-1}^{0}x^{r}p_2(2+x)\,dx\right\}.$$

Given that $\theta$ is a symmetric density with expected value zero, all the central and non-central moments are equal and the moments with $r$ odd are zero.

Taking into account Proposition A1, we can assert that, if $r$ is even, then:

$$\mu_\theta^{(r)}=2\{Q_1(-1,r)-Q_1(-2,r)+Q_2(0,r)-Q_2(-1,r)\}.$$
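Propositions A1 and A2 translate directly into code, which gives a quick numerical check of the moments of $\theta$. The sketch below assumes the antiderivatives $Q_1$, $Q$ and $Q_2$ with the signs that follow from expression (1), in particular the factor $(-2)^{r-i}$ that comes from expanding $x^{r}=((2+x)-2)^{r}$. Function names mirror the notation of the propositions.

```python
from math import comb


def Q1(x, r):
    # Antiderivative of x**r * p1(2 + x), with (2 + x)**3 expanded
    return (x ** (r + 4) / (r + 4) + 6 * x ** (r + 3) / (r + 3)
            + 12 * x ** (r + 2) / (r + 2) + 8 * x ** (r + 1) / (r + 1)) / 6.0


def Q(x, r):
    # Antiderivative of x**r * p2(x)
    return (-x ** (r + 4) / (2 * (r + 4)) + 2 * x ** (r + 3) / (r + 3)
            - 2 * x ** (r + 2) / (r + 2) + 2 * x ** (r + 1) / (3 * (r + 1)))


def Q2(x, r):
    # Antiderivative of x**r * p2(2 + x), via x**r = ((2 + x) - 2)**r
    return sum(comb(r, i) * (-2) ** (r - i) * Q(2 + x, i) for i in range(r + 1))


def theta_moment(r):
    """Moment of order r of the cubic box spline (Proposition A2)."""
    if r % 2 == 1:
        return 0.0
    return 2.0 * (Q1(-1, r) - Q1(-2, r) + Q2(0, r) - Q2(-1, r))


for r in range(5):
    print(r, theta_moment(r))
```

The order-zero value confirms that $\theta$ integrates to one, and the values of orders two and four are the ones needed by the moment formulas of Sections 2.2 and 4.1.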

A2. Central moments of the densities λj,k(x)

Proposition A3:

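It is verified that:

$$E_{\lambda_{j,k}}[x]=\frac{k}{s}. \tag{A1}$$

Proof:

Let us consider:

$$E_{\lambda_{j,k}}[x]=\int_{-\infty}^{+\infty}x\,\lambda_{j,k}(x)\,dx=\int_{-\infty}^{+\infty}x\,s\,\theta(sx-k)\,dx. \tag{A2}$$

Making the change of variable $y=sx-k$ in (A2) we have:

$$E_{\lambda_{j,k}}[x]=\int_{-\infty}^{+\infty}\frac{y+k}{s}\,s\,\theta(y)\,\frac{dy}{s}=\frac{1}{s}\int_{-\infty}^{+\infty}y\,\theta(y)\,dy+\frac{k}{s}\int_{-\infty}^{+\infty}\theta(y)\,dy,$$

but $\int_{-\infty}^{+\infty}y\,\theta(y)\,dy=0$ and $\int_{-\infty}^{+\infty}\theta(y)\,dy=1$.

Hence the proposition is true.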

Proposition A4:

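Let

$$\mu_{j,k}^{(r)}=\int_{-\infty}^{+\infty}\left(x-\frac{k}{s}\right)^{r}\lambda_{j,k}(x)\,dx$$

be the central moment of order $r$ of $\lambda_{j,k}(x)$. It is verified that:

$$\mu_{j,k}^{(r)}=\frac{\mu_{0,0}^{(r)}}{s^{r}},$$

where $\mu_{0,0}^{(r)}=\mu_\theta^{(r)}$ is, by definition, the moment of order $r$ of the density $\lambda_{0,0}(x)\equiv\theta(x)$ (see Proposition A2).

Proof:

$$\mu_{j,k}^{(r)}=\int_{-\infty}^{+\infty}\left(x-\frac{k}{s}\right)^{r}\lambda_{j,k}(x)\,dx=\int_{-\infty}^{+\infty}\left(\frac{sx-k}{s}\right)^{r}\lambda_{j,k}(x)\,dx=\frac{1}{s^{r}}\int_{-\infty}^{+\infty}(sx-k)^{r}\,s\,\theta(sx-k)\,dx. \tag{A3}$$

Making the change $y=sx-k$ in the integral (A3) we have:

$$\mu_{j,k}^{(r)}=\frac{1}{s^{r}}\int_{-\infty}^{+\infty}y^{r}\theta(y)\,dy=\frac{\mu_{0,0}^{(r)}}{s^{r}}.$$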

A3. Central moments of a MR density

Proposition A5:

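Let us consider a MR density such as that given by (3). It is verified that:

$$E_f[X]=\int_{-\infty}^{+\infty}x\,f(x)\,dx=\sum_{k\in\mathbb{Z}}a_k\frac{k}{s}=\frac{\bar{k}}{s},$$

where:

$$\bar{k}=\sum_{k\in\mathbb{Z}}a_k\,k.$$

Proof:

It follows immediately from (3) and (A1).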

Proposition A6:

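It is verified that the central moment of order $r$ of the MR density given by (3) is:

$$\mu_f^{(r)}=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}\mu_{0,0}^{(i)}\mu_a^{(r-i)},$$

where $\mu_a^{(r-i)}=\sum_{k\in\mathbb{Z}}a_k(k-\bar{k})^{r-i}$ and where $\mu_{0,0}^{(i)}=\mu_\theta^{(i)}$ is calculated according to Proposition A2.

Proof:

Let:

$$\mu_f^{(r)}=\int_{-\infty}^{+\infty}\left(x-\frac{\bar{k}}{s}\right)^{r}f(x)\,dx=\sum_{k\in\mathbb{Z}}a_k\int_{-\infty}^{+\infty}\left(x-\frac{\bar{k}}{s}\right)^{r}\lambda_{j,k}(x)\,dx. \tag{A4}$$

However,

$$\int_{-\infty}^{+\infty}\left(x-\frac{\bar{k}}{s}\right)^{r}\lambda_{j,k}(x)\,dx=\int_{-\infty}^{+\infty}\left(x-\frac{k}{s}+\frac{k}{s}-\frac{\bar{k}}{s}\right)^{r}\lambda_{j,k}(x)\,dx. \tag{A5}$$

Considering that:

$$\left(x-\frac{k}{s}+\frac{k}{s}-\frac{\bar{k}}{s}\right)^{r}=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}(k-\bar{k})^{r-i}(sx-k)^{i},$$

and substituting in (A5) we have:

$$\int_{-\infty}^{+\infty}\left(x-\frac{\bar{k}}{s}\right)^{r}\lambda_{j,k}(x)\,dx=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}(k-\bar{k})^{r-i}\int_{-\infty}^{+\infty}(sx-k)^{i}\,s\,\theta(sx-k)\,dx.$$

But making $y=sx-k$ we obtain:

$$\int_{-\infty}^{+\infty}(sx-k)^{i}\,s\,\theta(sx-k)\,dx=\int_{-\infty}^{+\infty}y^{i}\theta(y)\,dy=\mu_\theta^{(i)}\equiv\mu_{0,0}^{(i)}.$$

It is verified that:

$$\int_{-\infty}^{+\infty}\left(x-\frac{\bar{k}}{s}\right)^{r}\lambda_{j,k}(x)\,dx=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}(k-\bar{k})^{r-i}\mu_{0,0}^{(i)}.$$

Substituting in (A4) we have:

$$\mu_f^{(r)}=\sum_{k\in\mathbb{Z}}a_k\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}(k-\bar{k})^{r-i}\mu_{0,0}^{(i)}=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}\mu_{0,0}^{(i)}\sum_{k\in\mathbb{Z}}a_k(k-\bar{k})^{r-i}=\frac{1}{s^{r}}\sum_{i=0}^{r}\binom{r}{i}\mu_{0,0}^{(i)}\mu_a^{(r-i)},$$

as we wanted to prove.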

A4. Non-central moments of a MR density

Proposition A7:

Taking into account (3) is trivial to prove that:

mj(r)=kZakmλj,k(r),

where

mλj,k(r)=+xrλj,k(x)dx.

Proposition A8:

It is verified:

mλj,k(r)=h=0r(rh)mθ(r)khr

where mθ(r)=μθ(r). They are defined and calculated in Proposition A2.

Proof:

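$$m_{\lambda_{j,k}}^{(r)}=\int_{-\infty}^{+\infty}x^{r}\lambda_{j,k}(x)\,dx=\int_{-\infty}^{+\infty}x^{r}\,s\,\theta(sx-k)\,dx. \tag{A6}$$

Making the change of variable $y=sx-k$ in (A6), we obtain:

$$\frac{1}{s^{r}}\int_{-\infty}^{+\infty}(y+k)^{r}\theta(y)\,dy. \tag{A7}$$

Taking into account that:

$$(y+k)^{r}=\sum_{h=0}^{r}\binom{r}{h}y^{h}k^{r-h}, \tag{A8}$$

and substituting (A8) in (A7), the proof of the proposition is evident (see proposition 4).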
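Substituting the binomial expansion (A8) into (A7) gives $\frac{1}{s^{r}}\sum_{h=0}^{r}\binom{r}{h}m_{\theta}^{(h)}k^{r-h}$ for the non-central moment of $\lambda_{j,k}$. The following sketch compares that expression with numerical integration; it again assumes a triangular density on $[-2,2]$ in place of $\theta$, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def theta(y):
    # ASSUMPTION: triangular density on [-2, 2] as a stand-in for theta.
    return np.maximum(0.0, (2.0 - np.abs(y)) / 4.0)

def theta_moment(h):
    # Closed-form moments of the triangular density above.
    return 0.0 if h % 2 else 2.0 ** (h + 1) / ((h + 1) * (h + 2))

def noncentral_formula(r, j, k):
    # (1 / s**r) * sum_h C(r, h) * m_theta^{(h)} * k**(r - h), with s = 2**j.
    s = 2.0 ** j
    total = sum(comb(r, h) * theta_moment(h) * k ** (r - h) for h in range(r + 1))
    return total / s ** r

def noncentral_numeric(r, j, k, n=200_000):
    # Midpoint-rule integral of x**r * s * theta(s*x - k) over [(k-2)/s, (k+2)/s].
    s = 2.0 ** j
    lo, hi = (k - 2) / s, (k + 2) / s
    x = lo + (np.arange(n) + 0.5) * (hi - lo) / n
    return np.sum(x ** r * s * theta(s * x - k)) * (hi - lo) / n

for r in range(1, 5):
    print(r, noncentral_formula(r, 2, 3), noncentral_numeric(r, 2, 3))
```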

A5. Central moments of a kernel density

Proposition A9:

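It holds that:

$$E_{\hat{f}_{h}}[X]=\int_{-\infty}^{+\infty}x\,\hat{f}_{h}(x)\,dx=\frac{1}{n}\sum_{i=1}^{n}x_{i}=\bar{x}.$$

Proof:

Let

$$\int_{-\infty}^{+\infty}x\,\hat{f}_{h}(x)\,dx=\int_{-\infty}^{+\infty}x\,\frac{1}{nh}\sum_{i=1}^{n}\theta\!\left(\frac{x-x_{i}}{h}\right)dx=\frac{1}{nh}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}x\,\theta\!\left(\frac{x-x_{i}}{h}\right)dx. \tag{A9}$$

Making the change of variable $y=\frac{x-x_{i}}{h}$ in the integral (A9), we have:

$$\int_{-\infty}^{+\infty}x\,\theta\!\left(\frac{x-x_{i}}{h}\right)dx=h\int_{-\infty}^{+\infty}(hy+x_{i})\,\theta(y)\,dy=h^{2}\int_{-\infty}^{+\infty}y\,\theta(y)\,dy+h\,x_{i}\int_{-\infty}^{+\infty}\theta(y)\,dy. \tag{A10}$$

But $\int_{-\infty}^{+\infty}y\,\theta(y)\,dy=0$ and $\int_{-\infty}^{+\infty}\theta(y)\,dy=1$, so (A10) equals $h\,x_{i}$, and substituting in (A9) we have:

$$E_{\hat{f}_{h}}[X]=\frac{1}{nh}\sum_{i=1}^{n}h\,x_{i}=\frac{1}{n}\sum_{i=1}^{n}x_{i}=\bar{x}.$$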

Proposition A10:

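It holds that:

$$\mu_{\hat{f}_{h}}^{(r)}=\sum_{t=0}^{r}\binom{r}{t}h^{t}\,m_{\theta}^{(t)}\,\mu^{(r-t)},$$

where $m_{\theta}^{(t)}$ is the non-central moment of order $t$ of the density $\theta$. It is obtained following Section 2 in the Appendix. The expression $\mu^{(r-t)}$ is the sample central moment of order $r-t$. That is:

$$\mu^{(r-t)}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{r-t}.$$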

Proof:

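$$\mu_{\hat{f}_{h}}^{(r)}=\int_{-\infty}^{+\infty}(x-\bar{x})^{r}\hat{f}_{h}(x)\,dx. \tag{A11}$$

That is:

$$\mu_{\hat{f}_{h}}^{(r)}=\frac{1}{nh}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}(x-\bar{x})^{r}\,\theta\!\left(\frac{x-x_{i}}{h}\right)dx. \tag{A12}$$

Making the change of variable $y=\frac{x-x_{i}}{h}$ in (A12), we have:

$$\int_{-\infty}^{+\infty}(x-\bar{x})^{r}\,\theta\!\left(\frac{x-x_{i}}{h}\right)dx=h\int_{-\infty}^{+\infty}(hy+x_{i}-\bar{x})^{r}\,\theta(y)\,dy. \tag{A13}$$

But

$$(hy+x_{i}-\bar{x})^{r}=\sum_{t=0}^{r}\binom{r}{t}h^{t}y^{t}(x_{i}-\bar{x})^{r-t}.$$

Substituting in (A13), we have:

$$\int_{-\infty}^{+\infty}(x-\bar{x})^{r}\,\theta\!\left(\frac{x-x_{i}}{h}\right)dx=\sum_{t=0}^{r}\binom{r}{t}h^{t+1}m_{\theta}^{(t)}(x_{i}-\bar{x})^{r-t}.$$

Substituting the latter expression in (A12), we obtain:

$$\mu_{\hat{f}_{h}}^{(r)}=\frac{1}{nh}\sum_{i=1}^{n}\sum_{t=0}^{r}\binom{r}{t}h^{t+1}m_{\theta}^{(t)}(x_{i}-\bar{x})^{r-t}=\sum_{t=0}^{r}\binom{r}{t}h^{t}m_{\theta}^{(t)}\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{r-t}.$$

Naming

$$\mu^{(r-t)}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{r-t},$$

the proposition is proven.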
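Proposition A10 expresses the central moments of a kernel estimate through the bandwidth, the kernel moments and the sample central moments. A minimal numerical sketch follows, assuming a standard Gaussian kernel for $\theta$ (an assumption made here for convenience) and a small made-up sample `x` with bandwidth `h`; none of these values come from the paper.

```python
import numpy as np
from math import comb

def gauss_moment(t):
    # Non-central moment of order t of the standard normal kernel (ASSUMED kernel):
    # zero for odd t, (t-1)!! for even t.
    if t % 2:
        return 0.0
    out = 1.0
    for m in range(t - 1, 0, -2):
        out *= m
    return out

x = np.array([1.2, 2.5, 2.9, 4.1, 7.3])   # hypothetical sample
h = 0.6                                    # hypothetical bandwidth
x_bar = x.mean()

def kde_central_moment_formula(r):
    # sum_t C(r, t) * h**t * m_theta^{(t)} * (sample central moment of order r - t)
    return sum(
        comb(r, t) * h ** t * gauss_moment(t) * np.mean((x - x_bar) ** (r - t))
        for t in range(r + 1)
    )

def kde_central_moment_numeric(r, n=200_000):
    # Midpoint-rule integral of (u - x_bar)**r * f_hat(u); tails beyond 12*h are negligible.
    lo, hi = x.min() - 12 * h, x.max() + 12 * h
    grid = lo + (np.arange(n) + 0.5) * (hi - lo) / n
    z = (grid[:, None] - x[None, :]) / h
    f_hat = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    return np.sum((grid - x_bar) ** r * f_hat) * (hi - lo) / n

for r in (2, 3, 4):
    print(r, kde_central_moment_formula(r), kde_central_moment_numeric(r))
```

For $r=2$ and a kernel with unit variance, the formula reduces to the sample variance plus $h^{2}$, which makes the effect of the bandwidth on the second central moment explicit.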

A6. Asymptotic properties of the MR moments

Proof of Proposition 2.1:

By definition [12,13]:

$$\hat{a}_{k}=\frac{1}{n}\,\mathrm{Card}\left\{x_{i},\ i=1,2,\dots,n\ \middle|\ x_{i}\in\left[\frac{k-2}{2^{j}},\frac{k+2}{2^{j}}\right)\right\}.$$

It is evident that $x_{i}\in\left[\frac{k-2}{2^{j}},\frac{k+2}{2^{j}}\right)$ if and only if $k=k(x_{i})$.

Moreover, the intervals:

$$\left[\frac{k-2}{2^{j}},\frac{k+2}{2^{j}}\right),$$

have center $\frac{k}{2^{j}}$ and radius $\frac{1}{2^{j-1}}$, so for $j$ large enough the intervals have a radius so small that each sample element $x_{i}$ belongs to a different interval. In this case, the coefficients greater than zero are those associated with intervals that contain a sample element, that is:

$$\hat{a}_{k(x_{i})}=\frac{1}{n},\qquad i=1,2,\dots,n.$$

Hence the proposition is true.

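Proof of Proposition 2.2:

$$\hat{m}_{r}=\int_{-\infty}^{+\infty}x^{r}\hat{f}_{j}(x)\,dx=\int_{-\infty}^{+\infty}x^{r}\sum_{k\in\mathbb{Z}}\hat{a}_{k}\lambda_{j,k}(x)\,dx=\sum_{k\in\mathbb{Z}}\hat{a}_{k}\int_{-\infty}^{+\infty}x^{r}\lambda_{j,k}(x)\,dx.$$

For $j$ large enough, according to Proposition 2.1, we have:

$$\hat{a}_{k}=\begin{cases}0 & \text{if } k\neq k(x_{i})\ \ \forall i=1,2,\dots,n,\\[4pt] \dfrac{1}{n} & \text{if } k=k(x_{i})\ \text{for some } i\in\{1,2,\dots,n\},\end{cases}$$

which allows us to write:

$$\hat{m}_{r}=\frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}x^{r}\lambda_{j,k(x_{i})}(x)\,dx. \tag{A14}$$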

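However, since $\theta$ vanishes outside $[-2,+2]$,

$$\int_{-\infty}^{+\infty}x^{r}\lambda_{j,k(x_{i})}(x)\,dx=\int_{(k(x_{i})-2)/s}^{(k(x_{i})+2)/s}x^{r}\,s\,\theta(sx-k(x_{i}))\,dx, \tag{A15}$$

with $s=2^{j}$.

If we make the change of variable $y=sx-k(x_{i})$ in (A15), we have:

$$\int_{(k(x_{i})-2)/s}^{(k(x_{i})+2)/s}x^{r}\,s\,\theta(sx-k(x_{i}))\,dx=\int_{-2}^{+2}\left(\frac{y+k(x_{i})}{s}\right)^{r}\theta(y)\,dy. \tag{A16}$$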

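As $j$ tends to infinity, we have:

$$\lim_{j\to\infty}\frac{y}{s}=0,\qquad\lim_{j\to\infty}\frac{k(x_{i})}{s}=x_{i}. \tag{A17}$$

The first equality in (A17) is trivially true, and the second also holds since, by definition:

$$x_{i}\in\left[\frac{k(x_{i})-2}{s},\frac{k(x_{i})+2}{s}\right),$$

and when $j$ tends to infinity, the radius of the interval converges to zero and its center is the point $\frac{k(x_{i})}{s}$.

Given (A17), we can write:

$$\lim_{j\to\infty}\int_{-2}^{+2}\left(\frac{y+k(x_{i})}{s}\right)^{r}\theta(y)\,dy=\int_{-2}^{+2}(x_{i})^{r}\theta(y)\,dy=x_{i}^{r}\int_{-2}^{+2}\theta(y)\,dy=x_{i}^{r}. \tag{A18}$$

Hence, from (A14) and (A18), we have:

$$\lim_{j\to\infty}\hat{m}_{r}=\frac{1}{n}\sum_{i=1}^{n}x_{i}^{r}.$$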

That is, the non-central moments of the estimated MRDE converge to the non-central moments of the sample.
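The convergence stated in Proposition 2.2 can be illustrated numerically. The sketch below rests on two assumptions that are not taken from the paper: $\theta$ is replaced by a triangular density on $[-2,2]$, and the index $k(x_i)$ is taken to be the integer nearest to $s\,x_i$, which is one hypothetical way of making $k(x_i)/s\to x_i$. Each sample point receives weight $1/n$, as in the proof.

```python
import numpy as np
from math import comb

def theta_moment(h):
    # Closed-form moments of the ASSUMED triangular density on [-2, 2].
    return 0.0 if h % 2 else 2.0 ** (h + 1) / ((h + 1) * (h + 2))

def mr_noncentral_moment(x, r, j):
    # Moment of order r of the estimate with weight 1/n on each lambda_{j,k(x_i)}.
    s = 2.0 ** j
    k = np.rint(s * x)  # HYPOTHETICAL choice of k(x_i): nearest integer to s * x_i
    inner = sum(comb(r, h) * theta_moment(h) * k ** (r - h) for h in range(r + 1))
    return np.mean(inner) / s ** r

x = np.array([0.37, 1.24, 1.91, 2.68, 4.05])   # hypothetical sample
r = 3
sample_moment = np.mean(x ** r)

levels = (2, 6, 10, 14)
errors = [abs(mr_noncentral_moment(x, r, j) - sample_moment) for j in levels]
for j, err in zip(levels, errors):
    print(j, err)
```

The printed gap between the MR moment and the sample moment shrinks as the resolution level grows, which is the behaviour the proposition describes; the exact rate depends on the assumed $\theta$ and on how $k(x_i)$ is assigned.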

Notes

1

The multiresolution densities are a particular case of semiparametric models (see [12,14]).

2

This is a well-known fact underlying all the bandwidth selection methods.

3

Recall that these intervals form a partition of the real line and their amplitude converges to zero as j increases.

4

Unless this is done parametrically using the EM algorithm on a mixture model of three double exponential distributions; however, for a sample of size 10,000 the processing time is too long.

5

Note that the values for the Gini coefficient can differ from other publications since our illustration is based on gross income instead of net income.

6

The expected value of the density is zero and the central and non-central moments are equal.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Banco de España, Survey of Household Finances (EFF) 2014: Methods, Results and Changes since 2011. Analytical Article, 24, January 2007.
  • 2. Cao R., Cuevas A., and Gonzalez Manteiga W., A comparative study of several smoothing methods in density estimation. Comput. Stat. Data Anal. 17 (1994), pp. 153–176. doi: 10.1016/0167-9473(92)00066-Z.
  • 3. Charpentier A. and Flachaire E., Log-transform kernel density estimation of income distribution. L'Actualité économique 91 (2015), pp. 141–159. doi: 10.7202/1036917ar.
  • 4. Hall P. and Marron J.S., Estimation of integrated squared density derivatives. Stat. Probab. Lett. 6 (1987), pp. 109–115. doi: 10.1016/0167-7152(87)90083-6.
  • 5. Härdle W., Smoothing Techniques, Springer, New York, 1991.
  • 6. Heidenreich N.B., Schindler A., and Sperlich S., Bandwidth selection for kernel density estimation: a review of fully automatic selectors. AStA Adv. Stat. Anal. 97 (2013), pp. 403–433. doi: 10.1007/s10182-013-0216-y.
  • 7. Hernández E. and Weiss G., A First Course on Wavelets, CRC Press, New York, 1996. doi: 10.1201/9780367802349.
  • 8. Jenkins S.P., Did the middle class shrink during the 1980s? UK evidence from kernel density estimates. Econ. Lett. 49 (1995), pp. 407–413. doi: 10.1016/0165-1765(95)00698-F.
  • 9. Jones M.C., On correcting for variance inflation in kernel density estimation. Comput. Statist. Data Anal. 11 (1991), pp. 3–15.
  • 10. Marron J.S. and Sheather S.J., Progress in data-based bandwidth selection for kernel density estimation. Comput. Stat. 11 (1996), pp. 337–381.
  • 11. Mallat S., A Wavelet Tour of Signal Processing, Academic Press, New York, 1998. doi: 10.1016/B978-012466606-1/50008-8.
  • 12. Palacios-González F. and García-Fernández R.M., A flexible family of density functions. Statistics 49 (2014a), pp. 680–704. doi: 10.1080/02331888.2014.883398.
  • 13. Palacios-González F. and García-Fernández R.M., Mixtures of mixtures based on multiresolution analysis theory. Commun. Stat. Simul. Comput. 43 (2014b), pp. 723–742. doi: 10.1080/03610918.2012.714031.
  • 14. Palacios-González F. and García-Fernández R.M., A faster algorithm to estimate multiresolution densities. Comput. Stat. 35 (2020), pp. 1207–1230. doi: 10.1007/s00180-020-00952-w.
  • 15. Park B.U. and Marron J.S., Comparison of data-driven bandwidth selectors. J. Am. Stat. Assoc. 85 (1990), pp. 66–72. doi: 10.1080/01621459.1990.10475307.
  • 16. Pittau G.M. and Zelli R., Testing for changing shapes of income distribution: Italian evidence in the 1990s from kernel estimates. Empir. Econ. 29 (2004), pp. 415–430. doi: 10.1007/s00181-003-0175-3.
  • 17. Schwarz G., Estimating the dimension of a model. Ann. Stat. 6 (1978), pp. 461–464.
  • 18. Silverman B.W., Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, New York, 1986, pp. 34–72.
  • 19. Scott D.W., Tapia R., and Thompson J.R., Kernel density estimation revisited. Nonlin. Anal. 1 (1997), pp. 339–372. doi: 10.1016/S0362-546X(97)90003-1.
  • 20. Shaoping W., Ang L., Kuangyu W., and Ximing W., Robust kernels for kernel density estimation. Econ. Lett. 191 (2020), pp. 109138. doi: 10.1016/j.econlet.2020.109138.
  • 21. Sheather S.J. and Jones M.C., A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B 53 (1991), pp. 683–690. doi: 10.1111/j.2517-6161.1991.tb01857.x.
  • 22. Huang S.H., Density estimation by wavelet-based reproducing kernels. Stat. Sin. 9 (1999), pp. 137–151.
  • 23. Wojtaszczyk P., A Mathematical Introduction to Wavelets, Cambridge University Press, London, 1979.
  • 24. Ziane Y., Adjabi S., and Zougab N., Adaptive Bayesian bandwidth selection in asymmetric kernel density estimation for nonnegative heavy-tailed data. J. Appl. Stat. 42 (2015), pp. 1645–1658. American Statistical Association, vol. 91, no. 433, pp. 401–407.


Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
