Abstract
This paper is motivated by a wide range of background correction problems in gene array data analysis, where the raw gene expression intensities are measured with error. Estimating a conditional density function from the contaminated expression data is a key aspect of statistical inference and visualization in these studies. We propose re-weighted deconvolution kernel methods to estimate the conditional density function in an additive error model, when the error distribution is known as well as when it is unknown. Theoretical properties of the proposed estimators are investigated with respect to the mean absolute error from a “double asymptotic” view. Practical rules are developed for the selection of smoothing-parameters. Simulated examples and an application to an Illumina bead microarray study are presented to illustrate the viability of the methods.
Keywords: measurement error, gene microarray, conditional density, deconvolution, ridge parameter, kernel, bandwidth selection
1. Introduction
Measurement error problems have attracted a great deal of interest in the past two decades. A variety of models and methods for the problems have been applied in scientific fields, such as medicine, economy, and astronomy. Statistical deconvolution is an important component in measurement error models. The fundamental objective of deconvolution is to recover the unknown probability density function of a random variable when its observed values are contaminated with error. Let X be the variable of interest, which can not be observed directly. Instead, we observe a sample of W,
| (1) |
where Xj's are identically distributed as X, Uj's are identically distributed as U, and they are totally independent. The most popular approach to estimate the density of X is the deconvolution kernel estimator through applying an inverse Fourier transform and a kernel technique (Carroll and Hall, 1988; Stefanski and Carroll, 1990; Fan, 1991a,b). Other estimation procedures include the truncated Fourier inversion method (Diggle and Hall, 1993), the wavelet-based method (Fan and Koo, 2002), the penalization approach (Comte and Lacour, 2011), among others. Deconvolution problems based on more complicated model settings have also been extensively studied. Delaigle and Meister (2008), Wang et al. (2010), and McIntyre and Stefanski (2011) considered the problems of heteroscedastic measurement errors. Hall and Maiti (2009) investigated nonparametric deconvolution methods in two-level mixed models. Neumann (1997), Johannes (2009) and Wang and Ye (2012) studied the density deconvolution with unknown error distribution. Delaigle and Meister (2011) investigated kernel deconvolution when the characteristic function of the measurement errors contains zeros. Wang and Wang (2011) discussed fast Fourier transform algorithms in measurement error models and developed an R software package. The literature on deconvolution problems is particularly large and is surveyed in the monograph by Meister (2009).
In this paper, we consider the estimation problem of the conditional density of X given W, fX|W, from the contaminated data Wj's. The problem is motivated by a wide range of background correction problems in gene array data analysis. Gene microarray techniques have become very popular in medical studies. A microarray is a collection of microscopic DNA spots attached to a solid surface. Hundreds of thousands of gene expression values are obtained from one array chip simultaneously. However, reading the expression values from a microarray is a noisy measurement process. The sources of measurement error include, for instance, irregularities in the array surface, variations in the laboratory process, different image scanner settings, and dye effects.
Typically, the first step in gene array data analysis is known as background correction, which refers to adjustments to the contaminated data intended to remove measurement error from the measured signal. Estimating the conditional density function from contaminated gene expression data, therefore, plays a key aspect of statistical inference and visualization here. It provides the most informative summary of the relationship between the contaminated gene intensities and the unobserved true signals. The current popular model of background correction in bioinformatics is the normal-exponential model (Irizarry et al., 2003; Silver et al., 2009). It assumes that the observed intensity is equal to the true intensity plus the background noise, where the true signal follows an exponential distribution with mean α, and the background noise follows a normal distribution with mean μ and variance σ2. However, the validity of the parametric assumptions is unknown in real gene array studies. Thus, it is of particular interest to nonparametrically estimate the conditional density from the contaminated gene intensities.
A variety of papers discuss the nonparametric conditional density estimation when bivariate data are available. Hyndman et al. (1996) studied a kernel estimator. Bashtannyk and Hyndman (2001); Fan and Yim (2004) proposed several rules for selecting smoothing parameters. De Gooijer and Zerom (2003) proposed a modification of the Nadaraya-Watson type of smoother. Hall et al. (2004) discussed cross-validation and the estimation of conditional probability densities. Efromovich (2007) studied the conditional density estimation in a regression setting.
Unlike the conventional conditional density estimation problem from bivariate data, the observations for the variable-of-interest, X, are not available in the measurement error problem. In this paper, we investigate the estimation of the conditional density fX|W from the only-available contaminated sample Wj's under the model (1). In Section 2, estimators of fX|W are constructed in case of a known and an unknown error density. In Section 3, theoretical properties of the estimators are investigated with respect to the mean absolute error. In Section 4, practical rules are developed for the selection of smoothing-parameters. Simulated examples and an application to an Illumina Bead microarray study are presented in Section 5. The proofs of theorems are given in the Appendix and some additional asymptotic results are provided in the supplement of the article.
2. Methodology
Under the additive measurement error model (1), let fX, fU, and fW be the density functions of X, U, and W, respectively. Denote fX,W(x, w) as the joint density of (X, W). The conditional density of X given W = w is
| (2) |
2.1. Estimation of fX|W with known error distribution
If one assumes that the error density fU is known explicitly, fX can be estimated by the classical deconvolution kernel approach. It is given by,
| (3) |
where
| (4) |
is known as the deconvoluting kernel, and h > 0 is a smoothing parameter. In (4), ϕU is the characteristic function of U, and ϕK(t) = ∫eitxK(x)dx is the Fourier transform of K(x), a symmetric probability kernel with a finite variance ∫x2K(x)dx < ∞. Under the common assumption that ϕK is compactly supported and ϕU does not vanish on the real line, the deconvoluting kernel K*(·) is well defined and finite.
The estimation of fX|W is naturally approached by replacing fX with its deconvolution kernel estimator and fW with its ordinary kernel estimator in (2). This results in a re-weighted deconvolution kernel estimator (RDKE), defined by
| (5) |
where
| (6) |
Lb(·) = L(·/b)/b, L(·) is a real non-negative kernel function, and b is the bandwidth that associates with the kernel density estimate of fW. The estimator (5) is called RDKE in that, comparing with the conventional deconvolution kernel estimator, 1/n in (3) is replaced by a weight function τ̂(·) in this new estimator.
2.2. Estimation of fX|W with unknown error distribution
In practice, the exact distribution of measurement error is typically unknown. Thus, one often conducts a separate independent experiment to collect an additional noise sample. For example, in Illumina bead microarray studies, the additional noise sample is always available for each gene array. Density deconvolution from a contaminated sample coupled with an additional noise sample have been studied by Neumann (1997), Johannes (2009), and Wang and Ye (2012). For the conditional density estimation in the presence of error with unknown distribution, we propose here a ridge-based re-weighted kernel estimator. Denote that {U0j},j = 1, ⋯, n0 are direct observations from the error distribution, which are independent and identically distributed as U. The empirical characteristic function of U is then . Our estimator takes the form
| (7) |
where , and
| (8) |
| (9) |
In this estimator, Ga(·) = G(·/a)/a and Lb(·) = L(·/b)/b, where G and L are real, non-negative kernel functions, a, b are the bandwidths associated with the kernel density estimate of fU and fW, respectively.
We introduce a ridge parameter ρ in (8) in order to prevent ϕ̂U from being too close to zero. When the error distribution is unknown, ϕU might be estimated from its empirical counterpart, ϕ̂U. However, estimating the conditional density becomes unstable if one applies ϕ̂U without any regularization. Note that ϕU can be estimated by ϕ̂U (t) at each point t with the rate . The estimator ϕ̂U is reasonable as |ϕ̂U| ≫ n0−1/2, while it becomes unstable as |ϕ̂U| ≪ n0−1/2. Hence, in practice, we simply take the ridge parameter in order to avoid the difficulty of simultaneously selecting multiple tuning parameters in (8).
3. Theoretical properties
We present here the asymptotic results based on the mean absolute distance between f̂X|W and fX|W. Define the mean absolute error (MAE) by MAE(f̂X|W) =
|f̂X|W − fX|W|, which is the “local” analogue of the L1 distance between fX|W and its estimate. The L1 view of conventional kernel density estimation for error-free data had attracted great attentions. Devroye (1985) gave distinct illustrations of the mathematical attractions of L1 distance. For instance, it is always well-defined on the space of density functions, and it is invariant under monotone transformations. Studying the MAE properties is also due to the practical interest in gene array background correction. Some additional theoretical results based on the mean square distance can be found in the supplement of this paper.
In measurement error problems, the quality of a sample does not only depend on its size but also crucially relates to the magnitude of the error variance σ2. This phenomenon was observed by Fan (1992) and had been extensively studied by Delaigle (2008). Delaigle (2008) made the following argument for an alternative asymptotic view of measurement error problems. For standard asymptotic theory, n may not particular large but statisticians are interested in analyzing theoretical properties for the unrealistic situation (n → ∞). It allows us to reveal the properties of an estimator when the sample size is not too small. Therefore, in measurement error problems, just as any given sample size can be considered as a finite sample approximation of n → ∞, any given σ2 can be also considered as a finite sample version of σ2 → 0. Considering the asymptotic properties where both σ2 → 0 and n → ∞ (named as the double asymptotics by Delaigle (2008)) provides a more appropriate way to uncover important properties of an estimator when σ2 is not too large.
Following Delaigle's recommendation, we study the double asymptotic properties of our estimators by considering both σ2 → 0 and n → ∞. The rate of convergences under the standard asymptotic theory framework are also provided in this section. Let us rewrite the model (1) as:
| (10) |
where Uj = σZj, Zj's are independent and identically distributed as Z with the density fZ and the variance var(Z) = 1. Hereinafter, the asymptotic properties are for both σ2 → 0 and n → ∞, when we refer to model (10); the asymptotic properties are for n → ∞ only, when we refer to model (1). We always assume that the density function fW is strictly positive and has the continuous second derivative. The unknown probability density function fX belongs to the class
where α ∊ [0,1), m ∊ℕ, and B > 0 are known constants. We consider two classes of errors as in the classical deconvolution literature: ordinary smooth error U of order β if
for some positive constants d0,d1,β; and supersmooth error U of order β if
for some positive constants d0, d1, γ, β and constants β0, β1. We further state the conditions for the kernel functions:
(A1) K is bounded, continuous, and ∫ |y|m+α|K(y)|dy < ∞. Moreover, its characteristic function, ϕK, is symmetric and satisfies ϕK(t) = 1 + O(|t|m+α) as t→0.
(A2) The kernel function L(x) is a real, non-negative, even kernel function on ℝ such that ∫ L(x)dx = 1, ∫ xL(x)dx = 0, and ∫ x2L(x)dx < ∞. The kernel function G(·) follows the same conditions.
Condition (A1) asserts that K is basically a kernel function of order m + α.
3.1. Theoretical properties when fU is known
We use c(n) ∼ b(n) to represent d2c(n) ≤ b(n) ≤ d3c(n) for some constants d2, d3 > 0. c(n) ≫ b(n) (resp. c(n) ≪ b(n)) represents b(n) = o(c(n)) (resp. c(n) = o(b(n))).
(B1) ∫ |t|β|ϕK(t)|dt < ∞ and ∫ |t|2β|ϕK(t)|2 dt < ∞.
Theorem 1
Assume that the variable Z in (10) is ordinary smooth of order β. Under the assumptions (A1), (A2), (B1), and ,
- if and , then ∀x, w ∊ ℝ and ∀ fX ∊ Ψm,α,B,
if and , then for all x, w ∊ ℝ and for all fX ∊ Ψm,α,B,
Remark 1. For the model (1), by considering σ to be a constant, one gets
Notice that the assumption is standard in the kernel density estimation and is standard in the deconvolution kernel density estimation. If m + α < 4β + 2, then . Therefore, the rate of convergence is . On the other hand, if m + α ≥ 4β + 2, then the rate of convergence is . If we assume that fW(·) has higher order derivatives and we choose higher order kernel density function L, for instance 2r-th order, then we can let . In this case, the rate of convergence becomes .
Next we address the theorem in the case of supersmooth error. We need the following condition.
-
(C1) ϕK(t) = 0 for all |t| > 1, that is, ϕK(t) has support on [−1,1]. Moreover, .
More generally, in condition (C1), one can assume that ϕK has compact support [−M, M] for some 0 < M < ∞.
Theorem 2
Assume that the variable Z in (10) is supersmooth with ϕZ(t) ≠ 0 for any t. Under the assumptions (A1), (A2), (C1), and ,
-
if and , then ∀ x, w ∊ ℝ and ∀ fX ∊ Ψm,α,B,
if and with and D < 2m + 2α + 1, then ∀ x, w ∊ ℝ and ∀ fX ∊Ψm,α,B,
Remark 2. For the model (1), by considering σ to be a constant, one has,
The choice of is standard for the kernel density estimation and h = (4/γ)1/β(log n)−1/β is standard for the deconvolution kernel density estimation. Here, the order of the derivatives of fW(·) and the order of the kernel density function L do not change the rate of convergence.
3.2. Theoretical properties when fU is unknown
For the ordinary smooth error distribution, we need the following condition:
(B2) ∫ |ϕK(τ)|τ|2βsdτ <∞ for s = 0,1,2.
Theorem 3
Assume that the variable Z is ordinary smooth of order β but fZ is unknown. Under the assumptions (A1), (A2), (B1), (B2), and by choosing , ,
-
if σ = O(h) and , then for all x, w ∊ ℝ and for all fX ∊ Ψm,α,B,
if , and
then for all x, w ∊ ℝ and for all fX ∊ Ψm,α,B,
Remark 3. For the model (1), by considering σ to be a constant, one has
Theorem 4
Assume that the variable Z is supersmooth with ϕZ(t) ≠ 0 for any t but fZ is unknown. Under the assumptions (A1), (A2), (C1), and by choosing , ,
-
if σ = O(h) and , then for all x, w ∊ ℝ and for all fX ∊ Ψm,α,B,
-
if and , with and D < 2m + 2α + 1, then for all x, w ∊ ℝ and for all fX ∊ Ψm,α,B,
Remark 4. For the model (1), by considering σ to be a constant, one has
The above four theorems offer the double asymptotic view of the proposed RKDEs, which provide a better interpretation of asymptotic behaviors for the estimators than the results from the standard asymptotic view. For either ordinary smooth or supersmooth error, there are two different rates of convergence depending on the error magnitude. For instance, the convergence rate for supersmooth error varies from the rate of the error-free kernel density estimation to the very slow rate of the classical supersmooth deconvolution. The theoretical results support that the quality of the RKDEs depends not only on the sample size but also on the error magnitude. They also support in practice the RKDEs could perform well with a moderate sample size for the case of Gaussian error as long as σ2 is not too large.
4. Selection of smoothing parameters
Bandwidth plays a critical role in implementation of practical estimation using smoothing techniques. The selection of bandwidth in deconvolution kernel density estimation has been broadly studied in the literature (Delaigle and Gijbels, 2004). Here we propose a simple but intuitively appealing method for selecting the smoothing parameters in the conditional density estimation. Let us focus on the case of known error distribution. Our procedure includes two steps. First, we select b for the kernel density estimate . Many bandwidth selection methods are available in classical kernel density estimation. Here we use the one that minimizes mean absolute distance (Hall and Wand, 1988a,b). Second, for given b and w, the proposed bandwidth h for estimating f̂(x|w) using (5) is the minimizer of the mean integrated absolute error (MIAE) of g(x|w) = fU(w − x)fX(x), i.e.,
| (11) |
where ĝ(x|w) = fU(w − x)f̂X(x) and f̂X is the deconvolution kernel estimate defined by (3).
Let K be a m-th order kernel function with ∫ |K(z)zm+1| dz < ∞. Assume that the density function fX has continuous, bounded (m + 1)-th order derivative. Denote Rm(K) = ∫xmK(x)dx and for any positive integer m, and let ǁfX(x)ǁ∞ be the supremum norm of fX. It can be shown that the asymptotic dominating term of the MIAE of g(x|w) is given by
| (12) |
In practice, we evaluate this asymptotic approximation over the exact MIAE because of its rather simple expression. Formula (12) involves the unknown quantities Sm(fX) and ǁfX(x)ǁ∞. We suggest two possible estimators for them and obtain as such an estimator . The first one is a simple normal reference approach. Assume that X is from a normal distribution . Then one calculates μ̂X = μ̂W, where μ̂W and are the sample mean and variance from the observations Wj's. Hence, one can easily calculate Sm(f̂X) and ǁf̂X(x)ǁ∞. The approach may not perform well when X is far from a normal distribution. Our second approach is to estimate f̂X through the classical deconvolution kernel method and then numerically evaluate Sm(fX) and ǁfX(x)ǁ∞. From our simulation experiences, the second method often performs better than the normal reference method although it is more computationally complicated.
The proposed bandwidth hMIAE depends on w. If necessary, one could select a global bandwidth
where the integration is over the region of x and w of interest.
Certainly, one may consider to use the criterion of minimizing mean integrated squared error. Our simulation experiences suggest that there is negligible difference between two criteria in practical bandwidth selection. The above methods of selecting smoothing parameters can be naturally extended to the case of unknown error distribution, where both a and b are pre-determined from additional noise data and contaminated data.
5. Numerical properties
5.1. Simulated examples
We illustrate our methods via two simulated models. We take the kernel K to be a second order kernel function, where , and the kernels L and G to be Gaussian.
Example 1. Consider a normal-exponential convolution model, Wj = Xj + Uj, where Xj is exponentially distributed with mean α, while Uj is normally distributed with mean μ and variance σ2. It can be shown, with some simple transformation and algebra, the conditional density of X give W is
where φ(.) and Φ(.) denote the Gaussian density and distribution functions, respectively. A sample of 1000 was generated from this model, where the parameters were set as follows: α = 10, μ = 2, σ = 2. We assume that the measurement error parameters are known. Formula (5) was applied to estimate the conditional density functions from the contaminated data Wj's. The bandwidths were selected by the proposed method in Section 4. Figure 1, which presents a typical simulated example, displays the estimated conditional densities using the RDKE with known error distribution. In (a)-(c), the estimated conditional densities (solid curves), for (a) w = 5, (b) w = 10, (c) w = 30, are compared with the true densities (dashed curves). The RDKE performed quite well to recover the true functions. (d) shows a “stacked conditional density plot” (Hyndman et al., 1996), which displays a number of densities plotted side by side in a perspective plot. The plot highlights the conditioning which allows us to evaluate the changes of densities over w.
Figure 1.
A simulated example of conditional density estimation with measurement error for the normal-exponential model. In (a)-(c), estimated conditional densities (solid curves) for (a) w = 5, (b) w = 10, (c) w = 30 using the reweighed deconvolution kernel method with known error distribution are compared with the true densities (dashed curves). (d) shows a number of densities plotted side by side in a perspective plot.
Example 2. Consider a normal-normal convolution model, Wj = Xj + Uj, where Xj is normally distributed with mean μ1 and variance , while Uj is normally distributed with mean μ2 and variance . The true conditional density of X give W is
We generated Xj's and Uj's from N(2, 9) and N(0,1) respectively with the sample size n = 1000. In this example, we assumed that the error distribution was unknown. So, an additional noise sample U0j's were generated from N(0,1) with the size n0 = 500. Formula (7) was applied to estimate the conditional density functions from the contaminated data Wj's coupled with the noise data U0j's. Figure 2 shows the estimated conditional densities in a simulated example. In (a)-(c), the estimated conditional densities (solid curves), for (a) w = −2.5, (b) w = 0, (c) w = 2.5, are compared with the true densities (dashed curves). The estimated curves almost coincide with the true curves. (d) exhibits the stacked conditional density plot in a perspective view.
Figure 2.
A simulated example of conditional density estimation with measurement error for the normal-normal model. In (a)-(c), estimated conditional densities (solid curves) for (a) w = −2.5, (b) w = 0, (c) w = 2.5 using the reweighed deconvolution kernel method with unknown error distribution are compared with the true densities (dashed curves). (d) shows a number of densities plotted side by side in a perspective plot.
5.2. An application to an Illumina bead microarray study
Illumina bead microarray is one of most popular microarray platforms in genetics. One distinctive feature of Bead array technology by Illumina Inc. is that more than one thousand “control bead types” in addition to gene sequences are allocated in each array. These control beads do not correspond to any expressed sequences in the genome. The data of control beads present the additional noise sample that is used to evaluate the noise distribution on the array in an experiment.
We illustrate our methods using the Illumina microarray data from a leukemia study by Ding et al. (2008). The leukemia study was to investigate the pathogenesis of leukemia. Irradiated mice who subsequently developed acute myeloid leukemia (AML) were involved to study the leukemogenic process. Illumina Mouse-6 V1 BeadChip mouse whole-genome expression arrays were used to obtain the gene expression profiles of AML samples. Ding et al. (2008) considered the normal-exponential model for the observed gene intensities in their analysis. Here we demonstrate the analysis of condition density estimation for the third bead array. Other bead arrays showed similar results. The intensity values of 46120 genes and 1655 negative controls were obtained from this array. We estimate the conditional densities nonparametrically using the reweighed deconvolution kernel method with unknown error distribution. The results are displayed in Figure 3. In (a)-(c), the solid curves denote estimated conditional densities for the observed intensity (a) w = 180, (b) w = 190, (c) w = 200. For comparison, estimated conditional densities based on the parametric normal-exponential model are also displayed (dashed curves). The nonparametric fit shows that conditional densities are right-skewed, which are away from the fit based on the parametric model. The stacked conditional density plot in (d) unveils the evolution of conditional densities over the observed intensities. The results suggest that the normal-exponential assumption appears not to be a realistic assumption in the study. A model that relaxes the parametric assumptions may be useful for the gene background correction here.
Figure 3.
The analysis of the Illumina bead microarray data. In (a)-(c), estimated conditional densities (solid curves) for (a) w = 180, (b) w = 190, (c) w = 200 using the reweighed deconvolution kernel method with unknown error distribution are compared with the estimated conditional densities (dashed curves) using the normal-exponential method. (d) shows the nonparametric conditional density estimates plotted side by side in a perspective plot.
Supplementary Material
Acknowledgments
The authors are grateful to the editor and the reviewers for their valuable comments. The research of XFW is supported in part by NIH UL1 RR024989, and the research of DY is supported by a NSERC grant and a grant from Memorial University of Newfoundland.
Appendix
We now outline the key ideas of the proofs. We shall use Proposition 31.8 of Port (1994), which is restated as Lemma 1.
Lemma 1
Let q1(Xi) and q2(Xi) be two random variables with means μ1 and μ2, variances ν1 and ν2 respectively, and with covariance ν12. Let {X1, ⋯,Xn} be an i.d.d. sequence of random variables and define
Then the second-order approximation of
R̂ is
and the first-order approximation of var(R̂) is
.
Employing Lemma 1 to random variables q1(Wj) = 1 and , one has
| (.1) |
| (.2) |
where is the kernel estimator for fW.
Lemma 2
Let τ(w|x) = fU(w − x)/fW(w) and τ̂(w|x) be as in (9). By choosing and b ∼ n−1/5, one has, for all x,w ∊ ℝ,
| (.3) |
| (.4) |
Proof. Let us prove formula (.3) first. Recall that and are independent. Consider with
Then, one has (see for instance Lemma 1 of Hyndman et al. (1996))
| (.5) |
| (.6) |
where , and R(L) = ∫ L2(ω) dω.
By formulas (.1) and (.5), one has, for all x,w,
Similarly, formulas (.1), (.2), (.5), and (.6) imply
Hence, and b ∼ n−1/5 implies the desired result in Lemma 2.
To prove formula (.4), one can repeat the above calculation by removing all terms involving a, since here f̂U is replaced by fU.
Proof of Theorem 1
Recall that f̂X|W(x|w) = τ̂0(w|x)f̂X(x). We need the following inequality, which is a direct consequence of the triangle inequality and Cauchy-Schwarz inequality:
| (.7) |
It is enough to estimate
|f̂X
− fX|2. Under assumptions of Theorem 1 and by Taylor extension formula, one has (see e.g. Fan (1991b))
| (.8) |
Recall that supf∊Ψm,α,B f(x)≤C for some constant C > 0 (see Bickel and Ritov (1988)). By results in Fan (1991b) (see page 1266), one has
| (.9) |
where we have used ϕU(t) = ϕσZ(t) = ϕZ(σt). Since random variable Z is ordinary smooth, there exists a constant Q > 0 such that
for positive constants d0, d1 and β. Hence, by assumptions of Theorem 1,
| (.10) |
-
σ = O(h). Note that condition (B1) implies that ∫ |ϕK(t)|2 dt < ∞. Inequality (.10) implies . By (.8),
(.11) Combining with (.4) and (.7), one gets the desired conclusion, by choosing b ∼ n−1/5, h ∼ n−1/(2m+2α+1), and as for m + α ≥ 2.
-
σ ≫ h. Inequality (.10) implies var)f̂X(x)) ≤ O(σ2βn−1h−(2β+1)) (see also Delaigle (2008)).
The conclusion follows from (.4), (.7), and (.11), by choosing b ∼ n−1/5 and , which implies .
Proof of Theorem 2
σ = O(h) implies that |σ/h| can be bounded from above by a constant. Hence, formula (.9) implies , by condition (C1). The rest of the proof is same as that of the (i) in Theorem 1.
-
σ ≫ h. Since random variable Z is supersmooth, there exists Q > 0 s.t.,
Hence, by assumptions of Theorem 2 and (.9),
with l = 0 if β0 ≥ 0 and l = 2β0 if β0 < 0. Then, . Combining with (.4), (.7), and (.11), one gets the desired conclusion, by taking b ∼ n-1/5, and with and 0 < D < 2m + 2α + 1.
Proof of Theorem 3
For all x ∊ ℝ, we consider an estimator of fX(x) as
Cauchy-Schwarz inequality and Fubini's theorem imply that
As showed in Wang and Ye (2012), one has
Recall that , ∫ | ϕK (th) | dt = O (1/h) by condition (B2), and |ϕW(t)| = |ϕX(t)ϕU(t)| ≤ |ϕU(t)| = |ϕZ(σt)|. One has
Similar to the proof of inequality (.7), one has,
| (.12) |
-
σ = O(h). Take . Similar to the calculation of inequality (.10), . Recall that the proof of (i) in Theorem 1 gives var(f̂X) = O((nh)−1). Hence, combining with (.8), for all x∊ ℝ, and for all fx ∊ Ψm,a,B,
| f̂X(x)) − fX(x)|2 = O(h2m+2α) + O(1/(nh)). By Cauchy-Schwarz inequality,Combining with inequalities (.3) and (.12), one gets the desired result in Theorem 3, by choosing m + α ≥ 2, and b ∼ n−1/5.
-
Let σ ≫ h. Let us take , which implies . Similar to (.10), one has . Recall that the proof of (ii) in Theorem 1 gives ν̄1 = O(σ2βh−(2β+1)). Hence, combining with (.8), one has
Therefore, for all x ∊ ℝ and for all fX ∊ Ψm,α,B,
As in (i), the conclusion in (ii) follows from inequalities (.3) and (.12), and b ∼ n−1/5.
Proof of Theorem 4
σ = O(h) implies that |σ/h| can be bounded from above by a constant. Then by (.9), . The rest of the proof is same as that of (i) in Theorem 3.
-
Let σ ≫ h. As in (ii) of Theorem 2, one has, and . By the choices of the bandwidth h, one has, for all x ∊ ℝ and for all fX ∊ Ψm,a,B,
Combining with inequalities (.3) and (.12), one gets the desired results, by choosing and b ∼ n−1/5.
Derivation for the asymptotic dominating term of MIAE(ĝ(x|w))
Let K be a m-th order kernel function with ∫ |K(z)zm+1|dz < ∞. Assume that the density function fX has continuous, bounded (m + 1)-th order derivative. Notice that one can bound MIAE(ĝ(x|w; h)) from above as follows.
It is known that the bias of f̂X is
where, by assumption on fX, |O(hm+1) | ≤ c · hm+1, with c > 0 a constant. Hence if exists, then
For the variance, one has
where ǁfX(x)ǁ∞ denotes the supremum norm of fX (refer to the last formula on page 1266 in Fan (1991b)). Therefore,
That is,
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Bashtannyk D, Hyndman R. Bandwidth selection for kernel conditional density estimation. Computational Statistics & Data Analysis. 2001;36(3):279–298.
- Bickel PJ, Ritov Y. Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā: The Indian Journal of Statistics, Series A. 1988;50(3):381–393.
- Carroll RJ, Hall P. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association. 1988;83:1184–1186.
- Comte F, Lacour C. Data-driven density estimation in the presence of additive noise with unknown distribution. Journal of the Royal Statistical Society Series B-Methodological. 2011;73(4):601–627.
- De Gooijer J, Zerom D. On conditional density estimation. Statistica Neerlandica. 2003;57(2):159–176.
- Delaigle A. An alternative view of the deconvolution problem. Statistica Sinica. 2008;18(3):1025–1045.
- Delaigle A, Gijbels I. Practical bandwidth selection in deconvolution kernel density estimation. Computational Statistics & Data Analysis. 2004;45(2):249–267.
- Delaigle A, Meister A. Density estimation with heteroscedastic error. Bernoulli. 2008;14(2):562–579.
- Delaigle A, Meister A. Nonparametric function estimation under Fourier-oscillating noise. Statistica Sinica. 2011;21:1065–1092.
- Devroye L. A note on the L1 consistency of variable kernel estimates. Annals of Statistics. 1985;13(3):1041–1049.
- Diggle PJ, Hall P. A Fourier approach to nonparametric deconvolution of a density estimate. Journal of the Royal Statistical Society Series B-Methodological. 1993;55(2):523–531.
- Ding LH, Xie Y, Park S, Xiao G, Story MD. Enhanced identification and biological validation of differential gene expression via Illumina whole-genome expression arrays through the use of the model-based background correction methodology. Nucleic Acids Research. 2008;36(10):e58. doi: 10.1093/nar/gkn234.
- Efromovich S. Conditional density estimation in a regression setting. Annals of Statistics. 2007;35(6):2504–2535.
- Fan J. Asymptotic normality for deconvolution kernel density estimators. Sankhyā: The Indian Journal of Statistics, Series A. 1991a;53(1):97–110.
- Fan J. On the optimal rates of convergence for nonparametric deconvolution problems. Annals of Statistics. 1991b;19(3):1257–1272.
- Fan J. Deconvolution with supersmooth distributions. Canadian Journal of Statistics. 1992;20(2):155–169.
- Fan J, Koo J. Wavelet deconvolution. IEEE Transactions on Information Theory. 2002;48(3):734–747.
- Fan J, Yim T. A crossvalidation method for estimating conditional densities. Biometrika. 2004;91(4):819–834.
- Hall P, Maiti T. Deconvolution methods for non-parametric inference in two-level mixed models. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2009;71:703–718.
- Hall P, Racine J, Li Q. Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association. 2004;99(468):1015–1026.
- Hall P, Wand MP. Minimizing L1 distance in nonparametric density estimation. Journal of Multivariate Analysis. 1988a;26(1):59–88.
- Hall P, Wand MP. On the minimization of absolute distance in kernel density estimation. Statistics & Probability Letters. 1988b;6(5):311–314.
- Hyndman RJ, Bashtannyk DM, Grunwald GK. Estimating and visualizing conditional densities. Journal of Computational and Graphical Statistics. 1996;5(4):315–336.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–264. doi: 10.1093/biostatistics/4.2.249.
- Johannes J. Deconvolution with unknown error distribution. Annals of Statistics. 2009;37(5A):2301–2323.
- McIntyre J, Stefanski LA. Density estimation with replicate heteroscedastic measurements. Annals of the Institute of Statistical Mathematics. 2011;63(1):81–99. doi: 10.1007/s10463-009-0220-x.
- Meister A. Deconvolution Problems in Nonparametric Statistics. Lecture Notes in Statistics. Springer; New York: 2009.
- Neumann MH. On the effect of estimating the error density in nonparametric deconvolution. Journal of Nonparametric Statistics. 1997;7:307–330.
- Port SC. Theoretical Probability for Applications. John Wiley; New York: 1994.
- Silver JD, Ritchie ME, Smyth GK. Microarray background correction: maximum likelihood estimation for the normal-exponential convolution. Biostatistics. 2009;10(2):352–363. doi: 10.1093/biostatistics/kxn042.
- Stefanski LA, Carroll RJ. Deconvoluting kernel density estimators. Statistics. 1990;21:169–184.
- Wang XF, Fan Z, Wang B. Estimating smooth distribution function in the presence of heterogeneous measurement errors. Computational Statistics and Data Analysis. 2010:25–36. doi: 10.1016/j.csda.2009.08.012.
- Wang XF, Wang B. Deconvolution estimation in measurement error models: The R package decon. Journal of Statistical Software. 2011;39(10):1–24.
- Wang XF, Ye D. The effects of error magnitude and bandwidth selection for deconvolution with unknown error distribution. Journal of Nonparametric Statistics. 2012;24(1):153–167. doi: 10.1080/10485252.2011.647024.