Abstract
The error distribution is generally unknown in deconvolution problems with real applications. A separate independent experiment is thus often conducted to collect additional noise data in those studies. In this paper, we study nonparametric deconvolution estimation from a contaminated sample coupled with an additional noise sample. A ridge-based kernel deconvolution estimator is proposed and its asymptotic properties are investigated depending on the error magnitude. We then present a data-driven bandwidth selection algorithm that combines the bootstrap method with the idea of simulation extrapolation. The finite sample performance of the proposed methods and the effects of error magnitude are evaluated through simulation studies. A real data analysis of a gene expression Illumina BeadArray study is performed to illustrate the use of the proposed methods.
Keywords: deconvolution, unknown error, ridge-based approach, bandwidth selection, SIMEX
1. Introduction
Deconvolution problems have attracted considerable attention in the past two decades. Interest in these problems stems mainly from the great number of medical, chemical, astronomical, and financial studies in which data are measured with error. The classical deconvolution problem can be described as follows. Suppose that the random variables X1, …, Xn are identically distributed as X. However, the Xj's cannot be observed directly. Instead, we observe
Wj = Xj + Uj,   j = 1, …, n,   (1)
where the measurement errors Uj’s are identically distributed as U, and Xj’s and Uj’s are independent. The goal is to recover the probability density function fX of X from Wj’s.
The popular approach to estimating the density is known as the deconvolution kernel method, where the Fourier transform and the kernel smoothing technique are applied to obtain an estimator. Let K(·) be a symmetric probability kernel with a finite variance ∫ x² K(x) dx < ∞, let φK(t) = ∫ e^{itx} K(x) dx be its Fourier transform with φK(0) = 1, and let φU be the known characteristic function of the error variable U. The deconvolution kernel density estimator (Stefanski and Carroll 1990) is defined as
f̂X(x) = (1/2π) ∫ e^{−itx} φK(th) φ̂W(t)/φU(t) dt,   (2)
where φ̂W(t) = (1/n) Σ_{j=1}^{n} e^{itWj} is the empirical characteristic function of W, and h = h(n) > 0 is a smoothing parameter. Under the common assumption that φK is compactly supported and φU does not vanish on the real line, the estimator f̂X in (2) is well defined and finite.
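For concreteness, estimator (2) can be computed by direct numerical integration. The sketch below is a minimal illustration, not code from the paper: the grid sizes, the function name, and the choice of the kernel with φK(t) = (1 − t²)³ on [−1, 1] are our own assumptions, and the known φU must be supplied by the user.

```python
import numpy as np

def deconv_kde(x_grid, W, h, phi_U, n_t=512):
    """Deconvolution kernel density estimate, computed by discretizing the
    inversion integral in (2) over t in [-1/h, 1/h], where the kernel with
    phi_K(t) = (1 - t^2)^3 on [-1, 1] is supported."""
    t = np.linspace(-1.0 / h, 1.0 / h, n_t)
    dt = t[1] - t[0]
    # empirical characteristic function of the contaminated sample W
    phi_W_hat = np.mean(np.exp(1j * np.outer(t, W)), axis=1)
    phi_K = (1.0 - (t * h) ** 2) ** 3  # |th| <= 1 on this grid by construction
    integrand = phi_K * phi_W_hat / phi_U(t)
    # (1/2pi) * integral of e^{-itx} * integrand dt, via a Riemann sum
    est = np.real(np.exp(-1j * np.outer(x_grid, t)) @ integrand) * dt / (2 * np.pi)
    return np.maximum(est, 0.0)  # clip small negative ripples
```

For example, with W = X + U, X ~ N(0, 1) and U ~ N(0, 0.3²), one would pass `phi_U = lambda t: np.exp(-0.5 * (0.3 * t) ** 2)`.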
There is a large amount of literature on deconvolution problems. Early contributions include Carroll and Hall (1988), Stefanski and Carroll (1990), Zhang (1990), Fan (1991a,b), Fan (1992), Neumann (1997) among others. Recent related works include discussions of heteroscedastic measurement errors (Delaigle and Meister 2008; Wang et al. 2010), repeated measurements (Delaigle et al. 2008), estimation of distributions, moments and quantiles (Hall and Lahiri 2008), and density estimation with unknown error distribution (Comte and Lacour 2011; Johannes 2009). Deconvolution problems in image analysis have been extensively studied by Hall and Qiu (2007a,b) and Qiu (2008). Fourier-type deconvolution kernel methods have also been applied to stochastic volatility models in finance (Comte 2004; Van Es et al. 2003). A new R software package decon has been recently developed by Wang and Wang (2011), which contains a collection of functions to deal with deconvolution problems for practical use.
Classical deconvolution problems usually assume that the density function of the measurement error is perfectly known. This is often an unrealistic assumption in practice. In real studies, one may instead have an additional noise sample U*1, …, U*m from a separate independent experiment. These data are direct observations from the error distribution, i.e., the U*k's are independent and identically distributed as U. If one does not impose any parametric distribution assumption on U, then φU can be estimated by its empirical counterpart, that is,
φ̂U(t) = (1/m) Σ_{k=1}^{m} e^{itU*k}.   (3)
However, replacing φU(t) in (2) by φ̂U(t) without any regularization is risky, since it appears in the denominator of the formula. Neumann (1997) introduced the truncated kernel estimator

f̂X^{TK}(x) = (1/2π) ∫ e^{−itx} φK(th) (φ̂W(t)/φ̂U(t)) χ(|φ̂U(t)| ≥ m^{−1/2}) dt,

where χ(A) is the indicator function of the set A. Thresholding in the Fourier domain with this indicator function accounts for the uncertainty caused by estimating φU.
Johannes (2009) studied another spectral cut-off estimator that depends on two bandwidth-type parameters. The rate of convergence of the estimator was derived, but the selection of the parameters was not investigated. Recently, Comte and Lacour (2011) presented an interesting data-driven density estimation approach in the presence of error with an unknown distribution. Their estimator involves no kernel smoothing function; instead, a regularization parameter g in the integral, which plays a bandwidth-type role, is introduced and needs to be selected. A penalization approach was proposed in their paper to choose g.
Hall and Qiu (2007a) investigated a ridge-based estimator for deconvolution in multivariate problems and applied it to image deblurring. A ridge function was applied to regularize the empirical Fourier transform, which directly prevents the denominator of the estimator from getting too close to zero. Motivated by their method, we propose in this paper a ridge-based kernel deconvolution estimator for contaminated data with unknown error distribution. We study the effects of the error magnitude on the proposed estimator, since it plays an important role in deconvolution problems. A data-driven bandwidth selection method is developed; it is not restricted to the proposed estimator and can also be applied to other deconvolution density estimators.
The rest of the paper is organized as follows. In Section 2, we present the estimator and explore its asymptotic properties depending on the error magnitude. In Section 3, a data-driven bandwidth selection method is proposed by combining the bootstrap approach with the idea of simulation extrapolation (SIMEX). In Section 4, we investigate the numerical properties of the estimator. Simulation studies are conducted to examine both the effects of error magnitude and the performance of the bandwidth selection method. Our method is also compared with the penalized method of Comte and Lacour (2011). A real data analysis of a gene expression Illumina BeadArray study is performed to illustrate the use of the proposed methods. The proofs of the theorems are provided in Section 5.
2. The estimator and its asymptotic properties
Under the additive error model (1), we assume that two samples are observed: a contaminated sample W1, …, Wn and an additional noise sample U*1, …, U*m. By combining the kernel smoothing approach with the ridge-parameter technique, we consider the following regularized deconvolution density estimator for contaminated data with unknown error distribution,
f̂X,ρm(x) = (1/2π) ∫ e^{−itx} φK(th) φ̂W(t) φ̂U(−t) / (max{|φ̂U(t)|, ρm})² dt,   (4)
where h is the smoothing parameter, ρm = ρ(m) > 0 is an additional ridge regularization parameter depending on m, φK is the characteristic function of the kernel K, and φ̂W and φ̂U are the empirical characteristic functions of W and U, respectively.
The parameters h and ρm need to be appropriately selected to achieve a bias-variance compromise. Note that φU can be estimated by (3) at each point t at the rate m^{−1/2}: φ̂U is an appropriate estimator when |φ̂U| ≫ m^{−1/2}, but it becomes unstable when |φ̂U| ≪ m^{−1/2}. Therefore, it is reasonable to take the ridge parameter as
ρm = m^{−1/2}   (5)
in (4) in order to avoid the difficulty of selecting two parameters simultaneously. The ridge-based kernel method was also discussed in Meister (2009), where a different ridge parameter was used. Throughout this paper, the ridge function is defined by (5).
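A sketch of the ridge idea follows. The exact form of display (4) cannot be recovered from this copy of the text, so the ridge is applied here in one common variant (multiply by the conjugate of φ̂U and bound the squared modulus from below, cf. Hall and Qiu 2007a; Meister 2009) with ρm = m^{−1/2}; treat the specific form, the grid sizes, and the function name as our assumptions rather than the paper's definition.

```python
import numpy as np

def ridge_deconv_kde(x_grid, W, U_star, h, n_t=512):
    """Ridge-based kernel deconvolution sketch: the unknown phi_U is replaced
    by its empirical version from the noise sample U_star, and the squared
    modulus in the denominator is kept away from zero by the ridge
    rho_m = m^{-1/2} (one common variant of the ridge idea)."""
    m = len(U_star)
    rho_m = m ** -0.5
    t = np.linspace(-1.0 / h, 1.0 / h, n_t)
    dt = t[1] - t[0]
    phi_W_hat = np.mean(np.exp(1j * np.outer(t, W)), axis=1)
    phi_U_hat = np.mean(np.exp(1j * np.outer(t, U_star)), axis=1)
    phi_K = (1.0 - (t * h) ** 2) ** 3  # kernel with phi_K(t) = (1 - t^2)^3 on [-1, 1]
    # where |phi_U_hat|^2 >= rho_m this reduces to dividing by phi_U_hat
    denom = np.maximum(np.abs(phi_U_hat) ** 2, rho_m)
    integrand = phi_K * phi_W_hat * np.conj(phi_U_hat) / denom
    est = np.real(np.exp(-1j * np.outer(x_grid, t)) @ integrand) * dt / (2 * np.pi)
    return np.maximum(est, 0.0)
```

The design point is only that the denominator is ridged from below, so no truncation set needs to be chosen as in Neumann (1997).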
The rates of convergence of the conventional deconvolution kernel estimator (2) have been thoroughly studied by Fan (1991b), who characterized error distributions into two classes: the ordinary smooth and the supersmooth. The difficulty of deconvolution depends heavily on the smoothness of the error distribution. For instance, when the error distribution is normal, the convergence rate is only of order O((log n)^{−1/2}). Such a slow rate seems to indicate that the deconvolution kernel method is infeasible in practice. However, reasonable estimation results can be achieved in both simulations and real data analysis. This phenomenon was explained by Delaigle (2008) through the asymptotic behavior of the estimator (2) under a “double asymptotic” approach, in which the asymptotic properties are derived by letting both the error variance go to zero and the sample size go to infinity.
Here we also derive the asymptotic properties of the ridge-based kernel estimator (4) depending on the error magnitude. Following the discussion by Delaigle (2008), we assume that the error U can be standardized by a variable that has unit variance. That is, U = σZ, where σ is a scale parameter and the variance of Z is one. However, there is no parametric assumption imposed on Z. Thus, the model (1) is rewritten as
Wj = Xj + σZj,   j = 1, …, n,   (6)
where fZ is unknown but Var(Z) = 1. The additional data U*1, …, U*m are independent and identically distributed as U = σZ.
The common smoothness condition imposed on the unknown density fX is that fX is in the set

Fν,α,B = {f : |f^{(ν)}(x) − f^{(ν)}(y)| ≤ B|x − y|^α for all x, y ∈ ℝ},

where ν ∈ ℕ is the smoothness degree of fX, and α ∈ [0, 1) and B > 0 are known constants. As noted in Fan (1991a), the class Fν,α,B is larger than the commonly used class formulated in Stone (1982). It also contains the class formulated in Delaigle (2008), namely, the f satisfying that both ||f^{(ν)}||∞ and ∫ |f^{(ν)}(x)|² dx are bounded from above by a constant. Indeed, the class Fν,α,B contains many commonly used densities. For instance, the normal, normal mixture, and Cauchy densities are in Fν,α,B for all ν ∈ ℕ, α = 0 and some B > 0; Gamma(k, θ) (k ∈ ℕ) is also in Fν,α,B for ν ≤ k − 2.
We always assume that |φZ(t)| ≠ 0 for all t ∈ ℝ. The asymptotic results for f̂X,ρm are derived for two classes of error distributions, as in the classical literature (Fan 1991b): the error Z is ordinary smooth of order β if

d0|t|^{−β} ≤ |φZ(t)| ≤ d1|t|^{−β}   for |t| > Q,

with d0, d1, β, Q some positive constants; and Z is supersmooth of order β if

d0|t|^{β0} exp(−|t|^β/γ) ≤ |φZ(t)| ≤ d1|t|^{β1} exp(−|t|^β/γ)   for |t| > Q,

with d0, d1, γ, β, Q some positive constants and β0, β1 some constants. Ordinary smooth distributions include, for example, the gamma, symmetric gamma, and Laplace distributions, while supersmooth distributions include the normal, normal mixture, and Cauchy distributions.
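To make the two classes concrete (our illustration, not from the paper): a unit-variance Laplace error is ordinary smooth of order β = 2, while a standard normal error is supersmooth with β = γ = 2 and β0 = β1 = 0:

```latex
\varphi_Z(t) = \frac{1}{1 + t^2/2}
  \quad\text{(Laplace, } |\varphi_Z(t)| \asymp |t|^{-2} \text{ as } |t| \to \infty\text{)},
\qquad
\varphi_Z(t) = e^{-t^2/2}
  \quad\text{(standard normal)}.
```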
We assume that the kernel K satisfies:
- (A) K is bounded, continuous, and ∫ |y|^{ν+2α}|K(y)| dy < ∞. Moreover, its characteristic function φK is symmetric and satisfies φK(t) = 1 + O(|t|^{ν+α}) as t → 0.
This condition basically asserts that K is a kernel of order ν + α. We use c(n) ≫ b(n) (resp. c(n) ≪ b(n)) to denote b(n) = o(c(n)) (resp. c(n) = o(b(n))). Similarly, h ~ a(n) means that there exist constants a1, a2 > 0 such that a1a(n) ≤ h ≤ a2a(n).
For the ordinary smooth fZ, the following theorem gives the rate of convergence of the estimator f̂X,ρm. We need the following additional assumption on the kernel K:
- (B) ∫ |t|^{2β}|φK(t)|² dt < ∞.
Theorem 2.1
Assume that Z is ordinary smooth of order β but unknown. Under conditions (A) and (B), one has,
- if and , then
- if and , then
The rate of convergence of the estimator f̂X,ρm for the supersmooth error is given by the following theorem. We need another assumption on the kernel K:
- (C) φK(t) = 0 for all |t| > 1, that is, φK has support on [−1, 1]. Moreover, ∫ |φK(t)|²[|t|^{−2β0} + |t|^{−2β1}] dt < ∞.
Theorem 2.2
Assume that Z is supersmooth of order β but unknown. Under conditions (A) and (C), one has,
- if and , then
- if and with and D < 2ν + 2α + 1, then
Conditions (B) and (C) give restrictions on φK that ensure integrability of the estimator, for the ordinary smooth and supersmooth error cases, respectively. In the supersmooth case, Condition (C) imposes a much stronger restriction on the tail behavior of φK; see Fan (1992) and Stefanski and Carroll (1990) for further details. A kernel widely used in practice in the deconvolution literature is the one with φK(t) = (1 − t²)³ χ(−1 ≤ t ≤ 1), which satisfies conditions (A), (B), and (C) (for β0, β1 < 1/2).
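As a quick numerical check (our illustration, not from the paper), one can invert φK(t) = (1 − t²)³χ(−1 ≤ t ≤ 1) on a grid and verify that the resulting kernel K integrates to one, i.e., that φK(0) = 1:

```python
import numpy as np

# Invert phi_K(t) = (1 - t^2)^3 on [-1, 1] numerically to obtain K(x);
# since phi_K is real and even, the inverse transform reduces to a cosine integral.
t = np.linspace(-1.0, 1.0, 2001)
dt = t[1] - t[0]
phi_K = (1.0 - t ** 2) ** 3
x = np.linspace(-40.0, 40.0, 2001)
dx = x[1] - x[0]
K = (np.cos(np.outer(x, t)) @ phi_K) * dt / (2 * np.pi)
print(round(K.sum() * dx, 3))  # total mass, should be close to 1
print(round(K[1000], 4))       # K(0) = (1/2pi) * integral of phi_K = 16/(35 pi)
```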
In deconvolution problems, the quality of a sample depends not only on its size but also on the magnitude of the measurement error. Under model (6), Theorems 2.1 and 2.2 provide a better interpretation of the asymptotic behavior of the estimator f̂X,ρm than the classical asymptotic view of the problem. There are two different rates of convergence depending on the error level. In the case of ordinary smooth error, part (i) of Theorem 2.1 shows that if σ is of the order of h, the rate of convergence of f̂X,ρm can be as good as that of conventional (error-free) kernel density estimation, while part (ii) of Theorem 2.1 gives a better rate of convergence for f̂X,ρm than that of the classical deconvolution estimator f̂X if σ is small enough.
In the case of supersmooth error, Theorem 2.2 says that the convergence rate varies, depending on the error level, from the rate of error-free kernel density estimation to the very slow rate of classical deconvolution. If σ → 0, the rate of convergence of f̂X,ρm is better than that of classical deconvolution. These results help explain why the estimator f̂X,ρm can work well in practice for supersmooth unknown error, even with moderate sample sizes. We will also see from the simulation studies that a small error variance is not necessary for this theory to be appropriate.
Theorems 2.1 and 2.2 generalize the asymptotic results of Delaigle (2008), who derived the double asymptotic properties of the deconvolution estimator (2) with a known error density. Note from Theorems 2.1 and 2.2 that the size m of the additional data U*k's contributes to the rate of convergence of f̂X,ρm only if m ≤ n.
3. A SIMEX-type bootstrap bandwidth selection method
The selection of the bandwidth plays a crucial role in the practical implementation of deconvolution kernel techniques and has been extensively studied in classical deconvolution problems (Delaigle and Gijbels 2004; Hesse 1999). Although several kernel-type regularized deconvolution estimators have been proposed for density deconvolution with unknown error distribution (Johannes 2009; Neumann 1997), their bandwidth selection has not been investigated. Here we propose a data-driven bandwidth selection algorithm for the ridge-based kernel estimator f̂X,ρm, which combines bootstrap bandwidth selection with the SIMEX idea. The algorithm can also be applied to bandwidth selection for other deconvolution kernel estimators.
A natural way to implement a resampling-based bootstrap method for bandwidth selection in deconvolution with unknown error distribution is as follows. We first obtain an initial density estimate to generate bootstrap samples. If the measurement error is ignored, a naive estimate f̂X,naive of fX(x) is the ordinary kernel estimate computed from the Wj's. This provides a reasonable but overly smoothed estimate of fX(x). Bootstrap data are then generated as sums of draws from f̂X,naive and draws from f̂U, the ordinary kernel density estimate computed from the additional noise data U*k's.
We then construct bootstrap deconvolution kernel estimates f̂*X,b(x; h), b = 1, …, B, where B is the number of bootstrap samples to be taken. The bootstrap estimate of the optimal bandwidth for the deconvolution kernel estimate is the value that minimizes over h the bootstrap mean integrated squared error (MISE),

MISE*(h) = (1/B) Σ_{b=1}^{B} ∫ {f̂*X,b(x; h) − f̂X,naive(x)}² dx.
This naive approach, however, tends to select an overly large bandwidth in practice. This is not surprising, since we cannot observe the Xj's: the naive bootstrap sample above is not a “true” bootstrap sample from X1, …, Xn, but a contaminated sample with a higher error level. Hence, we need an appropriate shrinking factor to adjust the ĥ from the naive method. Our algorithm is motivated by Delaigle and Hall (2008), who used SIMEX for cross-validation bandwidth selection in nonparametric regression with errors-in-variables. The MISE* in the naive method above is the bootstrap MISE of the contaminated data Wj's rather than that of the “true” data Xj's. Using the SIMEX concept, we may develop two versions of bootstrap MISEs for data with higher error levels, MISE*1 and MISE*2, for the variables W(1) = W + U and W(2) = W(1) + U, respectively. We then estimate the optimal bandwidths for data at the higher error levels and back-extrapolate to yield the final bandwidth of interest. Our SIMEX-type bootstrap bandwidth selection algorithm is described as follows.
Algorithm 1
SIMEX-type bootstrap bandwidth selection for deconvolution with unknown error distribution.
1. Generate two bootstrap error-inflated samples W*j and W**j, j = 1, …, n, where W*j = W̃j + Ũj with W̃j generated from f̂W and Ũj generated from f̂U, and W**j = W*j + Ũ′j with Ũ′j generated from f̂U independently.
2. Construct the deconvolution estimates f̂*X,ρm(x; h1) and f̂**X,ρm(x; h2), with given bandwidths h1 and h2, based on the bootstrap samples W*j's and W**j's.
3. Repeat steps (1) and (2) B times to obtain f̂*X,ρm,b and f̂**X,ρm,b, b = 1, …, B.
4. Estimate MISE*1(h1) for the variable W by
   MISE*1(h1) = (1/B) Σ_{b=1}^{B} ∫ {f̂*X,ρm,b(x; h1) − f̂W(x)}² dx,
   and MISE*2(h2) for the variable W(1) by
   MISE*2(h2) = (1/B) Σ_{b=1}^{B} ∫ {f̂**X,ρm,b(x; h2) − f̂W(1)(x)}² dx,
   where f̂W is the ordinary kernel density estimate of fW computed from the Wj's, and f̂W(1) is the ordinary kernel density estimate of fW(1) computed from the first-level bootstrap samples W*j's.
5. Obtain the estimated optimal bandwidths ĥ1 and ĥ2 by minimizing MISE*1 and MISE*2 over h1 and h2.
6. Select the bandwidth ĥ for f̂X,ρm(x; h) by linear back-extrapolation from the pair (log ĥ1, log ĥ2). This suggests ĥ = ĥ1²/ĥ2.
The key rationale of this algorithm remains the same as that of SIMEX, which is to determine the effect of measurement error on the bandwidth experimentally via simulation. The effect of measurement error on a statistic of interest can be studied with a simulation experiment in which additional measurement error is added to the measured data and the statistic is then recalculated. In the algorithm, W**j measures W*j in the same way that W*j measures Wj and Wj measures Xj. We therefore expect the relationship between ĥ2 and ĥ1 to be close to that between ĥ1 and ĥ.
The second step of the classical SIMEX method is extrapolation (Stefanski and Cook 1995). Typically, a regression approach is used to fit an extrapolant function to the “pseudo” statistics; extrapolation back to the case of no measurement error then yields the final estimate. Note that an assumed form of the extrapolant function is needed to fit the regression curve. In practice, one of a few simple functional forms, such as a quadratic, is often used, which makes SIMEX an approximate method. In our algorithm, we use a simple linear back-extrapolation on the logarithm of the bandwidths. That is, the relation log ĥ2 − log ĥ1 ≈ log ĥ1 − log ĥ is applied, which leads to ĥ = ĥ1²/ĥ2. In this case, the shrinking factor adjusting the naive method is ĥ1/ĥ2. The back-extrapolation function we use is the same as in Delaigle and Hall (2008); however, our algorithm is essentially different from theirs. We consider bootstrap bandwidth selection for density deconvolution, while they studied cross-validation bandwidth selection in nonparametric regression with errors-in-variables.
Certainly, like all other SIMEX approaches, using this linear back-extrapolation function is a fairly strong assumption. Extrapolating on the logarithm of the bandwidths is based on our experiences from numerical studies. Our numerical simulations indicate that the extrapolant function is a reasonable choice and the algorithm works quite well in practice for deconvolution with unknown error distribution. One may consider a more computationally intensive SIMEX algorithm and/or another extrapolant function.
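The SIMEX structure of the algorithm can be sketched as follows. To keep the sketch short and self-contained, the bootstrap estimates at the two inflated error levels are ordinary Gaussian kernel estimates rather than the ridge deconvolution estimates of Section 2, and the pilot bandwidth (Silverman's reference rule), the grid, and the function names are our own choices; only the overall shape, two error-inflated bootstrap levels followed by log-scale linear back-extrapolation ĥ = ĥ1²/ĥ2, follows the algorithm above.

```python
import numpy as np

def kde(data, x, h):
    """Ordinary Gaussian kernel density estimate on the grid x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def simex_bandwidth(W, U_star, h_grid, B=50, seed=0):
    """SIMEX-type bootstrap bandwidth selection (simplified sketch).

    Two error-inflated bootstrap levels are built by adding resampled noise
    from U_star; the MISE-minimizing bandwidths h1, h2 at the two levels are
    then back-extrapolated on the log scale: h = h1^2 / h2.
    Returns (h, h1, h2).
    """
    rng = np.random.default_rng(seed)
    n = len(W)
    x = np.linspace(W.min() - 1.0, W.max() + 1.0, 200)
    dx = x[1] - x[0]
    h_pilot = 1.06 * W.std() * n ** -0.2               # Silverman's reference rule
    f_W = kde(W, x, h_pilot)                           # target at error level 1
    f_W1 = kde(W + rng.choice(U_star, n), x, h_pilot)  # target at error level 2
    mise1 = np.zeros(len(h_grid))
    mise2 = np.zeros(len(h_grid))
    for _ in range(B):
        Wb = rng.choice(W, n) + rng.choice(U_star, n)  # level-1 bootstrap sample
        Wbb = Wb + rng.choice(U_star, n)               # level-2 bootstrap sample
        for i, h in enumerate(h_grid):
            mise1[i] += ((kde(Wb, x, h) - f_W) ** 2).sum() * dx
            mise2[i] += ((kde(Wbb, x, h) - f_W1) ** 2).sum() * dx
    h1 = h_grid[int(np.argmin(mise1))]
    h2 = h_grid[int(np.argmin(mise2))]
    return h1 ** 2 / h2, h1, h2                        # log-scale back-extrapolation
```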
4. Numerical Studies
4.1. Simulation examples
We illustrate the performance of the proposed methods through simulations in this subsection. Throughout the studies, we take the commonly used second-order kernel for deconvolution problems,

φK(t) = (1 − t²)³ χ(−1 ≤ t ≤ 1),

which corresponds to the kernel function

K(x) = (48 cos x)/(π x⁴) (1 − 15/x²) − (144 sin x)/(π x⁵) (2 − 5/x²).
We use the integrated squared error (ISE), defined by ISE(f̂) = ∫ {f̂X (x) − fX (x)}2 dx, to evaluate the performance of deconvolution estimators.
In our first simulation study, we examined the averaged ISEs (also denoted “MISE”) as a function of the noise level σ. We considered three target densities of X: (1) X ~ N(0, 1), (2) X ~ 0.5N(−2, 1) + 0.5N(2, 1), and (3) X ~ Gamma(2, 1). The measurement errors were generated from U ~ N(0, σ²), with three error levels (σ = 0.4, 0.6, or 0.8). In each case, we set the sample size n = 100, 250, or 1000, and the size of the additional noise data m = 100 or 500. We performed a grid search to find the bandwidth h that minimizes the ISE under the assumption that the true density function is known. Although this is not a practical bandwidth selection method, it provides an optimistic view of the ISE performance of the proposed estimator. The grid size was set to 25. The numerical integration of the ISE was evaluated from min1≤j≤n Xj to max1≤j≤n Xj.
The MISEs averaged over 250 replications are reported in Table 1. For comparison, the MISEs of the ordinary kernel density estimates from uncontaminated data are listed in the column σ = 0. The MISEs with σ = 0.4 are close to those with no measurement error, which supports our asymptotic conclusion that deconvolution is not more difficult than ordinary density estimation when the error level is small. For all three target densities, the MISEs increase as the error level increases and decrease as the sample size increases. The quality of the deconvolution estimates depends not only on the sample size but also on the level of the measurement error variance. We showed in Section 2 that there are two very different convergence rates depending on the error magnitude. Although it is difficult to tell which rate applies in practice, the proposed estimator appears feasible for moderate sample sizes in this simulation study, even when the error level is not very small.
Table 1.
MISE (expressed in 0.01 units) for the ridge-based kernel deconvolution estimate as a function of the error level σ. The “optimal” bandwidths of the estimates are estimated by assuming that fX is known. The column of σ = 0 corresponds to the MISEs of the kernel estimates from the uncontaminated data.
| Density | n | σ=0 | m=100, σ=0.4 | m=100, σ=0.6 | m=100, σ=0.8 | m=500, σ=0.4 | m=500, σ=0.6 | m=500, σ=0.8 |
|---|---|---|---|---|---|---|---|---|
| Normal | 100 | 0.508 | 0.816 | 1.145 | 1.632 | 0.703 | 0.993 | 1.334 |
| Normal | 250 | 0.269 | 0.532 | 0.831 | 1.259 | 0.442 | 0.692 | 1.115 |
| Normal | 1000 | 0.097 | 0.296 | 0.501 | 0.857 | 0.232 | 0.405 | 0.682 |
| Normal mixture | 100 | 0.409 | 0.657 | 0.851 | 1.092 | 0.565 | 0.747 | 1.026 |
| Normal mixture | 250 | 0.224 | 0.413 | 0.617 | 0.862 | 0.349 | 0.506 | 0.741 |
| Normal mixture | 1000 | 0.103 | 0.208 | 0.363 | 0.562 | 0.168 | 0.279 | 0.466 |
| Gamma | 100 | 0.609 | 0.961 | 1.272 | 1.656 | 0.835 | 1.059 | 1.454 |
| Gamma | 250 | 0.311 | 0.618 | 0.899 | 1.269 | 0.528 | 0.768 | 1.096 |
| Gamma | 1000 | 0.131 | 0.394 | 0.639 | 0.964 | 0.315 | 0.526 | 0.797 |
Next we illustrate our SIMEX-type bootstrap method for bandwidth selection through a typical simulated example. In this example, the Xj's were generated from 0.5N(−2, 1) + 0.5N(2, 1), and the errors Uj's and U*k's were generated from N(0, 0.5²). The sample size was n = 1000, the size of the additional noise data was m = 500, and the bootstrap size was B = 200. The analysis results are displayed in Figure 1. The left panel of the figure shows the three estimated ISE curves as functions of the bandwidth h: the solid line denotes the ISE calculated using the true density function from the original contaminated data; the dashed line denotes MISE*1 from the first-level error-inflated bootstrap samples; and the dotted line denotes MISE*2 from the second-level error-inflated bootstrap samples. The corresponding estimated optimal bandwidths are ĥopt = 0.188, ĥ1 = 0.218, and ĥ2 = 0.239, respectively. Note that the bandwidth increases as the level of measurement error increases. Using the proposed linear back-extrapolation on the logarithm of the bandwidths, our final selected bandwidth is ĥ = ĥ1²/ĥ2 ≈ 0.199. It is very close to ĥopt, the bandwidth obtained assuming that the true density is known. The right panel of Figure 1 shows the density estimates: the ordinary kernel estimate from the uncontaminated sample Xj's (solid line), the ridge-based kernel deconvolution estimate with the bandwidth ĥ from the contaminated sample Wj's (dashed line), and the ordinary kernel estimate from the Wj's (dash-dotted line). The true density is denoted by the dotted line. The deconvolution estimate with the selected bandwidth recovers the true density quite well and is very close to the kernel estimate from the uncontaminated data.
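The back-extrapolation arithmetic in this example can be checked directly (ĥ1 = 0.218 and ĥ2 = 0.239 are the values reported above):

```python
# log-scale linear back-extrapolation: log h = 2 log h1 - log h2, i.e. h = h1^2 / h2
h1, h2 = 0.218, 0.239
h_hat = h1 ** 2 / h2
print(round(h_hat, 3))  # -> 0.199, close to the oracle bandwidth 0.188
```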
Figure 1.
A simulated example of density deconvolution with unknown error distribution using the SIMEX-type bootstrap bandwidth selection method. The left panel shows the different estimated ISE curves as functions of h: the ISE with the true density (solid line), MISE*1 (dashed line), and MISE*2 (dotted line). The right panel shows the different density estimates: the ordinary kernel estimate from the Xj's (solid line), the deconvolution estimate from the Wj's (dashed line), and the ordinary kernel estimate from the Wj's (dash-dotted line). The dotted line denotes the true density function.
Finally, we compare the performance of our estimator with that of the penalized estimator of Comte and Lacour (2011). The true model considered was X ~ 0.5N(−2, 1) + 0.5N(2, 1), with measurement error U from N(0, σ²). Four cases of sample sizes were studied, and two levels of measurement error were considered: σ = 0.3 or 0.6. The bootstrap size was B = 200 for the bandwidth selection. Table 2 displays the MISEs and their associated standard errors from 100 replications. The results show that the performance of our estimator is comparable to that of the estimator of Comte and Lacour (2011). When the measurement error variance is small, the method of Comte and Lacour (2011) is slightly better than ours; as the error variance becomes large and the sample size increases, our method appears to become slightly better. In general, our kernel estimator seems to be a good alternative to the penalized estimator under these simulation settings.
Table 2.
Comparison of our estimator with the SIMEX-type bootstrap bandwidth selection and the penalized estimator of Comte and Lacour (2011). Table entries without parentheses are the MISEs and entries with parentheses are their associated standard errors (both are expressed in 0.01 units).
| Case | | Comte & Lacour, σ=0.3 | Comte & Lacour, σ=0.6 | Proposed, σ=0.3 | Proposed, σ=0.6 |
|---|---|---|---|---|---|
| n=250 | m=100 | 0.432 (0.033) | 0.711 (0.037) | 0.493 (0.038) | 0.727 (0.041) |
| n=250 | m=500 | 0.387 (0.035) | 0.587 (0.033) | 0.401 (0.037) | 0.584 (0.040) |
| n=1000 | m=100 | 0.247 (0.016) | 0.423 (0.018) | 0.261 (0.013) | 0.416 (0.017) |
| n=1000 | m=500 | 0.207 (0.017) | 0.332 (0.014) | 0.201 (0.013) | 0.313 (0.015) |
4.2. A case study with Illumina BeadArray data
We illustrate the deconvolution method using Illumina BeadArray data from a leukemia study (Xie et al. 2009), whose objective was to investigate the pathogenesis of leukemia. Irradiated mice that subsequently developed acute myeloid leukemia (AML) were used to study the leukemogenic process. Illumina Mouse-6 V1 BeadChip whole-genome expression arrays were used to obtain the gene expression profiles of the AML samples. More details of the biological experiment can be found in Xie et al. (2009).
Illumina BeadArray has become increasingly popular in recent years. The BeadArray technology from Illumina Inc. makes its preprocessing and quality control different from those of other microarray technologies. BeadArrays are arrays of randomly positioned silica beads of 3 micron diameter. In this leukemia study, the preprocessing of the raw BeadArray image data was routinely carried out using the BeadStudio software developed by Illumina Inc. For each bead, the foreground intensity was calculated as a weighted average of signals. The local background, an average of the five dimmest pixels (unsharpened intensities) within the 17 × 17 pixel area around each bead centre, was then subtracted to produce bead summary data (Dunning et al. 2008). Other image processing options (Qiu and Sun 2007; Wang and Ye 2010) may also be feasible here.
One of the distinctive features of the Illumina technology is that more than one thousand control bead types, in addition to the gene sequences, are allocated on each array. These control beads do not correspond to any expressed sequences in the genome and are not expected to hybridize to any genes in the RNA samples. This unique design enables us to obtain additional noise data for the non-specific binding effect on the array in an experiment.
The first step of the microarray data analysis is sometimes called background correction, a process of correcting the measurement error effects in the observed gene intensities on an array using information only from that array. It should be noted that “background” here refers to noise from sources such as non-specific binding on the chip, rather than the local image background. A popular method for background correction is the “normexp” method based on an additive measurement error model (Irizarry et al. 2003; Silver et al. 2009; Wang et al. 2011). The model imposes the parametric assumptions that the observed intensity (W) is the sum of the true signal (X, assumed exponentially distributed) and the noise (U, assumed normally distributed).
In the leukemia study, 46120 genes and 1655 negative controls were randomly allocated on each array. Here, as a demonstrating example, we report only the analysis of the data from Array #4 in the leukemia study, to assess whether the normal-exponential assumption is valid. Denote the observed (local image-background subtracted) gene intensities from the array as Wj, j = 1, …, 46120, and the observed (local image-background subtracted) negative control intensities as U*k, k = 1, …, 1655. Consider the additive error model Wj = Xj + Uj, where Xj is the true signal and Uj is the noise. We recovered the density fX of the true signal from the data Wj's coupled with the U*k's, without imposing any distributional assumptions. Figure 2 displays the analysis results. The left panel shows the ordinary kernel density estimates of the contaminated gene intensities (solid line) and the negative control intensities (dashed line). Both data sets exhibit long-tailed and right-skewed distributions. A Kolmogorov-Smirnov test rejected normality of the negative control data (p < 0.0001); hence, there is no evidence to support a normal error distribution. The right panel shows the ridge-based kernel deconvolution estimate f̂X,ρm. The SIMEX-type bootstrap method was used for the bandwidth selection, which suggested a smoothing parameter of 8.14. Note that the location of the recovered density function is shifted to the left by about 100 units. This is due to the non-zero mean of the non-specific binding noise. The recovered density shows a “gamma-like” shape. A higher peak is obtained in the recovered density (0.0121 at x = 10.07) compared with that in the naive estimate from the contaminated gene intensities (0.0115 at x = 112.72). The analysis provides some evidence that the normal-exponential model may impose too-strong parametric assumptions and may not be appropriate for Illumina BeadArray background correction.
Figure 2.
A case study of Illumina BeadArray data: the left panel displays the ordinary kernel density estimates of contaminated gene intensities (solid line) and negative controls (dashed line). The right panel displays the ridge-based kernel deconvolution estimate with the SIMEX-type bootstrap bandwidth selection method.
5. Proofs
Parseval’s identity and Fubini’s theorem imply that
Recall that . From Proposition 31.8 in Port (1994), one has
| (7) |
We now show that
where
To this end, by formula (7), one gets I1 = O(m−1|φU (t)|−4). Following the same lines as the proof of Lemma 2.1 in Neumann (1997), one gets I2 = O(min{m−1|φU (t)|−4, |φU (t)|−2}).
Therefore, as φW (t) = φU (t)φX(t), |φX(t)| ≤ 1, and φU (t) = φZ(σt), one has
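The m−1 accuracy of the empirical characteristic function, which underlies the bounds on I1 and I2 above, can be checked numerically. The Laplace noise model, the grid point t, and the sample sizes below are assumptions for illustration; the exact second moment E|φ̂U(t) − φU(t)|² = (1 − |φU(t)|²)/m is a standard computation.

```python
import numpy as np

rng = np.random.default_rng(2)
m, reps = 400, 2000
t = 1.5
b = 1.0  # Laplace scale; phi_U(t) = 1 / (1 + b^2 t^2)
phi_true = 1.0 / (1.0 + (b * t) ** 2)

# Monte Carlo estimate of E|phi_hat_U(t) - phi_U(t)|^2 over
# repeated noise samples of size m.
err2 = np.empty(reps)
for r in range(reps):
    u = rng.laplace(0.0, b, size=m)
    phi_hat = np.exp(1j * t * u).mean()
    err2[r] = np.abs(phi_hat - phi_true) ** 2

theory = (1.0 - phi_true ** 2) / m  # exact value: (1 - |phi_U(t)|^2)/m
print(f"Monte Carlo: {err2.mean():.3e}   theory: {theory:.3e}")
```

The simulated mean squared error tracks the (1 − |φU(t)|²)/m formula closely, confirming the O(m−1) rate at a fixed t.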
Proof of Theorem 2.1
Let Z have an ordinary smooth density fZ. Together with assumptions (A) and (B), one has
In fact, by letting τ = th, one has
| (8) |
Following the same rationale as the proof of Theorem 2.1 in Delaigle (2008), one can check that
Cauchy-Schwarz inequality implies that
- Let σ = O(h). We have
Clearly, by choosing , one gets the desired result.
- Let σ ≫ h. As long as as min{n, m} → ∞ and h → 0, one has
As in (i), one can then choose , which of course verifies . Combining σ ≫ h with , one can get . The desired conclusion of (ii) follows by the choice of h.
Proof of Theorem 2.2
Let Z have a supersmooth density fZ. Under assumptions (A) and (C), and similar to the estimation in (8), we have
where l = 0 if β0 ≥ 0 and l = −2β0 if β0 < 0.
- If σ = O(h), the proof is identical to that of (i) in Theorem 2.1.
- Let σ ≫ h. As long as as min{n, m} → ∞ and h → 0, one has
| (9) |
Now let us choose and , with and D < 2ν + 2α + 1. Clearly, our choice of h verifies the condition and the assumption σ ≫ h. The desired conclusion of (ii) is then an immediate consequence of substituting the choice of h into (9).
Acknowledgments
We are grateful to the Associate Editor and two reviewers for their valuable suggestions, which substantially improved the paper. The research of XFW is supported in part by NIH UL1 RR024989. The research of DY was supported by NSF-FRG grants DMS-0652571 and DMS-0652684 (University of Missouri, Columbia).
References
- Carroll RJ, Hall P. Optimal Rates of Convergence for Deconvolving a Density. Journal of the American Statistical Association. 1988;83:1184–1186.
- Comte F. Kernel Deconvolution of Stochastic Volatility Models. Journal of Time Series Analysis. 2004;25:563–582.
- Comte F, Lacour C. Data-driven Density Estimation in the Presence of Additive Noise with Unknown Distribution. Journal of the Royal Statistical Society, Series B. 2011;73:601–627.
- Delaigle A, Gijbels I. Practical Bandwidth Selection in Deconvolution Kernel Density Estimation. Computational Statistics and Data Analysis. 2004;45:249–267.
- Delaigle A, Meister A. Density Estimation with Heteroscedastic Error. Bernoulli. 2008;14:562–579.
- Delaigle A. An Alternative View of the Deconvolution Problem. Statistica Sinica. 2008;18:1025–1045.
- Delaigle A, Hall P. Using SIMEX for Smoothing-parameter Choice in Errors-in-variables Problems. Journal of the American Statistical Association. 2008;103:280–287.
- Delaigle A, Hall P, Meister A. On Deconvolution with Repeated Measurements. Annals of Statistics. 2008;36:665–685.
- Dunning MJ, Barbosa-Morais NL, Lynch AG, Tavare S, Ritchie ME. Statistical Issues in the Analysis of Illumina Data. BMC Bioinformatics. 2008;9:85. doi: 10.1186/1471-2105-9-85.
- Fan J. Global Behavior of Deconvolution Kernel Estimates. Statistica Sinica. 1991a;1:541–551.
- Fan J. On the Optimal Rates of Convergence for Nonparametric Deconvolution Problems. Annals of Statistics. 1991b;19:1257–1272.
- Fan J. Deconvolution with Supersmooth Distributions. The Canadian Journal of Statistics. 1992;20:155–169.
- Hall P, Lahiri SN. Estimation of Distributions, Moments and Quantiles in Deconvolution Problems. Annals of Statistics. 2008;36:2110–2134.
- Hall P, Qiu P. Nonparametric Estimation of a Point-Spread Function in Multivariate Problems. Annals of Statistics. 2007a;35:1512–1534.
- Hall P, Qiu P. Blind Deconvolution and Deblurring in Image Analysis. Statistica Sinica. 2007b;17:1483–1509.
- Hesse C. Data-Driven Deconvolution. Journal of Nonparametric Statistics. 1999;10:343–373.
- Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T. Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics. 2003;4:249–264. doi: 10.1093/biostatistics/4.2.249.
- Johannes J. Deconvolution with Unknown Error Distribution. Annals of Statistics. 2009;37:2301–2323.
- Meister A. Deconvolution Problems in Nonparametric Statistics. Lecture Notes in Statistics. New York: Springer; 2009.
- Neumann MH. On the Effect of Estimating the Error Density in Nonparametric Deconvolution. Journal of Nonparametric Statistics. 1997;7:307–330.
- Port SC. Theoretical Probability for Applications. New York: John Wiley; 1994.
- Qiu P. A Nonparametric Procedure for Blind Image Deblurring. Computational Statistics & Data Analysis. 2008;52:4828–4841.
- Qiu P, Sun J. Local Smoothing Image Segmentation for Spotted Microarray Images. Journal of the American Statistical Association. 2007;102:1129–1144.
- Silver JD, Ritchie ME, Smyth GK. Microarray Background Correction: Maximum Likelihood Estimation for the Normal-exponential Convolution. Biostatistics. 2009;10:352–363. doi: 10.1093/biostatistics/kxn042.
- Stefanski LA, Carroll RJ. Deconvoluting Kernel Density Estimators. Statistics. 1990;21:169–184.
- Stefanski LA, Cook JR. Simulation Extrapolation: The Measurement Error Jackknife. Journal of the American Statistical Association. 1995;90:1247–1256.
- Stone CJ. Optimal Global Rates of Convergence for Nonparametric Regression. Annals of Statistics. 1982;10:1040–1053.
- Van Es B, Spreij P, Van Zanten H. Nonparametric Volatility Density Estimation. Bernoulli. 2003;9:451–465.
- Wang B, Wang XF, Xi Y. Normalizing Bead-based MicroRNA Expression Data: A Measurement Error Model-Based Approach. Bioinformatics. 2011;27:1506–1512. doi: 10.1093/bioinformatics/btr180.
- Wang XF, Fan Z, Wang B. Estimating Smooth Distribution Function in the Presence of Heteroscedastic Measurement Errors. Computational Statistics and Data Analysis. 2010;54:25–36. doi: 10.1016/j.csda.2009.08.012.
- Wang XF, Wang B. Deconvolution Estimation in Measurement Error Models: The R Package decon. Journal of Statistical Software. 2011;39:1–24.
- Wang XF, Ye D. On Nonparametric Comparison of Images and Regression Surfaces. Journal of Statistical Planning and Inference. 2010;140:2875–2884. doi: 10.1016/j.jspi.2010.03.011.
- Xie Y, Wang X, Story M. Statistical Methods of Background Correction for Illumina BeadArray Data. Bioinformatics. 2009;25:751–757. doi: 10.1093/bioinformatics/btp040.
- Zhang CH. Fourier Methods for Estimating Mixing Densities and Distributions. Annals of Statistics. 1990;18:806–831.