Abstract
We develop a statistical model to describe the spatially varying behavior of local neighborhoods of coefficients in a multi-scale image representation. Neighborhoods are modeled as samples of a multivariate Gaussian density that are modulated and rotated according to the values of two hidden random variables, thus allowing the model to adapt to the local amplitude and orientation of the signal. A third hidden variable selects between this oriented process and a nonoriented scale mixture of Gaussians process, thus providing adaptability to the local orientedness of the signal. Based on this model, we develop an optimal Bayesian least squares estimator for denoising images and show through simulations that the resulting method exhibits significant improvement over previously published results obtained with Gaussian scale mixtures.
I. Introduction
The set of natural photographic images is a distinct subset of the space of all possible 2-D signals, and understanding the distinguishing properties of this subset is of fundamental interest for many areas of image processing. For example, restoring images that have been corrupted with additive noise relies upon describing and exploiting the differences between the desired image signals and the noise process. Here, we characterize these differences statistically.
In a statistical framework, each individual photographic image is viewed as a sample from a random process. Since it is difficult to characterize such a process in a high-dimensional space, it is common to assume that the model should be translation-invariant (stationary). The most well-known example is that of power spectral models, which describe the Fourier coefficients of images as samples from independent Gaussians, with variance proportional to a power function of the frequency [1]. More recently, many authors have modeled the marginal statistics of images decomposed in multiscale bases using generalized Gaussian densities that exhibit sparse behavior (e.g., [2]–[4]). Denoising algorithms based on marginal models naturally take the form of 1-D functions (such as thresholding) that are applied pointwise and uniformly across the transform coefficient domain. However, natural images typically contain diverse content, with high-contrast features such as edges, textured regions, and low-contrast smooth regions appearing in different spatial locations. Thus, a denoising calculation that is appropriate in one region may be inappropriate in another.
One means of constructing a model that is statistically homogeneous, but still able to adapt to spatially varying signal behaviors is by parametrizing a local model with variables that are themselves random. Such doubly stochastic, or hierarchical, statistical models have found use in a wide variety of fields, from speech processing to financial time series analysis. For image modeling, a number of authors have developed models with hidden variables that control the local amplitudes of multiscale coefficients [5]–[12]. These models can adapt to the spatially varying amplitude of local clusters of coefficients, a feature that clearly differentiates photographic images from noise. The special case in which clusters of coefficients are modeled as a product of a Gaussian random vector and a (hidden) continuous scaling variable is known as a Gaussian scale mixture (GSM) [13]. This model has been used as a basis for high-quality denoising results [14].
One of the most striking features that distinguishes natural images from noise is the presence of strongly oriented content. Although they account for the variability of local coefficient magnitudes, the models mentioned above do not explicitly capture the fact that many local image regions are dominated by a single orientation which varies with spatial location. A method of introducing local adaptation into the GSM model by estimating covariances over local image regions was recently studied by Guerrero-Colón, Mancera, and Portilla in [15].
In this article, we extend the GSM model to describe patches of coefficients as Gaussian random variables, with a covariance matrix that depends on a set of three hidden variables associated with the signal amplitude, the “orientedness” (a measure of how strongly oriented the local signal content is), and the dominant orientation. The probability distribution over the set of all patches thus takes the form of a mixture of Gaussians that are parametrized by these hidden variables. We validate our model by using it to construct a Bayesian least squares estimator for removing additive white noise. The resulting denoising algorithm yields performance that shows significant improvement over the results obtained with the GSM model [14], and is close to the current state of the art. A brief and preliminary version of this work was described in [16].
II. Statistical Models
The use of multiscale oriented decompositions (loosely speaking, wavelets) has become common practice in image processing. As natural image signals typically produce sparse responses in the wavelet domain, whereas noise processes do not, modeling the differences between image and noise signals becomes easier in the wavelet domain than in the original pixel domain. Additionally, constructing the model in the wavelet domain allows information at multiple spatial scales to be treated in a consistent manner. The models presented in this article are all models for local patches of wavelet coefficients.
A. Steerable Pyramid Representation
In order to construct a model that can adapt to local orientation, we require a representation that allows measurement of local orientation and rotation of patches of subband coefficients. Critically sampled separable orthogonal wavelet representations are unsuitable for this: the filters in each subband do not span translation-invariant or rotation-invariant subspaces, and, thus, the coefficients cannot be interpolated at intermediate orientations or spatial positions because of the aliasing effects induced by critical sampling.
As an alternative, we use a steerable pyramid (SP) [17], an overcomplete linear transform that has basis functions comprised of oriented multiscale derivative operators. The steerable pyramid transform of height J decomposes an image into a highpass residual subband, J sets of oriented bandpass bands at dyadically subsampled scales, and a residual lowpass band. The full set of bandpass filters for the SP can be obtained from dilations, translations and rotations of a single “mother wavelet.” The SP transform may be constructed with an arbitrary number K of orientation subbands. For example, when K = 2 the bandpass filters are first order derivative operators in the x and y directions. In this case, the oriented subbands of the SP provide a measure of the image gradient at multiple scales.
Unlike separable wavelets, the linear span of SP filters centered at a single point form rotation invariant subspaces. As such, any of the filters may be rotated to any angle simply by taking a linear combination of the full set of filters, and this in turn means that the entire basis may be rotated to any desired orientation. Throughout this paper, we use the phrase “wavelet coefficients” to refer to the steerable pyramid coefficients, unless otherwise indicated.
B. Local Coefficient Patches and the GSM Model
Models based on marginal statistics of wavelet coefficients are appealing due to their relative tractability, but make the implicit assumption that the coefficients are independent. But wavelet coefficients of natural images from neighboring space and scale locations show strong statistical interdependencies. These have been described and exploited by a number of authors for applications such as image coding, texture synthesis, block artifact removal and denoising [6], [18], [19]–[21]. These dependencies arise from the fact that images contain sparse localized features, which induce responses in clusters of filters that are nearby in space, scale, and orientation.
A natural means of capturing these dependencies is to construct multivariate probability models for small patches of wavelet coefficients. For denoising purposes, each coefficient can then be estimated based on a collection of coefficients centered around it [14]. Including a “parent” coefficient from a coarser scale subband, as shown in Fig. 1, allows the model to take advantage of cross-scale dependencies in a natural manner [20], [22]. Note that in this type of local model, we do not segment the image into nonoverlapping patches, since this would likely introduce block-boundary artifacts. We instead apply the local model independently to overlapping patches, and simply ignore the effects of the patch overlap.
Fig. 1.

Generalized neighborhood of wavelet coefficients. The diagram indicates an example generalized patch for a coefficient at position (x) consisting of sibling (s) and parent (p) coefficients.
Perhaps the most noticeable dependency among nearby wavelet coefficients is the clustering of large magnitude coefficients near image features: the presence of a single large coefficient indicates that other large coefficients are likely to occur nearby. This behavior is the primary inspiration for the Gaussian Scale Mixture (GSM) [9]. Letting v ∈ ℝ^d denote a patch of d coefficients, the GSM model is
v = √z u    (1)
where u is a zero mean Gaussian with fixed covariance C, and z is the spatially varying scalar hidden variable.
The GSM model explains the inhomogeneity in the amplitudes of wavelet coefficients through the action of the scalar hidden variable, z, which modulates a homogeneous Gaussian process. One property of the model is that if the action of this scalar multiplier is “undone,” by dividing an ensemble of coefficient vectors by their hidden variables z, the subsequent ensemble of transformed vectors would have homogeneous Gaussian statistics. It has been observed by several authors that transforming filtered image data by dividing by the local standard deviation yields marginal statistics that are closer to Gaussian [23], [9].
This is illustrated in Fig. 2 for a single steerable pyramid subband. The original subband marginal statistics are far from Gaussian, as can be seen from examining the log histogram. Introduce the notation
Fig. 2.
Effects of divisive normalization. (a) Original subband; (b) log-histogram of marginal statistics for original subband; (c) subband normalized by estimated ẑ at each location; (d) log-histogram for normalized coefficients, showing Gaussian behavior. Dashed line is parabolic curve fit to histogram, for comparison.
g(x; C) = exp(−xᵀC⁻¹x/2) / ((2π)^(d/2) |C|^(1/2))    (2)
for the zero mean multivariate Gaussian with covariance C. Each patch x may be viewed as a sample of the Gaussian g(x; zC), where C may be estimated for the entire band by taking the sample covariance of all of the extracted overlapping coefficient patches. For each individual patch xi, the maximum likelihood estimate of z is given by
ẑᵢ = xᵢᵀ C⁻¹ xᵢ / d    (3)
Dividing each coefficient by the value of √ẑ computed from a neighborhood centered around it gives the transformed subband shown in Fig. 2(c). As can be seen visually, the power of this normalized subband is much more spatially homogeneous than for the original subband. The marginal statistics are also much closer to Gaussian, as may be seen by examining the log-histogram, which is very close to an inverted parabola.
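The following is a minimal NumPy sketch of this normalization based on (3): z_ml implements the maximum likelihood estimate ẑ = xᵀC⁻¹x/d, and normalize_subband divides each coefficient by √ẑ computed from the square neighborhood around it. The function names, the square sibling-only neighborhood, and the precomputed inverse covariance C_inv are illustrative choices, not the authors' implementation.

```python
import numpy as np

def z_ml(x, C_inv):
    """Maximum-likelihood estimate of the GSM multiplier z for one patch.

    For a patch x modeled as a sample of g(x; zC), maximizing over z
    gives z_hat = x^T C^{-1} x / d, as in (3)."""
    d = x.size
    return float(x @ C_inv @ x) / d

def normalize_subband(band, C_inv, radius=2):
    """Divide each coefficient by sqrt(z_hat) estimated from the square
    neighborhood centered on it (borders are left unchanged)."""
    out = band.copy()
    h, w = band.shape
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            patch = band[i - radius:i + radius + 1,
                         j - radius:j + radius + 1].ravel()
            out[i, j] = band[i, j] / np.sqrt(z_ml(patch, C_inv) + 1e-12)
    return out
```

In the paper the neighborhoods are generalized to include a parent coefficient from the coarser scale; the sketch omits this for brevity.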
C. OAGSM Model
We extend the GSM model to an orientation-adaptive GSM (OAGSM) model by incorporating adaptation to local signal orientation using a second spatially varying hidden variable θ. Patches under the OAGSM model may be viewed as having been produced through the following generative process. At each location the z and θ variables are first drawn according to a fixed prior distribution. A coefficient patch v is formed by drawing a sample from a fixed multivariate Gaussian process, rotating it by θ (around the center of the patch), and scaling by √z. This implies
v = √z R(θ) u    (4)
where R(θ) is an operator performing rotation about the center of the patch, and u is a zero mean multivariate Gaussian with fixed covariance C0.
Both rotating using R(θ) and scaling by √z are linear operations. This implies that when conditioned on fixed values for the hidden variables, v is simply a linearly transformed Gaussian and is, thus, itself Gaussian, with covariance
E[v vᵀ | z, θ] = z R(θ) C0 R(θ)ᵀ = z C(θ)    (5)
where the “oriented covariances” C(θ) are defined as rotated versions of base covariance C0.
The OAGSM model relies on the idea that differences in structure between coefficient patches in different oriented regions may be explained by the action of θ. This is illustrated in Fig. 3, where the structure of two wavelet patches in different oriented regions is clearly similar up to rotation. Attempting to describe the statistics of an ensemble of such patches without accounting for the rotational relationship between them will result in mixing structures at different orientations together. Such inappropriate data pooling yields a less powerful signal description, as some of the structure will have been averaged out. Conversely, taking advantage of this rotational relationship between coefficient patches can lead to a more homogeneous, easier-to-describe model.
Fig. 3.
Patches of two-band SP coefficients, displayed as fields of gradient vectors, taken from two oriented regions with different orientations. The structures of the two patches are similar, apart from the change in dominant orientation.
This statement can be made more precise by analyzing the second-order covariance statistics for ensembles of coefficient patches. Analogous to the divisive normalization described in Section II-B, we can “undo” the action of the rotator hidden variable by estimating the dominant local orientation of each patch, and then rotating each patch around its center by the estimated orientation. Performing this “orientation normalization” on every patch of coefficients from a particular spatial scale gives an ensemble of transformed patches with the same dominant orientation. These rotated patches are more homogeneous, and, therefore, easier to describe compactly, than the ensemble of raw original patches.
One simple way to quantify this is to examine the energy compaction properties of the two representations by performing principal component analysis (PCA) on both sets of patches. Consider the raw and the rotated “vectorized” patches extracted from one spatial scale of an image. We define sample covariance matrices Craw and Crot for the two ensembles and then examine their eigenvalues and eigenvectors. If the eigenvalues are normalized by the trace of the covariance, then they may be interpreted as the fraction of total signal variance that lies along the direction of each corresponding eigenvector. Normalized eigenvalues for Crot and Craw are plotted in decreasing order in Fig. 4. Comparing the results for the raw and rotated patches, we see that a greater portion of the total variance in the rotated patches is accounted for by a smaller number of dimensions. This “energy compaction” should translate into an advantage for a denoising method, since it effectively amplifies the signal, but not the noise.
Fig. 4.

Normalized eigenvalues of covariance matrix estimated from coefficient patches drawn from single scale of the pyramid representation of an example image (“peppers”). Dashed curve corresponds to raw patches, and solid curve to patches rotated according to dominant orientation.
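A short sketch of this energy-compaction comparison, assuming the raw and rotated patch ensembles have already been extracted and vectorized as rows of NumPy arrays (the names patches_raw and patches_rot are hypothetical placeholders, not part of the paper):

```python
import numpy as np

def normalized_eigenvalues(patches):
    """Eigenvalues of the sample covariance of an (N, d) patch ensemble,
    normalized by the trace and sorted in decreasing order."""
    C = patches.T @ patches / patches.shape[0]   # zero-mean sample covariance
    evals = np.linalg.eigvalsh(C)[::-1]          # descending order
    return evals / evals.sum()

# Fraction of total variance captured by the leading dimensions,
# for the raw and the rotated ensembles (hypothetical precomputed arrays):
# cum_raw = np.cumsum(normalized_eigenvalues(patches_raw))
# cum_rot = np.cumsum(normalized_eigenvalues(patches_rot))
```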
D. OAGSM/NC Model
Although orientation is an important property of natural image patches, images may also contain features such as corners and textures that are not strongly oriented. Modeling such nonoriented areas with the OAGSM process may lead to inappropriate behavior, such as the introduction of oriented artifacts during denoising. To better model such regions, we augment the OAGSM model by mixing it with a nonoriented adaptive process that is a simple GSM. Selection between the oriented and nonoriented processes is controlled by a third spatially varying binary hidden variable δ, which allows adaptation to local signal “orientedness.” The generative process for this orientation-adapted GSM with nonoriented component (OAGSM/NC) model is given by
v = √z [ δ R(θ) uori + (1 − δ) unor ]    (6)
where uori is a sample from a Gaussian with covariance C0 and unor is a sample from a Gaussian with covariance Cnor. As before for the OAGSM model, samples from the distribution described by C0 should represent oriented structure at a fixed nominal orientation. Intuitively, the nonoriented component determined by Cnor is used to describe the “leftovers,” i.e., image regions not well captured by the oriented component of the model.
It follows that the probability distribution for v when conditioned on the hidden variables is
p(v | z, θ, δ) = δ g(v; zC(θ)) + (1 − δ) g(v; zCnor)    (7)
The complete density on the patch v is then computed by integrating against the prior density of the hidden variables
p(v) = Σ_{δ=0,1} ∫∫ p(v | z, θ, δ) p(z, θ, δ) dz dθ    (8)
We assume a separable prior density for the hidden variables, p(z, θ, δ) = p(z)p(θ)p(δ). Following [14], the prior on z is derived from the so-called Jeffrey’s noninformative pseudo-prior p(z) ∝ (1/z) [10]. This “density” cannot be normalized unless z is truncated to some range [zmin, zmax]. The Jeffrey’s pseudo-prior is equivalent to placing a uniform density on log z. For the θ hidden variable we use a constant prior.
The oriented covariances used in this work are π-periodic, i.e., C(θ) = C(θ + π). This is a consequence of the steerable pyramid filters all being either symmetric or antisymmetric with respect to rotation by π. Accordingly, we choose a prior for θ that is constant on the interval [0, π]. This is equivalent to assuming rotation invariance of natural signals, i.e., that oriented content is equally likely to occur at any orientation. While this assumption may not hold exactly for classes of images with strong preference for particular orientations, such as landscapes or city scenes taken with the camera parallel to the horizon, it is a natural generic choice. If desired, the prior could be modified to incorporate higher probabilities for orientations such as vertical or horizontal.
Finally, δ is a binary variable, and its density is simply a two component discrete density. Fixing our notation, we set β to be p(δ = 1), i.e., the prior probability of drawing each patch from the oriented process. Note that integrating over δ in (8) implies
p(v) = β ∫∫ g(v; zC(θ)) p(z) p(θ) dz dθ + (1 − β) ∫ g(v; zCnor) p(z) dz    (9)
This expression makes it clear that as β varies between 0 and 1, the OAGSM/NC model interpolates between the oriented and nonoriented component models. Unlike the priors for z and θ which we assume are fixed and not data adaptive, we treat β as a parameter to be estimated for each wavelet subband. This allows the model to handle image subbands that have different proportions of oriented versus nonoriented content.
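As a concrete illustration of (8) and (9), the following sketch evaluates the log of the OAGSM/NC density for a single patch on discrete grids of z and θ values, assuming uniform discrete priors over those grids. The function names and the list-of-covariances interface are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gauss_logpdf(v, C):
    """log g(v; C) for the zero-mean multivariate Gaussian of (2)."""
    d = v.size
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(C, v))

def oagsmnc_logdensity(v, z_vals, C_theta, C_nor, beta):
    """log p(v) under the discretized form of (9), with uniform discrete
    priors over the z samples and over the theta samples.

    C_theta is a list of oriented covariances C(theta_j); C_nor is the
    nonoriented covariance; beta = p(delta = 1)."""
    terms = []
    for z in z_vals:
        for C in C_theta:                                       # oriented component
            terms.append(np.log(beta) - np.log(len(z_vals) * len(C_theta))
                         + gauss_logpdf(v, z * C))
        terms.append(np.log(1.0 - beta) - np.log(len(z_vals))   # nonoriented component
                     + gauss_logpdf(v, z * C_nor))
    return np.logaddexp.reduce(terms)
```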
III. Estimating Model Parameters
The parameters for the full OAGSM/NC model consist of the oriented covariances C(θ), the nonoriented covariance Cnor and the orientedness prior β. We can fit these parameters to the noisy data for each subband of the SP transform of the image to be denoised.
A. Oriented Covariances
The oriented covariances C(θ) for fixed θ are defined by the expectation E[R(θ)u(R(θ)u)T]. We compute these by a “patch rotation” method analogous to that used in the example of Fig. 4. At each spatial location, the dominant local orientation is estimated. Covariances are then computed by spatially rotating each of the patches in the subband, and calculating the sample outer product of this rotated ensemble.
In principle, the oriented covariances could also be computed by measuring C0 = E[uuT] and applying (5). However, there is a technical difficulty with defining the rotation operator R(θ) for patches of coefficients that are not embedded within a larger subband. Rotating a square patch of coefficients will require access to coefficients outside of the original square, to compute values for the “corners” of the rotated patch which will arise from image signal outside of the original square. This implies that the domain of R(θ) must be larger than its range. In this paper we avoid this difficulty by only attempting to apply R(θ) to patches “in context,” i.e., that are embedded within a complete subband of coefficients.
The OAGSM model describes an ensemble of noisy oriented wavelet patches {wi}, generated by local hidden variables zi and ϕi, as
wᵢ = √zᵢ R(ϕᵢ) uᵢ + nᵢ    (10)
where ni are samples of the noise process. The primary impediment to computing the oriented covariances is that the hidden rotator variables ϕi are different for each patch. A method for resolving this difficulty is suggested by the following thought experiment. Suppose one had access to an ensemble of noisy patches wi that were formed from a single fixed value of the hidden variable, i.e., ϕi = θ* for all i. Taking the sample outer product of this ensemble (suppressing the index i) would give
⟨w wᵀ⟩ = ⟨(√z R(θ*) u + n)(√z R(θ*) u + n)ᵀ⟩ = E[z] R(θ*) C0 R(θ*)ᵀ + Cn    (11)
= E[z] C(θ*) + Cn    (12)
Note that we have assumed that the noise and signal are independent. The oriented covariance for angle θ* may thus be computed by taking the sample outer product of the ensemble, subtracting off the noise covariance Cn, and dividing by the expected value of z. This is feasible, since we assume both the noise process and the distribution of z are known. Note that as the oriented covariance estimates are computed from data, it is possible that they are no longer positive definite after subtracting Cn. In practice, we impose positive definiteness by diagonalizing the estimated covariances and replacing all negative eigenvalues by a small positive constant.
A collection of patches drawn from a fixed value θ* of the rotator hidden variable is not immediately accessible. Instead, we produce such an ensemble by manipulating the patches present in the given noisy image. This manipulation relies on the idea that rotating an image region around its center will change only the orientation of the region, while preserving the rest of its structure. Given a coefficient patch modeled as a sample of an OAGSM with a particular value ϕ for the rotation hidden variable, rotating the underlying image signal by Δϕ and recomputing the filter responses will give a patch that is equivalent to an OAGSM sample with hidden variable ϕ + Δϕ. The true values of the rotator hidden variables ϕi corresponding to the given ensemble of raw patches are unknown, but can be estimated by computing the dominant neighborhood orientation of the noisy patches (see next subsection).
Given a set of noisy patches wi with measured neighborhood orientations ϕ̂i, we rotate each patch by θ − ϕ̂i to produce an ensemble of patches that is approximately equivalent to one produced by the OAGSM process with a fixed value θ for the rotator variable. Taking the sample outer product of this rotated ensemble then yields, following (12)
⟨(R(θ − ϕ̂ᵢ) wᵢ)(R(θ − ϕ̂ᵢ) wᵢ)ᵀ⟩ ≈ E[z] C(θ) + Cn    (13)
from which C(θ) can be calculated. This computation must be repeated for each value of θ for which C(θ) is required. In practice, we sample θ at a relatively small number of values.
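A minimal sketch of this covariance estimate follows, assuming the noisy patches have already been rotated to a common orientation θ and vectorized as rows of a NumPy array; following (13), it subtracts the noise covariance, divides by E[z], and clips negative eigenvalues as described above. The function name and the eps constant are illustrative.

```python
import numpy as np

def oriented_covariance(rotated_patches, C_n, Ez, eps=1e-6):
    """Estimate C(theta) from noisy patches already rotated to orientation theta.

    Following (13): the sample outer product is approximately E[z]*C(theta) + C_n,
    so subtract the noise covariance, divide by E[z], and enforce positive
    definiteness by clipping negative eigenvalues to a small constant."""
    M = rotated_patches.T @ rotated_patches / rotated_patches.shape[0]
    C = (M - C_n) / Ez
    evals, evecs = np.linalg.eigh(C)
    evals = np.maximum(evals, eps)
    return evecs @ np.diag(evals) @ evecs.T
```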
1) Estimating Neighborhood Orientation
Computing the dominant orientation at each point in space requires a measure of the image gradient. We use the SP with two orientation bands to estimate the local orientations. Note that this does not restrict the order of the SP transform used for denoising, as the estimated orientations may be used to rotate coefficient patches for a SP with a different number of orientation subbands. Consider an m × m patch of two-band SP coefficients v as a collection of m2 gradient vectors hi for i = 1…m2. We define the neighborhood orientation ϕ for the patch as the angle of the unit vector k(ϕ) = (cos(ϕ), sin(ϕ))T that maximizes the sum of squares of inner products
Σ_{i=1}^{m²} (hᵢᵀ k(ϕ))²    (14)
This is equivalent to the orientation of the eigenvector corresponding to the largest eigenvalue of the 2 × 2 orientation response matrix Σᵢ hᵢ hᵢᵀ. Writing hᵢ = (aᵢ, bᵢ)ᵀ, the dominant orientation is given explicitly by
ϕ̂ = (1/2) ∠( Σᵢ (aᵢ² − bᵢ²), 2 Σᵢ aᵢbᵢ )    (15)
where ∠ indicates the angle of the vector whose components are specified by the two arguments.
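A small sketch of this orientation estimate, assuming the two-band coefficients for a patch are stacked as rows (aᵢ, bᵢ) of a NumPy array; it implements the closed form (15) via the structure-tensor angle. The function name is illustrative.

```python
import numpy as np

def dominant_orientation(h):
    """Dominant orientation of a patch of two-band SP (gradient) coefficients.

    h is an (m*m, 2) array of gradient vectors (a_i, b_i). Following (14)-(15),
    the maximizing angle is half the angle of the vector
    (sum(a^2 - b^2), 2*sum(a*b)), i.e., the direction of the leading
    eigenvector of the 2x2 matrix sum_i h_i h_i^T."""
    a, b = h[:, 0], h[:, 1]
    return 0.5 * np.arctan2(2.0 * np.sum(a * b), np.sum(a * a - b * b))
```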
2) Patch Rotation
We describe the precise form of the patch rotation operator R(θ) first mentioned in (4). Rotating a patch of wavelet coefficients is equivalent to finding the coefficients that would arise if the underlying image signal were rotated around the center of the patch. In order to write this precisely, we introduce the following notation. Let f(x,y) be the original image signal. Assume that the number of orientation bands for the steerable pyramid transform being used is fixed. Let ψ_ϕ^s denote the SP filter in the space domain centered at the origin, with orientation ϕ at scale s.
Let i = 1…d index the different coefficients of the wavelet patch. Each coefficient corresponds to an SP filter at a particular location, orientation, and scale. Let (xi, yi) denote the position of the filter for the ith coefficient, relative to the center of the patch. In this work, the filters corresponding to a given patch have the same orientation ϕ. As we are considering patches that include parent coefficients, however, the spatial scale si of the filter will depend on the patch location index i.
Let v be the wavelet patch centered at the origin. We then have
vᵢ = ∫∫ f(x, y) ψ_ϕ^{si}(x − xᵢ, y − yᵢ) dx dy    (16)
We are now prepared to calculate R(θ)v. By the definition of patch rotation, we have
(R(θ)v)ᵢ = ∫∫ f(rθ(x, y)) ψ_ϕ^{si}(x − xᵢ, y − yᵢ) dx dy    (17)
where rθ(x,y) = (x cos θ + y sin θ, −x sin θ + y cos θ). This integral is invariant under the change of variables given by rotating the coordinate axes by θ. This yields
(R(θ)v)ᵢ = ∫∫ f(x, y) ψ_{ϕ−θ}^{si}(x − x̃ᵢ, y − ỹᵢ) dx dy    (18)
where (x̃ᵢ, ỹᵢ) = rθ(xᵢ, yᵢ). Thus, the ith coefficient of the transformed patch is given by the response of the original, unrotated image to a filter with orientation ϕ − θ at location (x̃ᵢ, ỹᵢ).
Performing patch rotation thus requires computing the response of rotated filters that may not lie on the original sample lattice. These may be computed from the original transform coefficients using the steerability and shift invariance properties of the steerable pyramid. Steerability implies that a filter rotated to any orientation can be decomposed as a sum of K filters at the “standard orientations” ϕk = ((k −1)π/K) for k = 1…K. This implies that there exist “steering functions” ck(·) such that
ψ_ϕ^s(x, y) = Σ_{k=1}^{K} c_k(ϕ) ψ_{ϕk}^s(x, y)    (19)
Details of calculating these ck(·) may be found in [24], [17].
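For the two-band pyramid, the filters are first-derivative operators and the steering relation (19) reduces to a cosine and sine weighting of the two standard-orientation responses. The sketch below shows this K = 2 special case only; the general-K steering functions are derived in [24], [17], and the function name here is illustrative.

```python
import numpy as np

def steer_two_band(resp_x, resp_y, theta):
    """Steer two-band (first-derivative) SP responses to orientation theta.

    With standard orientations 0 and pi/2, the steering functions of (19)
    reduce to c_1(theta) = cos(theta) and c_2(theta) = sin(theta)."""
    return np.cos(theta) * resp_x + np.sin(theta) * resp_y
```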
The shift invariance property of the transform follows from the design of the SP filter responses, which decay smoothly to zero for frequencies approaching π. This implies that there is no aliasing in each subband, so that the responses for each subband to any shifted signal may be computed by interpolating that subband directly. This should be contrasted with the behavior of critically sampled orthogonal wavelet transforms, which have severe aliasing which prevents spatial interpolation off of the sample lattice.
In this work, interpolation is done by first upsampling by a factor of 2^γ in each direction by zero padding the Fourier transform of each subband and inverting, followed by bilinear interpolation. All calculations in this paper were performed using γ = 3. Using this shift invariance property, the responses of the “standard orientation” filters can first be interpolated at the locations (x̃ᵢ, ỹᵢ), off of the original sample lattice, and then transformed using (19) to give the filter responses corresponding to each element (R(θ)v)ᵢ of the rotated patch.
B. OAGSM/NC Parameters
The remaining model parameters for the full OAGSM/NC model are the nonoriented covariance Cnor and the orientedness prior β. Cnor is computed as in [14], by taking the average outer product of the raw, nonrotated coefficient patches. This is essentially the same procedure as described in the previous section, but without applying patch rotation.
We compute β by maximizing the likelihood of the model. The complete OAGSM/NC model can be written as
p(v | β) = β p_ori(v) + (1 − β) p_nor(v)    (20)
where, in the presence of noise, we have
p_ori(v) = ∫∫ g(v; zC(θ) + Cn) p(z) p(θ) dz dθ,   p_nor(v) = ∫ g(v; zCnor + Cn) p(z) dz    (21)
Given a collection of m patches vk, typically all of the patches from a particular subband, the log-likelihood function will be
L(β) = Σ_{k=1}^{m} log[ β p_ori(v_k) + (1 − β) p_nor(v_k) ]    (22)
Directly maximizing this function over β is problematic, because of the sum present inside of the logarithm. Instead, we employ the Expectation Maximization (EM) algorithm, a widely used iterative method for performing maximum likelihood estimation [25]. The E-step consists of computing the expectation, by integrating over the so-called “missing data,” of the “full data” likelihood given the previous iterate of the parameters to be estimated. The resulting average likelihood function no longer contains any reference to the hidden variables, but is a function of the parameters to be fit. The M-step consists of computing the arg-max of this function, which gives the values of the next iterates of the parameters.
The following brief explanation of EM for estimating the component weights of a mixture model follows the treatment in [26]. For this problem, the “missing data” consist of binary indicator variables τ_k^δ, where k = 1…m and δ = 0, 1. For each value of k, exactly one of these variables equals 1, indicating which component the kth sample arose from; e.g., we have τ_k^0 = 1 and τ_k^1 = 0 if the kth sample arose from the nonoriented component, or τ_k^1 = 1 and τ_k^0 = 0 if it arose from the oriented component.
Let τ_k = (τ_k^0, τ_k^1), and let τ⃗ denote the entire set of indicator variables. The complete-data likelihood is a product of the terms p(vk, τk | β) = p(vk | τk, β) p(τk | β). As τk indicates which component vk arose from, we have
p(v_k, τ_k | β) = [β p_ori(v_k)]^{τ_k^1} [(1 − β) p_nor(v_k)]^{τ_k^0}    (23)
The complete data log-likelihood log(p(V, τ|β)) may then be written as
log p(V, τ⃗ | β) = Σ_k τ_k^1 log p_ori(v_k) + Σ_k τ_k^0 log p_nor(v_k) + Σ_k [ τ_k^1 log β + τ_k^0 log(1 − β) ]    (24)
As the “missing data” variables τ appear linearly in the above expression, the expectation operation for the E step can be passed into the above sum. The E step may thus be done by replacing each occurrence of τ_k^δ by its conditional expectation E[τ_k^δ | v_k, β^(n−1)]. By Bayes’ rule, this is equal to p(τ_k^δ = 1 | v_k, β^(n−1)). Evaluating for δ = 0, 1 shows
E[τ_k^1 | v_k, β^(n−1)] = β^(n−1) p_ori(v_k) / [ β^(n−1) p_ori(v_k) + (1 − β^(n−1)) p_nor(v_k) ],   E[τ_k^0 | v_k, β^(n−1)] = 1 − E[τ_k^1 | v_k, β^(n−1)]    (25)
The first two terms of (24) do not contain β and may be ignored. The M step to find the nth iterate for β consists of finding the maximum of
Q(β) = Σ_k { E[τ_k^1 | v_k, β^(n−1)] log β + E[τ_k^0 | v_k, β^(n−1)] log(1 − β) }    (26)
A simple calculation yields the maximum at
β^(n) = (1/m) Σ_{k=1}^{m} E[τ_k^1 | v_k, β^(n−1)]    (27)
Thus, at every step, β is replaced by the average expected probability that each sample arose from the oriented component, conditioned on the previous iterate of β.
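A compact sketch of this EM loop follows, assuming the oriented and nonoriented likelihoods p_ori(v_k) and p_nor(v_k), already integrated over z and θ, have been precomputed as NumPy arrays; the function name and default iteration count are illustrative.

```python
import numpy as np

def em_beta(p_ori, p_nor, beta0=0.5, n_iter=20):
    """EM iterations for the orientedness prior beta, following (25)-(27).

    p_ori[k] and p_nor[k] are the oriented and nonoriented likelihoods of
    patch k (already integrated over z and theta)."""
    beta = beta0
    for _ in range(n_iter):
        resp = beta * p_ori / (beta * p_ori + (1.0 - beta) * p_nor)  # E step, (25)
        beta = resp.mean()                                           # M step, (27)
    return beta
```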
In general, the EM algorithm is only guaranteed to converge to a local maximum of the likelihood function. For this problem, however, L(β) is concave down as its second derivative
L″(β) = −Σ_k [ p_ori(v_k) − p_nor(v_k) ]² / [ β p_ori(v_k) + (1 − β) p_nor(v_k) ]²    (28)
is negative for all β ∈ (0,1). Thus, L can have only a single maximum, and the EM procedure will converge to the ML estimate for β. In practice for the OAGSM/NC model, roughly 20 iterations are required for reasonable convergence.
IV. Denoising Algorithm
As both a validation of the OAGSM/NC model, as well as a useful practical application, we study the problem of removing additive Gaussian noise from photographic images. We follow a Bayesian formalism, where the OAGSM/NC model is used as a prior distribution for the statistics of clean signal transform coefficients. All denoising calculations are performed in the space of steerable pyramid coefficients.
A. Bayesian Estimator
We model a generalized patch of noisy wavelet coefficients y ∈ ℝd as y = x + n where x is the original signal and n is the additive Gaussian noise. Let Cn be the d×d covariance matrix of the noise process in the wavelet domain. Note that as the SP transform is not orthogonal, the noise process in each subband will be correlated even if it is white in the pixel domain.
In the Bayesian framework, both the desired signal x and the noise process n are considered to be random variables, and denoising is performed by statistical estimation. For a particular noisy observation y, there are infinitely many possible signals x that may have produced the given observation. These candidates are not all equally likely, however, and follow the posterior probability distribution p(x|y). One may view the estimating function x̂(y) as selecting an element of this set according to some criterion. One common criterion is to choose x̂(y) minimizing the expected squared error E = ∫ p(x|y)||x̂(y)−x||² dx. The minimum error is achieved for x̂(y) = ∫ x p(x|y) dx, the posterior mean. This Bayesian Minimum Mean Square Error (MMSE) estimator is used for the denoising calculations in the current work.
Both the OAGSM and the OAGSM/NC models described in this paper, as well as the original GSM model, consist of mixtures of Gaussian components parametrized by a set of hidden variables. For models of this form, an exact form for the Bayesian MMSE estimator can be calculated as an integral over the hidden variables. Let κ⃗ denote the hidden variables, where κ⃗ = z for the original GSM, κ⃗ = (z, θ) for the OAGSM, and κ⃗ = (z, θ, δ) for the OAGSM/NC model. By conditioning on the hidden variables, the posterior probability can be written as p(x|y) = ∫ p(x|κ⃗, y) p(κ⃗|y) dκ⃗. Inserting this into the expression for the MMSE estimator and exchanging the order of integration yields
x̂(y) = ∫ x p(x | y) dx = ∫∫ x p(x | κ⃗, y) p(κ⃗ | y) dκ⃗ dx    (29)
= ∫ p(κ⃗ | y) [ ∫ x p(x | κ⃗, y) dx ] dκ⃗    (30)
A key point is that the signal description is Gaussian when conditioned on the hidden variables. This implies that the inner integral on the r.h.s. of (30) is precisely the expression for the MMSE of the Gaussian signal with covariance C(κ⃗) corrupted by the Gaussian noise process n. This is a well known problem, with a linear (Wiener) solution Wκ⃗y = C(κ⃗) (Cn + C(κ⃗))−1y. We thus have
x̂(y) = ∫ p(κ⃗ | y) W_κ⃗ y dκ⃗    (31)
This is a weighted average of different Wiener estimates, where the weighting is controlled by p(κ⃗|y). It is this weighting that allows the denoising algorithm to adapt to different local signal conditions. For noisy signal patches that are best described with power z*, orientation θ*, and orientedness δ*, the weights p(z, θ, δ|y) will be larger for values closer to (z*, θ*, δ*), and smaller otherwise. The Wiener estimates W_κ⃗ y will be more accurate when the hidden variables are closer to those that best describe the signal. As a result, the full estimate x̂(y) will contain more contribution from the Wiener estimates that are more appropriate for the current noisy signal.
Computing the weightings p(κ⃗|y) is straightforward. Applying Bayes theorem gives
p(κ⃗ | y) = p(y | κ⃗) p(κ⃗) / ∫ p(y | κ⃗′) p(κ⃗′) dκ⃗′    (32)
When conditioned on κ⃗, y is the sum of the Gaussian signal with covariance C(κ⃗) and the noise process. As the signal and noise are assumed independent, their sum is a Gaussian with covariance C(κ⃗) +Cn. This implies p(y|κ⃗) = g(y;C(κ⃗)+Cn). The term p(κ⃗) is the prior probability of the hidden variables, as described previously in Section II-D. Substituting these into (31) gives the full Bayes MMSE estimator
x̂(y) = (1/Z) ∫ g(y; C(κ⃗) + Cn) p(κ⃗) C(κ⃗)(C(κ⃗) + Cn)⁻¹ y dκ⃗    (33)
where Z = ∫ g(y; Cn + C(κ⃗))p(κ⃗)dκ⃗.
In practice, we discretize the continuous hidden variables z and θ and the integral in (33) is approximated using a finite sum. For the results presented in this work, we used 13 sample values for z and 16 sample values for θ. As mentioned in Section II-D, the Jeffrey’s pseudo-prior p(z) ∝ (1/z) used for z is formally equivalent to placing a uniform density on log z. As z is sampled finitely, this is implemented by choosing sample values zn uniformly logarithmically spaced between zmin and zmax. The results presented here employ log(zmin) = −20.5 and log(zmax) = 3.5, the same parameters used in [14].
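The following sketch evaluates the discretized form of (33) for a single noisy patch, assuming the set of candidate signal covariances C(κ⃗) (e.g., zC(θ) for the oriented states and zCnor for the nonoriented ones) and their prior probabilities have been enumerated into lists. Names and interfaces are illustrative, not the authors' implementation.

```python
import numpy as np

def bls_estimate(y, C_list, priors, C_n):
    """Bayes least squares estimate of a clean patch, discretized form of (33).

    C_list[j] is the signal covariance for hidden state j (e.g., z*C(theta) or
    z*Cnor), priors[j] its prior probability, and C_n the noise covariance."""
    d = y.size
    log_w, wiener = [], []
    for C, p in zip(C_list, priors):
        Cy = C + C_n
        _, logdet = np.linalg.slogdet(Cy)
        # log of g(y; C + C_n) * prior, used to form the posterior p(kappa | y)
        log_w.append(np.log(p) - 0.5 * (d * np.log(2 * np.pi) + logdet
                                        + y @ np.linalg.solve(Cy, y)))
        # Wiener estimate for this hidden state: C (C + C_n)^{-1} y
        wiener.append(C @ np.linalg.solve(Cy, y))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                       # posterior weights p(kappa | y)
    return sum(wk * xk for wk, xk in zip(w, wiener))
```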
B. Implementation Details
Noisy images are generated by adding synthetic white Gaussian noise to an original natural image. These are then decomposed with the steerable pyramid transform with a specified number of orientation subbands, K. We handle the image boundary by first extending the image by mirror reflection by 20 pixels in each direction, and then performing all convolutions for the steerable pyramid transform using circular boundary handling. After processing, this boundary segment is removed. Generalized patches are then extracted at each spatial scale. We used SP coefficient patches of 5 × 5 siblings plus a parent coefficient (see Fig. 1), which were found to give the best denoising performance. From these noisy coefficients, both the oriented and nonoriented covariances were calculated as described in Section III. Recall that a two-band SP is used to measure the dominant local orientation that is used for rotating patches that are then used to compute oriented covariances. The denoising calculations are performed on a K-band SP (note, if K ≠ 2 then two SP transforms must be computed). The patches are then denoised at each scale using the MMSE estimator described above. This estimator x̂(y) produces an estimate of the entire generalized patch. One could partition the transform domain into nonoverlapping square patches, denoise them separately, then invert the transform. However, doing this would introduce block boundary artifacts in each subband. An alternative approach, used both here and in [14], is to apply the estimator to overlapping patches and use only the center coefficient of each estimate. In this way each coefficient is estimated using a generalized neighborhood centered on it.
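A minimal sketch of the overlapping-patch strategy just described, in which a patch estimator (for instance, the bls_estimate sketch above with fixed covariances) is applied at every location and only the center coefficient of each estimate is retained; the square, parent-free neighborhood and the omission of boundary handling are simplifications for illustration.

```python
import numpy as np

def denoise_subband(noisy, estimate_patch, radius=2):
    """Apply a patch estimator to overlapping square neighborhoods, keeping
    only the center coefficient of each estimated patch (parent coefficients
    and boundary handling omitted for brevity)."""
    out = noisy.copy()
    h, w = noisy.shape
    center = (2 * radius + 1) ** 2 // 2          # index of the center coefficient
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            patch = noisy[i - radius:i + radius + 1,
                          j - radius:j + radius + 1].ravel()
            out[i, j] = estimate_patch(patch)[center]
    return out
```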
The highpass and lowpass residual scalar subbands are treated differently. As in [14], we have used a modified steerable pyramid transform with the highpass residual band split into oriented subbands. This modified transform gives a denoising gain of about 0.2 dB over using the standard transform with a single (nonoriented) highpass residual. However, even subdivided into orientation subbands, the highpass filters are not steerable or translation invariant. This makes it difficult to obtain oriented covariances by the rotation method described in this paper. Accordingly, the highpass bands are denoised according to a simple GSM model, without using the orientation or orientedness hidden variables.
The lowpass band typically has the highest signal-to-noise ratio. This follows because the power spectra of natural images typically display a power-law decay with frequency, while the white noise process has constant power at all frequencies [23]. Additionally, for coarser spatial scales there are fewer available signal patches from which to fit the model parameters. At some point, the induced error from incorrectly estimating model parameters may become comparable to the benefit of denoising itself. This suggests that there is an effective limit to the number of spatial scales one can denoise. For this work, the pyramid representation is built to a depth of five spatial scales, with no denoising done for the lowpass band. After the estimation is done for the highpass and bandpass bands, the entire transform is inverted to give the denoised image.
C. Results
We computed simulated denoising results on a collection of five standard 8-bit greyscale test images [14], four of size 512 × 512 pixels and one of size 256 × 256 pixels. In order to quantify the advantages of adaptation to orientation and orientedness, we provide comparisons against the results of denoising with the GSM model of [14]. In order to maintain consistency with the OAGSM/NC results, the GSM results presented here use a 5 × 5 (plus parent) patch, as opposed to the 3 × 3 (plus parent) patch used in [14]. We examined performance for steerable pyramids with both K = 2 and K = 8 oriented bands, and for five different noise levels. Numerical results, presented as peak signal-to-noise ratio (PSNR), averaged over five realizations of the noise process, are presented in Table I.
TABLE I.
Denoising Performance for Five Images With Five Different Noise Levels. Results Are Shown for Algorithms Using Two Oriented SP Bands (Top Half) and Eight Oriented SP Bands (Bottom Half). Values Given Are Averaged Over Five Trials. In Each Cell, the Number on the Left Is for the OAGSM/NC Algorithm, the Number on the Right Is for the GSM Algorithm. All Values Indicate PSNR, Computed as 20 log10(255/σe), Where σe² Is the Error Variance
| σ (PSNR) | Lena | Barbara | Boats | House | Peppers |
|---|---|---|---|---|---|
| 10 (28.13) | 35.52 / 35.21 | 34.00 / 33.39 | 33.48 / 33.34 | 35.27 / 34.94 | 33.70 / 33.29 |
| 20 (22.11) | 32.57 / 32.15 | 30.29 / 29.54 | 30.24 / 30.08 | 32.29 / 31.89 | 30.26 / 29.79 |
| 25 (20.17) | 31.58 / 31.16 | 29.07 / 28.36 | 29.21 / 29.06 | 31.28 / 30.89 | 29.16 / 28.71 |
| 50 (14.15) | 28.46 / 28.07 | 25.29 / 24.93 | 26.17 / 26.05 | 28.17 / 27.82 | 25.76 / 25.46 |
| 75 (10.63) | 26.67 / 26.37 | 23.48 / 23.27 | 24.56 / 24.49 | 26.34 / 26.02 | 23.81 / 23.60 |
| 10 (28.13) | 35.67 / 35.51 | 34.26 / 33.96 | 33.56 / 33.49 | 35.43 / 35.27 | 33.78 / 33.54 |
| 20 (22.11) | 32.78 / 32.55 | 30.68 / 30.27 | 30.37 / 30.29 | 32.48 / 32.25 | 30.39 / 30.06 |
| 25 (20.17) | 31.81 / 31.58 | 29.51 / 29.11 | 29.36 / 29.29 | 31.50 / 31.25 | 29.30 / 28.97 |
| 50 (14.15) | 28.70 / 28.48 | 25.92 / 25.67 | 26.34 / 26.29 | 28.33 / 28.10 | 25.95 / 25.71 |
| 75 (10.63) | 26.90 / 26.72 | 23.92 / 23.81 | 24.74 / 24.71 | 26.52 / 26.29 | 24.03 / 23.88 |
The table shows a consistent improvement of OAGSM/NC over GSM, typically between 0.1 and 0.6 dB. The OAGSM/NC improvements are largest for images that have significant local orientation content, such as the “Barbara” image (which contains fine oriented texture in many regions), or the “Peppers” image. For images that have significantly more nonoriented texture, such as the Boats image, the improvement is often much smaller. Note also that the OAGSM/NC method offers a more substantial improvement over the GSM method for the 2-band representation.
The OAGSM/NC method also shows substantial visual improvement. Not surprisingly, this is most noticeable along strongly oriented image features. Details of two denoised images, with noise standard deviation σ = 25, are shown in Figs. 5 and 6. For the “Boats” image, one can compare the appearance of the oblique mast. The contours of this object are clearly better preserved for the OAGSM/NC method than for the GSM method. The 2-band GSM denoised image shows clear artifacts due to the horizontal and vertical orientations of the underlying filters. Note that the OAGSM/NC method is able to significantly reduce these by adapting to local orientation, even though the estimation is performed using the same set of filters. The visual differences between the 8-band GSM and the 8-band OAGSM/NC methods are more subtle; however, the OAGSM/NC method suffers from less ringing along isolated edges such as the boat masts. Additionally, the appearance of the oriented texture on the shawl of the “Barbara” image is more clearly preserved with the OAGSM/NC model.
Fig. 5.
128 × 128 detail from “Barbara” image, for noise with σ = 25. Top: original, noisy (20.17 dB). Middle: gsm2 (28.36 dB), oagsmnc2 (29.07 dB). Bottom: gsm8 (29.11 dB), oagsmnc8 (29.51 dB).
Fig. 6.

128 × 128 detail from “Boats” image, for noise with σ = 25. Top: original, noisy (20.17 dB). Middle: gsm2 (29.06 dB), oagsmnc2 (29.21 dB). Bottom: gsm8 (29.29 dB), oagsmnc8 (29.36 dB).
The improvement in denoising performance afforded by the adaptation to orientation, while significant, may seem modest in light of the added complexity and computational cost of the full OAGSM/NC method. One explanation of this observation is that performing the standard GSM algorithm using filters that are highly tuned in orientation implicitly performs a type of adaptation to local orientation. For such a bank of oriented filters, the GSM covariance for a particular orientation subband will be computed by averaging over patches of filter responses from areas with different local orientations. However, the contribution from areas with local orientation significantly different than the orientation of the filters will be attenuated, due to the orientation tuning of the filters. Similarly, during denoising, image content in strongly oriented regions will be represented mostly by a few oriented subbands with similar orientation. The GSM denoising estimator will then mostly use these covariances for the denoising calculations, which provides implicit adaptation to the local signal orientation. The orientation tuning of the Steerable Pyramid filters becomes tighter with increasing number of orientation bands. This line of reasoning helps explain why the performance gains for the OAGSM/NC over the GSM model are more dramatic for the 2-band case than for the 8-band case.
The OAGSM/NC method introduces two additional hidden variables, allowing adaptation to both orientation and orientedness. In order to determine the importance of the adaptation to orientedness, we compared the performance of the OAGSM denoising algorithm without the nonoriented component to the full OAGSM/NC model. These differences are shown in Table II. In the 8-band case, allowing adaptation to orientedness usually improved denoising PSNR by a modest amount (0.02–0.1 dB). In general, allowing adaptation to orientedness helps more for images with significant nonoriented content.
TABLE II.
PSNR Performance Penalty Incurred by Removing Adaptation to Orientedness, for 2 SP Bands (Top) and for 8 SP Bands (Bottom)
| σ (PSNR) | Lena | Barbara | Boats | House | Peppers |
|---|---|---|---|---|---|
| 25 | 0.085 | 0.019 | 0.085 | 0.065 | 0.009 |
| 50 | 0.109 | −0.044 | 0.145 | 0.097 | 0.003 |
| 25 | 0.017 | 0.019 | 0.071 | −0.005 | −0.028 |
| 50 | 0.066 | 0.008 | 0.135 | 0.045 | −0.025 |
The adaptation to orientedness is mediated by the δ binary hidden variable, whose prior distribution is determined for each subband by β, the probability of each patch arising from the oriented component. Values of the β parameter, estimated at noise level σ = 25 and averaged over all subbands, are given in Table III. These estimated β values reflect how well the image data are described by the oriented versus the nonoriented model components, and thus intuitively provide a measure of the amount of oriented content in the image. For the test images shown, average β values were highest for the “Barbara” image, which has significant oriented content such as the striped patterns on the shawl, and lowest for the “Boats” image, which contains significant nonoriented texture content. Additionally, the improvement in denoising performance from moving from the GSM to the OAGSM/NC model is strongly correlated with the average β. Simple linear regression of this performance difference in dB versus average β for the five test images used, with σ = 25 and two SP bands, gives a regression line with slope 1.24 (r² = 0.70).
TABLE III.
Estimated Values of β Averaged Over All Ten Oriented Subbands, for the 2-Band Pyramid With Noise Level σ = 25. Standard Deviations Given in Parentheses
| Lena | Barbara | Boats | House | Peppers |
|---|---|---|---|---|
| 0.703 (0.19) | 0.839 (0.07) | 0.501 (0.18) | 0.645 (0.23) | 0.556 (0.24) |
The OAGSM/NC denoising algorithm is significantly more computationally intensive than the GSM model. These costs can be divided into those of estimating the model parameters and those of performing the MMSE estimation calculations. Estimating the oriented covariance matrices by patch rotation is quite intensive. For the 2-band (8-band) algorithm, we find the total computation time divided roughly as follows: 40% (50%) for computing the oriented covariances, 25% (22%) for running the EM algorithm, and 28% (25%) for performing the MMSE estimation, with the remaining time used for computing the forward and inverse SP transforms and other overhead.
The MMSE estimator is computed as a numerical integral over the hidden variables z, θ and δ. Integrating over z alone gives most of the computational cost for the GSM estimate. Integrating again over θ and δ would seem to imply this cost would increase by a factor equal to the product of the number of sample points for θ and δ. However, when δ = 0 the signal covariances do not depend on θ, so integration over θ is unnecessary. This implies that the cost of computing the MMSE estimate for the OAGSM/NC is Nθ +1 times the cost of the original GSM model, where Nθ is the number of sample values taken for the θ hidden variable. On a 2.8 GHz AMD Opteron processor, using a nonoptimized Matlab implementation, denoising a 512 × 512 pixel image takes approximately 10 minutes using the 2-band OAGSM/NC algorithm, and 60 minutes using the 8-band OAGSM/NC algorithm. Denoising a 256 × 256 image took one quarter of these times.
D. Comparisons to Other Methods
We have compared the numerical results of our algorithm to four recent state-of-the-art methods, representing a variety of approaches to the image denoising problem. These include an algorithm based on a global statistical model, one based on sparse approximation methods, and two that are related to nonlocal averaging.
The Field of Experts model developed by Roth and Black [27] consists of a global Markov Random Field model, where the local clique potentials are given by products of simple “expert” distributions. These expert distributions are constructed as univariate functions of the response of a local patch to a particular filter. The structure of the model allows for each of these expert-generating filters to be trained from example data, yielding a flexible framework for learning image prior models. Denoising is then done with MAP estimation, by gradient ascent on the posterior probability density.
The KSVD method of Elad et al. [28] relies on the idea that natural images can be represented sparsely with an appropriate overcomplete set, or dictionary, of basis waveforms. Denoising is performed by using orthogonal matching pursuit to select an approximation of the noisy signal using a small number of dictionary elements. Intuitively, if the desired signal can be expressed sparsely, while the noise cannot, then the resulting approximation will preferentially recover the true signal. The KSVD method gives a consistent framework for both learning an appropriate dictionary for image patches, and integrating the resulting local patch approximations into a global denoising method.
The work of Kervrann and Boulanger [29] constructs a denoising estimate where each pixel is computed as a weighted average over neighboring locations, with the weights computed using a similarity measure of the surrounding patches. The success of this method relies on the property that similar, repeated patterns occur in areas near the current location to be denoised. By averaging over such similar patches, noise is reduced while desired signal is retained. This paper also details a method for adaptively computing, at each location, the size of the region over which neighboring patches are considered, allowing greater exploitation of repeated patterns where appropriate.
The collaborative filtering approach of Dabov et al. [30] also exploits this property, by explicitly grouping similar patches into 3-D “blocks” that are then jointly processed by wavelet shrinkage. Elements of the 3-D wavelet basis that correlate highly with features common across patches in the block will be preserved by the shrinkage operation, providing a powerful way of implicitly identifying and preserving structure across the patches. As the patches forming each such block may come from disparate spatial locations, this method can be understood as performing a type of nonlocal averaging. This method performs extremely well, giving the best denoising results currently known to the authors.
The differences in output PSNR between our OAGSM/NC algorithm and these four other denoising methods are shown graphically in Fig. 7. For the comparison with the KSVD method of Elad et al. [28], we computed PSNR values using their publicly available code and exactly the same pseudo-random noise process as used for the OAGSM/NC simulations. PSNR values for the other results were taken directly from the corresponding publications. We see that OAGSM/NC consistently outperforms all of the methods except for that of Dabov et al. [30].
Fig. 7.
Performance comparison of OAGSM/NC against other methods for (a) Lena, (b) Barbara, and (c) Boats images. Plotted points are differences in PSNR between various methods and the OAGSM/NC method. Circles (●) = 3-D collaborative filtering, Dabov et al. [30]; Squares (■) = KSVD, Elad and Aharon [28]; Triangles (▲) = Kervrann and Boulanger [29]; Stars (★) = Field of Experts, Roth and Black [27].
The collaborative filtering approach of Dabov et al. works by exploiting self-similarity among image patches in different regions. While this approach yields very good denoising performance, the image structures that are restored through denoising are never explicitly described, but rather emerge implicitly through the grouping of similar patches. In contrast, the OAGSM/NC model attempts to describe the local image features much more explicitly. Rather than allowing descriptions of different image structures to emerge by partitioning into collections of similar patches, the patch rotation method for fitting the oriented covariances can be viewed as extracting an explicit description of a single oriented structure from patches at different orientations. As the OAGSM/NC is an explicit probabilistic model for image structure, its successful application for denoising helps provide insight into the nature of the underlying image signal.
V. Conclusions and Future Work
We have introduced a novel OAGSM/NC image model that explicitly adapts to local image orientation. This model describes local patches of wavelet coefficients using a mixture of Gaussians, where the covariance matrices of the components are parametrized by hidden variables controlling the signal amplitude, orientation and orientedness. The model may be viewed as an extension of the Gaussian Scale Mixture model, which has a similar structure but with adaptation only to the local signal amplitude. We have developed methods for fitting the parameters of the OAGSM/NC model. A Bayes Least Squares optimal denoising estimator has been developed using the OAGSM/NC model which shows noticeable improvement in both PSNR and visual quality compared to the GSM model, and compares favorably to several recent state-of-the-art de-noising methods.
We believe that a number of other aspects of the model implementation could also be improved. One shortcoming is that the current method of computing the oriented and nonoriented signal covariance matrices does not make any explicit effort to separate the contributions from the oriented and nonoriented processes. As the forward model describes each patch as a sample of either one process or the other, ideally the oriented covariances should be computed using only samples from the oriented process, and vice versa. One possible way to account for this would be to estimate the signal covariances along with β inside the EM procedure. Such a procedure is straightforward for a Gaussian mixture model, in which case the covariance matrices are given at each step as a weighted mixture of outer products of data points [26]. However, for the GSM and OAGSM densities, direct application of EM to fit the covariances is more difficult, as the objective function then contains logarithms of sums in which a single covariance matrix appears multiple times with different scalar factors. For these densities, it is not the case that the maximum can simply be computed as a weighted average of outer products of the data points. It may nonetheless be possible to perform the M step for the full OAGSM/NC model by numerical gradient ascent.
As a preliminary step towards addressing this shortcoming, we have examined an ad-hoc scheme for computing the oriented and nonoriented covariances by weighting patches with the conditional component weights E[τ_k^1 | v_k, β] and E[τ_k^0 | v_k, β] from Section III-B. While this is not a true EM scheme for the reasons outlined above, it is intuitively appealing as it serves to softly partition the input patches into the oriented and nonoriented processes before computing the covariances. However, we found that this modification gave only very slight improvement in the denoising results (0.01–0.04 dB), at the expense of significant computational cost. Developing an efficient and principled method for computing the oriented and nonoriented covariances that better separates these two processes is an interesting question for further study.
Several other possible improvements remain. The current patch rotation method forms the oriented covariances using patches of actual image data that are not perfectly oriented. While this method gives good results, it would be interesting to consider more explicitly imposing the constraint on the oriented covariances that would arise from perfectly oriented, i.e., locally 1-D, signal patches. Another model aspect that could lead to improvements is the choice of neighborhood. Currently, co-localized patches from each oriented band are considered independently even though they represent information about the same localized features. Using neighborhoods that extend into adjacent orientation bands could thus improve performance. Another potentially worthwhile extension would be to incorporate additional hidden variables to capture signal characteristics other than energy, orientation and orientedness. For example, it should be possible to incorporate descriptions of local phase and local spatial frequency in an extension of the current OAGSM/NC model.
Finally, the model presented in this paper is a local model of image content, and does not take into account interactions between patches. There is potentially great room for improvement in forming a consistent global model based on the local OAGSM/NC structure. In particular, the local Gaussian variables (u) could be embedded in a Gaussian Markov random field. This was recently done for the simpler GSM local model [31], where it led to substantial improvements in denoising, albeit at a significant computational cost. In addition, the hidden energy, orientation, and orientedness variables are treated independently in the current model, despite the fact that they exhibit strong dependencies across adjacent locations, scales, and orientations. To account for such interactions, these hidden variables could also be embedded in random fields or trees, as has been done for GSMs [8], [32], [31].
References
- 1. Pratt WK. Digital Image Processing. 2nd ed. New York: Wiley-Interscience; 1991.
- 2. Mallat S. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans Pattern Anal Mach Intell. 1989;11(7):674–693.
- 3. Antonini M, Gaidon T, Mathieu P, Barlaud M. Wavelets in Image Communication. ch. 3. New York: Elsevier; 1994. pp. 65–188.
- 4. Simoncelli EP, Adelson EH. Noise removal via Bayesian wavelet coring. Presented at the 3rd IEEE Int. Conf. Image Processing; 1996.
- 5. LoPresto SM, Ramchandran K, Orchard MT. Wavelet image coding based on a new generalized Gaussian mixture model. Presented at the Data Compression Conf.; Snowbird, UT. Mar. 1997.
- 6. Simoncelli EP. Statistical models for images: Compression, restoration and synthesis. Proc. 31st Asilomar Conf. Signals, Systems and Computers; Pacific Grove, CA. Nov. 2–5, 1997; pp. 673–678.
- 7. Crouse MS, Nowak RD, Baraniuk RG. Wavelet-based statistical signal processing using hidden Markov models. IEEE Trans Signal Process. 1998 Apr;46:886–902.
- 8. Mihcak M, Kozintsev I, Ramchandran K, Moulin P. Low-complexity image denoising based on statistical modeling of wavelet coefficients. IEEE Signal Process Lett. 1999;6:300–303.
- 9. Wainwright MJ, Simoncelli EP. Scale mixtures of Gaussians and the statistics of natural images. Presented at the Advances in Neural Information Processing Systems (NIPS); 2000.
- 10. Figueiredo M, Nowak R. Wavelet-based image estimation: An empirical Bayes approach using Jeffreys' noninformative prior. IEEE Trans Image Process. 2001 Sep;10(9):1322–1331. doi: 10.1109/83.941856.
- 11. Hyvärinen A, Hurri J, Väyrynen J. Bubbles: A unifying framework for low-level statistical properties of natural image sequences. J Opt Soc Am A. 2003 Jul;20(7). doi: 10.1364/josaa.20.001237.
- 12. Karklin Y, Lewicki MS. A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. Neural Comput. 2005 Feb;17(2):397–423. doi: 10.1162/0899766053011474.
- 13. Andrews D, Mallows C. Scale mixtures of normal distributions. J Roy Statist Soc B. 1974;36:99–102.
- 14. Portilla J, Strela V, Wainwright MJ, Simoncelli EP. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans Image Process. 2003;12:1338–1351. doi: 10.1109/TIP.2003.818640.
- 15. Guerrero-Colón JA, Mancera L, Portilla J. Image restoration using space-variant Gaussian scale mixtures in overcomplete pyramids. IEEE Trans Image Process. 2008 Jan;17(1):27–41. doi: 10.1109/tip.2007.911473.
- 16. Hammond D, Simoncelli E. Image denoising with an orientation-adaptive Gaussian scale mixture model. Presented at the 13th IEEE Int. Conf. Image Processing; 2006.
- 17. Simoncelli EP, Freeman WT, Adelson EH, Heeger DJ. Shiftable multi-scale transforms. IEEE Trans Inf Theory. 1992 Mar;38(2):587–607. (Special issue on wavelets.)
- 18. Shapiro JM. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans Signal Process. 1993 Dec;41(12):3445–3462.
- 19. Xiong Z, Orchard MT, Zhang YQ. A deblocking algorithm for JPEG compressed images using overcomplete wavelet representations. IEEE Trans Circuits Syst Video Technol. 1997;7:433–437.
- 20. Buccigrossi RW, Simoncelli EP. Image compression via joint statistical characterization in the wavelet domain. IEEE Trans Image Process. 1999 Dec;8(12):1688–1701. doi: 10.1109/83.806616.
- 21. Romberg JK, Choi H, Baraniuk RG. Bayesian tree-structured image modeling using wavelet-domain hidden Markov models. IEEE Trans Image Process. 2001;10:1056–1068. doi: 10.1109/83.931100.
- 22. Liu J, Moulin P. Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients. IEEE Trans Image Process. 2001 Nov;10(11):1647–1658. doi: 10.1109/83.967393.
- 23. Ruderman DL, Bialek W. Statistics of natural images: Scaling in the woods. Phys Rev Lett. 1994;73(6):814–817. doi: 10.1103/PhysRevLett.73.814.
- 24. Freeman WT, Adelson EH. The design and use of steerable filters. IEEE Trans Pattern Anal Mach Intell. 1991 Sep;13(9):891–906.
- 25. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc B. 1977;39:1–38.
- 26. Bilmes JA. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Tech. Rep. TR-97-021, Dept. Elect. Eng. Comput. Sci., Univ. California, Berkeley; 1998.
- 27. Roth S, Black M. Fields of experts: A framework for learning image priors. Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR); 2005.
- 28. Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process. 2006;15:3736–3745. doi: 10.1109/tip.2006.881969.
- 29. Kervrann C, Boulanger J. Optimal spatial adaptation for patch-based image denoising. IEEE Trans Image Process. 2006;15:2866–2878. doi: 10.1109/tip.2006.877529.
- 30. Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans Image Process. 2007;16:2080–2095. doi: 10.1109/tip.2007.901238.
- 31. Lyu S, Simoncelli EP. Modeling multiscale subbands of photographic images with fields of Gaussian scale mixtures. IEEE Trans Pattern Anal Mach Intell. 2008, to be published. doi: 10.1109/TPAMI.2008.107.
- 32. Wainwright MJ, Simoncelli EP, Willsky AS. Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. Appl Comput Harmon Anal. 2001 Jul;11(1):89–123.