Author manuscript; available in PMC: 2010 Jan 13.
Published in final edited form as: IEEE Trans Image Process. 2008 Jun;17(6):847–856. doi: 10.1109/TIP.2008.920795

Dynamic Denoising of Tracking Sequences

Oleg Michailovich 1, Allen Tannenbaum 2,3
PMCID: PMC2805914  NIHMSID: NIHMS159643  PMID: 18482881

Abstract

In this paper, we describe an approach for simultaneously enhancing image sequences and tracking the objects of interest they depict. The enhancement part of the algorithm is based on Bayesian wavelet denoising, which has been chosen for its exceptional ability to incorporate diverse a priori information into the process of image recovery. In particular, we demonstrate that, in dynamic settings, useful statistical priors can come both from reasonable assumptions on the properties of the image to be enhanced and from the images observed before the current scene. Using such priors forms the main contribution of the present paper: the proposal of dynamic denoising as a tool for simultaneously enhancing and tracking image sequences. Within the proposed framework, the previous observations of a dynamic scene are employed to enhance its present observation. The mechanism that allows the fusion of information within successive image frames is Bayesian estimation, while the transfer of useful information between the images is governed by a Kalman filter that is used for both prediction and estimation of the dynamics of the tracked objects. In this methodology, therefore, the processes of target tracking and image enhancement “collaborate” in an interlacing manner, rather than being applied separately. The dynamic denoising is demonstrated on several examples of SAR imagery. The results reported in this paper indicate a number of advantages of the proposed dynamic denoising over “static” approaches, in which the tracking images are enhanced independently of each other.

Keywords: Bayesian estimation, Kalman filtering, predictive tracking, wavelet denoising

I. Introduction

In this paper, we consider the standard problem in which a sequence of images of a maneuvering target is given, with the goal being to estimate the target dynamics as precisely as possible. Such an estimation task typically consists of establishing a functional relationship between the appearances of the target along consecutive image frames. In particular, depending on the specific application at hand, the apparent motion of the target can be perceived through a deformation of its boundary, which can be described by means of either active contours [1] or active polygons [2]. Another possibility for describing the target motion is to represent the latter by apparent displacements of a set of control points pertaining to the target's image [3]. The method of optical flow constitutes another practically important solution to the problem of modeling the apparent motion [4].

Given a specific description of the apparent motion of an object of interest (as well as a relation of this motion to the actual 3-D motion of the object in space), the problem of predictive tracking could be readily solved by fitting (in a proper sense) a dynamic model to the available observations [5, Ch.2]. Such a direct approach, however, may not be satisfactory in settings where the level of measurement noise is particularly high. In such cases, applying an image enhancement procedure should be considered an important prerequisite which could substantially improve the robustness of tracking [6]. On the other hand, applying a denoising procedure to each image of the sequence independently seems to be a suboptimal strategy as well, since it discards the information that has been observed up to the current scene.

A more accurate solution to the aforementioned problem seems to be possible using a method that takes advantage of all the useful information about the tracked object contained in the tracking images up to the present time point. Moreover, such information should be used both for enhancing the current image and for updating the dynamic parameters of the target. In such a case, a basic processing step may include enhancing the current image using all the available estimates of the previous images and of their associated dynamic parameters, followed by updating the dynamic parameters pertaining to the current image using its enhanced version.

An algorithm that uses the principle described above is proposed in the present paper. The enhancement part of the proposed method is based on wavelet denoising, initially proposed in [7] as a powerful method of recovering nonstationary signals. A particularly useful feature of wavelet denoising consists of its ability to recover spatially nonhomogeneous signals without oversmoothing their details, while barely exceeding the computational demands of ordinary linear filtering. In this paper, the wavelet denoising is performed within the Bayesian framework [8], which allows incorporating any a priori information on the signal to be recovered. In the current implementation, such a priori knowledge comes from the statistical model assumed for the wavelet coefficients of tracking images, as well as from their already estimated predecessors.

The dynamic part of the algorithm is based on modeling the target motion by an affine optical flow, whose time evolution is governed by a Kalman filter [9]. More specifically, a two-step Kalman filter is applied in order to estimate and predict the apparent target motion. It is worth noting that it is precisely this structure of Kalman filtering, which includes prediction as an intrinsic stage, that allows the wavelet denoising and the tracking to be effectively combined. This construction will be described in much greater detail in the sections that follow.

Finally, in the experimental part of this study, the applicability of the proposed dynamic denoising to synthetic aperture radar (SAR) imagery is demonstrated. It is worthwhile noting that the use of SAR imaging for the purpose of tracking is currently considered to be a problem of substantial importance. This is mainly due to the belief that SAR imagers may come as an attractive alternative to existing optical devices, primarily because of the exceptional ability of the former to operate in all weather conditions. Unfortunately, target tracking using SAR data still seems to pose a formidable challenge to both system and image processing engineers, mainly because of the relatively high levels of noise contaminating this type of imagery. This is why SAR imaging has been chosen as an example of an image modality that, we believe, could benefit from the methodology proposed in this paper.

The present paper is organized as follows. Section II introduces the Bayesian framework for wavelet denoising. The way in which the information on previously enhanced images is “injected” into the current image estimate is detailed in Section III. Section IV introduces the dynamical model for affine optical flows. The overall algorithm structure is defined in Section V. Finally, some relevant reconstruction examples are demonstrated in Section VI, while Section VII concludes the paper with a discussion.

II. Bayesian Wavelet Denoising

A. Diagonal Estimation and Wavelet Shrinkage

In this section, the problem of enhancement of a single image g(x) (with x ∈ Ω ⊂ R²) is considered first. To this end, g(x) is assumed to be an observation of the original scene f(x) contaminated by additive white Gaussian noise. Consequently, f(x) can be estimated by applying to g(x) a diagonal operator of the following form [10, Ch.X]

\[ \hat{f}(x) = \sum_{k \in \mathbb{Z}} a_k \,\langle g, \varphi_k \rangle\, \varphi_k(x) \tag{1} \]

with {φ_k(x)}_{k∈Z} being an orthonormal basis in L²(Ω), ⟨·,·⟩ denoting the standard inner product, and {a_k}_{k∈Z} being a sequence of scalars satisfying |a_k| ≤ 1 for all k. It is important to note that, if the above basis provides a sparse representation of the image of interest, then the diagonal estimator (1) can be shown to be nearly optimal (i.e., resulting in the smallest variance) among all nonlinear estimators [10, Ch.X]. As a result, wavelet orthonormal bases have become the most preferable choice here, as they provide sparse representations for bounded-variation signals—the functional class to which most real-life images are generally assumed to belong. It should also be noted that there exists a number of Bayesian approaches to the problem of image denoising which eventually lead to the estimator (1). Because the resulting sequences {a_k}_{k∈Z} are positive and bounded by unity, such methods are commonly referred to as shrinkage estimators [11].
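To make the diagonal form (1) concrete, the following sketch implements it in NumPy for an orthonormal Haar basis (the Haar basis, the function names, and the use of NumPy are our illustrative choices here, not prescribed by the paper):

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis for R^n (n a power of 2); rows are basis vectors."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0]) / np.sqrt(2.0)                 # scaling part
    bot = np.kron(np.eye(n // 2), [1.0, -1.0]) / np.sqrt(2.0)   # wavelet part
    return np.vstack([top, bot])

def diagonal_estimate(g, a):
    """Eq. (1): f_hat = sum_k a_k <g, phi_k> phi_k, with |a_k| <= 1."""
    Phi = haar_matrix(len(g))       # rows phi_k form an orthonormal basis
    coeffs = Phi @ g                # inner products <g, phi_k>
    return Phi.T @ (a * coeffs)     # attenuate each coefficient and reconstruct
```

With a_k ≡ 1 the estimator reduces to the identity, which is a quick sanity check of the orthonormality of the basis.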

Most of the Bayesian wavelet shrinkage methods proposed so far can be divided into two main groups based on the way they treat dependencies between the wavelet coefficients of f(x). The first group encompasses the methods that assume the coefficients to be nearly independent, owing to the decorrelation property of wavelet transforms [12]–[17]. It is worthwhile noting that, in this case, the assumption of independence allows one to estimate each wavelet coefficient separately from the others. The methods of the second group, on the other hand, attempt to exploit the information contained in the joint behavior of wavelet coefficients. For instance, [18]–[20] perform the shrinkage adaptively for each coefficient using the concept of an activity function of a wavelet coefficient, defined over a local neighborhood of the latter. The inter-scale dependencies between wavelet coefficients can be taken into account using either the fusion procedure of [21] or the approaches of [22], [23], which are based on the theory of Markov random fields.

Although the methods of the second group are more general and, hence, can provide more accurate denoising results than the methods of the first group, the latter may still be preferable when computational complexity is of utmost importance. Since this is almost always the case in tracking applications, we assume the wavelet coefficients of f(x) to be mutually independent—an assumption that may be relaxed in subsequent extensions of the method proposed below. Moreover, as a basis for our developments, we will use the wavelet shrinkage method exploited in [15] and [16].

B. MMSE Solution for Wavelet Shrinkage

Let us consider the 1-D case first. A 1-D wavelet transform provides a representation of the signal of interest in terms of the basis functions ψ_{j,k}(x) := 2^{j/2} ψ(2^j x − k), which are dilations at scale j and translations by 2^{−j}k of the mother wavelet ψ(x). The discrete wavelet transform (DWT) of a 1-D signal is implemented by two-channel subband filtering followed by downsampling by a factor of 2 [10, Ch.VII]. The transformation results in a sequence of wavelet coefficients (doubly indexed by the pair (j,k)) which represent the energetic content of the signal at certain locations and resolutions.
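The filter-and-downsample implementation of the DWT described above can be sketched as follows (Haar filters and a power-of-2 signal length are our simplifying assumptions):

```python
import numpy as np

def haar_dwt_level(signal):
    """One DWT level: two-channel (Haar) subband filtering + downsampling by 2."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)   # lowpass branch
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)   # highpass branch
    return approx, detail

def haar_dwt(signal, levels):
    """Full DWT: detail coefficients doubly indexed by (scale j, position k),
    plus the final approximation; returned as [detail_1, ..., detail_J, approx]."""
    coeffs, approx = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs
```

Since the transform is orthonormal, the total energy of the coefficients equals that of the input signal.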

Let Gj,k, Fj,k, and Uj,k denote the wavelet coefficients of the observed signal g(x), the original signal f(x), and the noise process u(x), respectively. Then, due to the linearity of the DWT, we have

Gj,k=Fj,k+Uj,k. (2)

In order to construct an estimator for F_{j,k} within the Bayesian framework, probabilistic models for the signals in (2) should be specified first. To this end, the noise samples U_{j,k} are assumed to be normal with mean 0 and variance σ². Consequently, the conditional probability density function (pdf) of G_{j,k} given F_{j,k} and σ² obeys

\[ p(G_{j,k} \mid F_{j,k}, \sigma^2) = \mathcal{N}(F_{j,k}, \sigma^2), \quad \forall k, j. \tag{3} \]

At the same time, the signal coefficients Fj,k are assumed to be independently distributed according to

\[ p(F_{j,k} \mid \gamma_{j,k}, \sigma^2) = \mathcal{N}(0, \gamma_{j,k}\, c_j \sigma^2), \quad \forall k, j \tag{4} \]

where c_j > 0 is a scale-dependent constant, and γ_{j,k} are independent random variables that obey

\[ p(\gamma_{j,k}) = \mathrm{Bernoulli}(\pi_j), \quad \forall k, j. \tag{5} \]

We note that the model above implies that the signal coefficients are normally distributed with variance c_jσ² when γ_{j,k} = 1, and degenerate to zero when γ_{j,k} = 0. The practical meaning of γ_{j,k} is quite intuitive: the set of indices where it is equal to 1 designates the significant coefficients of the useful signal f(x), whereas the complementary set is associated with the noise alone. Note that such a model agrees well with the generally parsimonious representation of signals in the wavelet domain, where only a relatively small number of signal coefficients carries the most significant portion of the signal's energy.

Let p(Gj,k | γj,k = 1) and p(Gj,k | γj,k = 0) denote the pdf of Gj,k conditional on the presence and absence of the useful signal in the coefficient indexed by (j,k), respectively. The posterior probability π(γj,k = 1 | Gj,k) that the coefficient contains the useful signal can be expressed as a function of the posterior odds Oj,k given by

\[ \pi(\gamma_{j,k} = 1 \mid G_{j,k}) = \frac{O_{j,k}}{1 + O_{j,k}} \tag{6} \]

where

\[ O_{j,k} = \frac{p(\gamma_{j,k}=1)\, p(G_{j,k} \mid \gamma_{j,k}=1)}{p(\gamma_{j,k}=0)\, p(G_{j,k} \mid \gamma_{j,k}=0)} = \frac{\pi_j}{1-\pi_j}\, \frac{1}{\sqrt{1+c_j}} \exp\!\left\{ \frac{1}{2\sigma^2} \frac{c_j}{1+c_j}\, G_{j,k}^2 \right\}. \tag{7} \]

Consequently, the posterior mean of Fj,k, which is nothing else but its minimum mean square estimate (MMSE) F̂j,k, can be defined as

\[ \hat{F}_{j,k} \stackrel{\mathrm{def}}{=} \mathrm{E}\{F_{j,k} \mid G_{j,k}\} = \left[ \frac{c_j}{1+c_j}\, \pi(\gamma_{j,k}=1 \mid G_{j,k}) \right] G_{j,k} = \left[ \frac{c_j}{1+c_j}\, \frac{O_{j,k}}{1+O_{j,k}} \right] G_{j,k}. \tag{8} \]

Setting a_{j,k} = c_j(1 + c_j)⁻¹ O_{j,k}(1 + O_{j,k})⁻¹ (so that F̂_{j,k} = a_{j,k} G_{j,k}), one can see that the MMS estimate (8) is indeed of the diagonal form (1).

Finally, we note that, whenever the noise standard deviation σ is unknown, it can be estimated from the available data as σ ≈ median_k |G_{1,k}| / 0.6745 [7].
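Putting (3)–(8) and the robust noise estimate together, a minimal sketch of the shrinkage rule might read as follows (variable names are ours; the paper does not prescribe an implementation):

```python
import numpy as np

def estimate_sigma(finest_details):
    """Robust noise level: sigma ~ median_k |G_{1,k}| / 0.6745 (MAD rule)."""
    return np.median(np.abs(finest_details)) / 0.6745

def bg_shrink(G, c, pi, sigma):
    """Bernoulli-Gaussian MMSE shrinkage, eqs. (6)-(8).

    G     : wavelet coefficients of the noisy signal at one scale
    c     : scale-dependent signal-variance factor c_j > 0
    pi    : prior probability pi_j of a 'significant' coefficient
    sigma : noise standard deviation
    """
    # Posterior odds O_{j,k}, eq. (7)
    O = (pi / (1.0 - pi)) / np.sqrt(1.0 + c) \
        * np.exp(0.5 * (c / (1.0 + c)) * (G / sigma) ** 2)
    # Diagonal shrinkage factor a_{j,k} = c/(1+c) * O/(1+O), eq. (8)
    a = (c / (1.0 + c)) * (O / (1.0 + O))
    return a * G
```

Note that small coefficients (likely pure noise) are attenuated strongly, while large ones are kept nearly intact, as expected from a shrinkage rule.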

C. Inclusion of Spatial Priors

It is interesting to note that F̂_{j,k} in (8) can be alternatively derived as a solution to the problem of simultaneously detecting and estimating signals in a noisy environment, as originally addressed by Middleton and Esposito in [24]. This perspective provides a very useful insight into the structure of F̂_{j,k} in (8), which appears as a product of the MMS estimate of F_{j,k} conditional on γ_{j,k} = 1 (the “estimation part” of the combined estimate) and the Bayesian detection term O_{j,k}/(1 + O_{j,k}), with O_{j,k} being the familiar generalized likelihood ratio.

From the discussion above, it follows that, if one were given additional information as to the presence of the useful signal F_{j,k} in G_{j,k}, this information could be incorporated into the “detection part” of (8). This can be done by replacing the unconditional prior probabilities p(γ_{j,k} = 1) and p(γ_{j,k} = 0) in (7) with the priors p(γ_{j,k} = 1 | I_{j,k}) and p(γ_{j,k} = 0 | I_{j,k}), respectively, which are conditional on this additional information I_{j,k}. In this case, according to the Bayes rule, the prior ratio (to be substituted in (7) in place of π_j/(1 − π_j)) is given by

\[ r_{j,k} = \frac{p(\gamma_{j,k}=1)\, p(I_{j,k} \mid \gamma_{j,k}=1)}{p(\gamma_{j,k}=0)\, p(I_{j,k} \mid \gamma_{j,k}=0)} = \frac{\pi_j}{1-\pi_j}\, \frac{p(I_{j,k} \mid \gamma_{j,k}=1)}{p(I_{j,k} \mid \gamma_{j,k}=0)}. \tag{9} \]

In the present work, I_{j,k} is defined to represent the information about the positions of the signal coefficients F_{j,k}, so that the conditional densities p(I_{j,k} | γ_{j,k} = 1) and p(I_{j,k} | γ_{j,k} = 0) (which are defined over the index set (j,k)) quantify the probability of either a signal or a noise wavelet coefficient being present at a specific scale/location indexed by j and k, respectively. To proceed quantitatively, the question to be addressed next is: where do the above “position probabilities” come from? The noise part is the easiest. Since the noise is assumed to be white and Gaussian, its presence at any data coefficient indexed by (j,k) is equiprobable, implying that p(I_{j,k} | γ_{j,k} = 0) = ∥(j,k)∥⁻¹, where ∥(j,k)∥ is the cardinality of the index set. The situation with p(I_{j,k} | γ_{j,k} = 1), however, is not that simple, and this is where the dynamics comes into play. Section III-A provides a simple recipe for constructing this probability for an image, based on its time-delayed estimate.

III. Dynamic Denoising

A. Two-Dimensional Separable DWT

A 2-D separable DWT decomposes an image into a hierarchy of four subbands. At each scale j (also referred to as a decomposition level), the subbands consist of an approximation subband LLj and three detail subbands LHj, HLj, and HHj. While the approximation subband LLj contains the low-frequency portion of the original image, the detail subbands LHj, HLj, and HHj capture the image details extending in the horizontal, vertical, and diagonal directions, respectively. In the course of the decomposition, subband LLj is used as an input for the decomposition level j + 1, with LL0 representing the original image.
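One level of the separable decomposition just described can be sketched as follows (Haar filters are assumed; note that the LH/HL naming convention varies in the literature, so the labels below are one common choice):

```python
import numpy as np

def haar_dwt2_level(img):
    """One level of a separable 2-D Haar DWT: returns LL, LH, HL, HH subbands."""
    img = np.asarray(img, dtype=float)
    # Filter and downsample along rows first ...
    lo = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2.0)
    hi = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2.0)
    # ... then along columns
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)   # approximation
    LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)   # one detail orientation
    HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)   # another orientation
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)   # diagonal details
    return LL, LH, HL, HH
```

Recursing on LL reproduces the hierarchy of Fig. 1(A), with LL0 being the original image.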

In the case of a separable DWT, there is a “canonical” way to organize the wavelet coefficients, as shown in Fig. 1(A) for the case of three decomposition levels [Fig. 1(B) exemplifies this structure by showing the DWT coefficients of the standard “Lena” image]. In the course of wavelet-based image enhancement, the approximation coefficients are usually left unprocessed. The detail coefficients of the image, on the other hand, can be processed using the wavelet shrinkage procedure (8), after replacing the scale index j by a double index (j,m), with m accounting for a specific orientation.

Fig. 1. (A) “Canonical” arrangement of (orthogonal) wavelet coefficients of an image. (B) DWT decomposition of the Lena image.

B. Estimation of “Location” Probabilities

In what follows, the images under consideration are assumed to be square N × N images, with N an integral power of 2. Consequently, according to the “canonical” structure of Fig. 1(A), the wavelet coefficients of image f at scale j and orientation m can be viewed as a 2^{−j}N × 2^{−j}N matrix. Let this matrix be denoted by F_{j,m}, with its (k₁,k₂)th element (where k₁,k₂ = 0, 1, …, 2^{−j}N − 1) being equal to the wavelet coefficient of f corresponding to scale j, orientation m, and position (k₁,k₂).

Additionally, let P^1_{j,m} and P^0_{j,m} be two square matrices of the same size as F_{j,m}. These matrices will contain the “location probabilities” p(I_{j,k} | γ_{j,k} = 1) and p(I_{j,k} | γ_{j,k} = 0), respectively, as discussed in Section II-C. In order to specify the above matrices, let us first define a 2^{−j}N × 2^{−j}N indicator matrix I_{j,m} as

\[ I_{j,m}(k_1,k_2) = \begin{cases} 1, & \text{if } |F_{j,m}(k_1,k_2)| \ge \tau \\ 0, & \text{if } |F_{j,m}(k_1,k_2)| < \tau \end{cases} \tag{10} \]

whose nonzero entries indicate the positions of the significant wavelet coefficients, i.e., those whose absolute values exceed a predefined threshold τ > 0. Subsequently, given a 2^{−j}N × 2^{−j}N matrix K_j of discrete values of the isotropic Gaussian density function

\[ K_j(x,y) = \frac{1}{2\pi h_j^2} \exp\!\left\{ -\frac{x^2 + y^2}{2 h_j^2} \right\} \tag{11} \]

the matrix Pj,m1 can be defined as

\[ P^1_{j,m} = (\# I_{j,m})^{-1} \left( I_{j,m} * K_j \right) \tag{12} \]

where #I_{j,m} = Σ_{k₁,k₂} I_{j,m}(k₁,k₂), and * stands for the convolution operator. It is worth noting that the above construction is, in fact, a kernel-based estimate of the probability of location of the significant wavelet coefficients within the chosen subband [25]. We also note that the convolution in (12) may be cyclic if the periodized, interval-adapted DWT of [26] is used to compute the wavelet coefficients (as implemented in the WaveLab® package of Donoho). In this case, the normalization guarantees that Σ_{k₁,k₂} P^1_{j,m}(k₁,k₂) = 1. Moreover, for the smoothing effect imposed by the convolution with K_j to be similar across DWT levels, the bandwidth parameter h_j of the Gaussian function should be defined as

\[ h_j = 2^{-j} h_0 \tag{13} \]

where the bandwidth h0 > 0 can be precomputed using any of the standard methods described, e.g., in [25].
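Equations (10)–(12) can be sketched as follows, using an FFT-based cyclic convolution to match the periodized-DWT case mentioned above (the explicit renormalization to a unit sum, and all names, are our choices):

```python
import numpy as np

def gaussian_kernel(size, h):
    """Discrete isotropic Gaussian K_j, eq. (11), on a size x size grid."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    K = np.exp(-0.5 * (xx ** 2 + yy ** 2) / h ** 2)
    return K / (2.0 * np.pi * h ** 2)

def location_prior(F_subband, tau, h):
    """Kernel estimate P^1_{j,m} of the 'location' probability, eqs. (10)-(12)."""
    I = (np.abs(F_subband) >= tau).astype(float)          # indicator, eq. (10)
    K = gaussian_kernel(F_subband.shape[0], h)
    # Cyclic convolution I * K via the FFT; ifftshift centers the kernel at (0,0)
    P1 = np.real(np.fft.ifft2(np.fft.fft2(I) * np.fft.fft2(np.fft.ifftshift(K))))
    P1 = np.maximum(P1, 0.0)                              # guard tiny fp negatives
    return P1 / P1.sum()                                  # normalize to a pmf
```

By construction, the result sums to one and peaks around the significant coefficients of the subband.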

It goes without saying that the local dependencies between adjacent wavelet coefficients could be better accounted for if the kernel function in (11) were chosen to be anisotropic [27]. In this paper, however, this possibility has not been explored, since it is more computationally involved than the simple choice in (11), which seems preferable for real-time tracking applications.

The “location” distribution matrix Pj,m0 pertaining to the noise process should be defined next. Since the noise is assumed to be Gaussian and white, the uniform distribution is the only reasonable choice here. Formally

\[ P^0_{j,m}(k_1,k_2) = \frac{4^{j}}{N^2}, \quad \forall k_1, k_2 \tag{14} \]

so that Σ_{k₁,k₂} P^0_{j,m}(k₁,k₂) = 1.

Finally, we note that the matrices P^1_{j,m} and P^0_{j,m} defined above play the role of the conditional probabilities p(I_{j,k} | γ_{j,k} = 1) and p(I_{j,k} | γ_{j,k} = 0) in (9), respectively. Consequently, once these matrices have been computed, the corresponding prior ratio [as defined by (9)] can be substituted in (7) in place of π_j/(1 − π_j) to compute the MMSE (8). Fig. 2 shows an interim example that compares the performance of the Bayesian shrinkage without and with the “location” priors. For a reason that will become clear shortly, we refer to these estimates as “static” and “dynamic” MMSE, respectively. The estimates are shown in Fig. 2(C) and (D), whereas Fig. 2(A) and (B) shows the original test image and its noise-contaminated version (SNR = 11.8 dB), respectively. Note that, in this example, the indicator functions (10) corresponded to about 20% of the largest wavelet coefficients of “Lena” for each j and m. One can see that using the additional information improves the SNR considerably, from 17.6 to 20.1 dB.

Fig. 2. (A) Original image of “Lena”; (B) noisy image of “Lena” (SNR = 11.8 dB); (C) image estimation using the Bayesian shrinkage without the “location” priors (SNR = 17.6 dB); (D) image estimation using the Bayesian shrinkage with the “location” priors (SNR = 20.1 dB).
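A sketch of the “dynamic” MMSE of this kind, i.e., the shrinkage (8) with the prior ratio (9) substituted for π_j/(1 − π_j) in (7), might read as follows (all names are ours):

```python
import numpy as np

def dynamic_shrink(G, c, pi, sigma, P1, P0):
    """Bayesian shrinkage (8) with the prior ratio of eq. (9):
    pi/(1-pi) in eq. (7) is replaced by r = pi/(1-pi) * P1/P0 elementwise."""
    r = (pi / (1.0 - pi)) * (P1 / P0)                 # prior ratio, eq. (9)
    O = r / np.sqrt(1.0 + c) \
        * np.exp(0.5 * (c / (1.0 + c)) * (G / sigma) ** 2)   # odds, eq. (7)
    a = (c / (1.0 + c)) * (O / (1.0 + O))             # shrinkage factor, eq. (8)
    return a * G
```

As expected, coefficients at positions favored by the “location” prior (large P1/P0) are shrunk less than equally sized coefficients at unlikely positions.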

C. Dynamically Learned Priors

The Bayesian estimation described in the preceding section would only be possible if we knew the locations of the (significant) wavelet coefficients of the noise-free image f, which, of course, makes this approach rather impractical. However, in tracking applications, a noisy measurement of f is normally available along with its time-delayed version. Suppose now that the latter has already been enhanced. Then, using predicted dynamic parameters of the target motion, this time-delayed estimate of f could be warped forward to obtain a prediction of the true image f at the present time. Needless to say, this prediction cannot be used for averaging with the present observation of f, nor for any other kind of direct “compounding.” Because of the error in estimating the target dynamics, the features (e.g., edges) in the predicted image can never be expected to be perfectly aligned with those in f, and the averaging would unavoidably smear the resulting estimates. However, the above prediction of f is a reasonably good candidate for computing the indicator functions (10) and the related “location” probability matrices P^1_{j,m} and P^0_{j,m}, and this is how these matrices are computed in this study.

It should be noted that care should be taken in using the above estimation approach. As noted before, because of the errors in estimating the target motion, the features of the current image f and its prediction are never perfectly aligned. Moreover, the prediction is prone to errors, since it is derived from a time-delayed, enhanced version of f, which is itself an estimate. However, all these potential sources of inaccuracy have already been implicitly taken into account by the algorithm. First, the “location” probabilities are computed based only on the significant wavelet coefficients of the prediction (i.e., those exceeding a predefined threshold τ), which can reasonably be assumed to contain the useful signal. This makes the algorithm resistant to the errors in estimating the time-delayed version of f, from which the prediction is computed. Second, to compute P^1_{j,m} and P^0_{j,m}, the indicator functions are smoothed by the kernel functions K_j, thereby accounting for the uncertainty as to the location of the wavelet coefficients of f, an uncertainty that has to be expected because of the inaccuracies in recovering the associated dynamics. Moreover, to achieve a more stable convergence of the algorithm, we replace the time-invariant bandwidth h₀ in (13) by a time-dependent bandwidth h(t) defined as

\[ h(t) = h_0 \left( (A - 1)\,\alpha^{t} + 1 \right) \tag{15} \]

where A ≥ 1, 0 < α < 1, and t stands for the time (or, equivalently, the time index) of a given image. In this case, when t = 0, the bandwidth of K_j is equal to A·2^{−j}h₀ and, as a result, if A is much greater than 1, the resulting probabilities P^1_{j,m} will be substantially oversmoothed. Consequently, the ratios P^1_{j,m}/P^0_{j,m} will change relatively slowly as functions of (k₁,k₂), implying that the algorithm “ignores” the locational priors, which may be rather unreliable at the beginning of convergence (it should be noted that, at this stage, the algorithm enhances the tracking images almost independently of each other). As time goes on, the estimates of the target dynamics converge to their optimal values, implying that the prediction by “warping forward” is performed with a relatively high accuracy. This fact is reflected in the behavior of h(t), which approaches h₀ as t → ∞.

Apparently, the optimal values of A and α in (15) should depend on the posterior error covariance of the dynamic parameters as computed by, e.g., a Kalman filter (see the discussion below). Finding such an analytical relationship represents an interesting direction of our future research. In the present paper, however, the parameters A and α are left to be user-defined.
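The schedule (15) is straightforward to implement; the default values of A and α below are purely illustrative, since the paper leaves these parameters user-defined:

```python
def bandwidth(t, h0, A=4.0, alpha=0.8):
    """Eq. (15): h(t) = h0 * ((A - 1) * alpha**t + 1), with A >= 1, 0 < alpha < 1.
    Starts at A*h0 (oversmoothed priors, nearly ignored) and decays to h0."""
    return h0 * ((A - 1.0) * alpha ** t + 1.0)
```

The function is monotonically decreasing in t, starting at A·h0 and approaching h0 in the limit.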

IV. Motion Estimation by Kalman Filtering

From the above discussion, it follows that estimating the target motion constitutes an integral part of the image enhancement process. It is true as well that the enhancement allows one to estimate the dynamics with a greater precision. Thus, in this section, we briefly describe the method used for the motion prediction and estimation.

In the current paper, the estimation of (apparent) target motion is based on the assumption that any subsequent image of a tracking sequence can be approximated as a locally translated version of its “predecessor,” where the translation is governed by a displacement field, also known as an optical flow [4]. Moreover, whenever the tracking of rigid targets (e.g., tanks, armored troop-carriers) is of concern, assuming affine displacement fields may be quite reasonable. In this case, denoting by c_t ∈ R⁶ the column vector of affine parameters corresponding to the image g_t observed at time t, and by ∇g_t and ∂_t g_t the gradient and the time derivative of the latter, respectively, it can be shown that, under the assumption of small displacements, it holds that [28]

\[ A_t^{T}(x,y)\, c_t \approx -\,\partial_t g_t(x,y), \quad \forall (x,y) \in \Omega \tag{16} \]

where

\[ A_t(x,y) = \begin{bmatrix} 1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y \end{bmatrix}^{T} \nabla g_t(x,y) \tag{17} \]

with the gradient ∇g_t(x,y) being viewed as a column vector in R² at every (x,y) ∈ Ω. Note that (16) holds at every point of the image domain Ω, while the vector c_t is position independent. Therefore, finding the optimal solution c̃_t for c_t amounts to solving an overdetermined system of equations, which can be done using the standard Moore–Penrose pseudoinverse.1 Moreover, when the model errors in (16) can be assumed to be Gaussian and white, this solution can be rigorously shown to be the MMS estimate of c_t [28].
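A least-squares solution of the overdetermined system built from (16) and (17) can be sketched as follows (the frame-difference and central-difference approximations of the derivatives, and the function name, are our simplifications):

```python
import numpy as np

def affine_flow(g_prev, g_next):
    """Least-squares affine optical flow, eqs. (16)-(17).

    Stacks one linear equation per pixel, A_t^T(x,y) c_t ~ -d/dt g_t(x,y),
    and solves for the six affine parameters via the Moore-Penrose
    pseudoinverse (np.linalg.lstsq returns the minimum-norm LS solution).
    """
    g_prev = np.asarray(g_prev, dtype=float)
    g_next = np.asarray(g_next, dtype=float)
    gy, gx = np.gradient(g_prev)                 # spatial gradient (rows, cols)
    gt = g_next - g_prev                         # time derivative (frame diff)
    ny, nx = g_prev.shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    # Each row is A_t^T(x,y) = [gx, x*gx, y*gx, gy, x*gy, y*gy]
    A = np.stack([gx, xx * gx, yy * gx, gy, xx * gy, yy * gy],
                 axis=-1).reshape(-1, 6)
    b = -gt.ravel()
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c
```

For a pure horizontal translation of a smooth ramp image, the first parameter recovers the displacement while the remaining ones vanish.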

The optimal solution c̃_t derived above provides only a momentary estimate of the target dynamics, which does not take into account the time coherence of the target motion. To account for the time evolution of the affine parameters, the estimate c̃_t can be regarded as a noisy observation of the true vector c_t, viz.

\[ \tilde{c}_t = c_t + u_t \tag{18} \]

where ut is a noise process. Moreover, using the first-order Markov assumption, ct itself can be modeled as a noisy version of its “predecessor” ct−1, namely

\[ c_t = c_{t-1} + v_t \tag{19} \]

where v_t is another noise process, different from u_t.

Equations (18) and (19) form a system of state-space equations which can be efficiently solved by means of Kalman filtering [9]. It is important to emphasize that the Kalman filter always estimates the state c_t in two steps. First, a prediction of the state at time t is computed based on its estimate at time t − 1 and (19). Second, this prediction is updated using the information brought in by the new observation c̃_t, as described by (18). Thus, prediction of the affine parameters is inherently integrated into the filter structure, appearing explicitly as one of its stages.
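For the random-walk model (18)–(19), with identity transition and observation matrices, the two-stage Kalman filter reduces to the following sketch (the noise covariances Q and R, and the class name, are illustrative assumptions):

```python
import numpy as np

class AffineKalman:
    """Random-walk Kalman filter for the affine parameters, eqs. (18)-(19):
    state:       c_t  = c_{t-1} + v_t,  v_t ~ N(0, Q)
    observation: c~_t = c_t     + u_t,  u_t ~ N(0, R)
    """
    def __init__(self, dim=6, q=1e-3, r=1e-1):
        self.c = np.zeros(dim)      # state estimate
        self.P = np.eye(dim)        # posterior covariance
        self.Q = q * np.eye(dim)    # process-noise covariance (v_t)
        self.R = r * np.eye(dim)    # measurement-noise covariance (u_t)

    def predict(self):
        """Stage 1: prediction from eq. (19); random walk leaves c unchanged."""
        self.P = self.P + self.Q
        return self.c.copy()

    def update(self, c_meas):
        """Stage 2: update with the momentary estimate c~_t, eq. (18)."""
        K = self.P @ np.linalg.inv(self.P + self.R)   # Kalman gain
        self.c = self.c + K @ (c_meas - self.c)
        self.P = (np.eye(len(self.c)) - K) @ self.P
        return self.c.copy()
```

Fed a constant stream of identical measurements, the filtered estimate converges to that measurement, as expected of the model.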

V. Overall Diagram of DDN

In this section, some necessary details on the overall organization of the proposed dynamic denoising (DDN) are provided. As before, it is assumed that each image g_t of the tracking sequence {g_t}_{t≥0} represents a noise-free image f_t contaminated by white Gaussian noise. The algorithm is applied recursively, and its input at time t is formed by: 1) the noisy data image g_t; 2) an estimate f̃_{t−1} of f_{t−1}; and 3) an estimate of the optical flow parameters pertaining to time t − 1. At the output, the algorithm returns: 1) an estimate f̃_t of f_t and 2) an estimate of the optical flow parameters at time t.

One recursion of the algorithm is carried out according to the steps shown in Fig. 3. First, the affine parameters estimated at time t − 1 are used to compute a prediction ĉ_t of the parameters for time t according to the first stage of Kalman filtering, followed by using this prediction to obtain a prediction f̂_t of f_t via warping forward the previously estimated image f̃_{t−1}. Subsequently, the wavelet coefficients of the prediction f̂_t are used for computing the “location” probabilities P^1_{j,m} and P^0_{j,m}, as detailed in Section III. These probabilities, along with the data image g_t, are then passed on to the Bayesian shrinkage procedure of Section II, resulting in the estimate f̃_t of f_t, which is the first output of the algorithm. Finally, the estimates f̃_{t−1} and f̃_t are used to compute the momentary MMS estimate of the affine parameters (see Section IV), which is used by the Kalman filter to update the prediction ĉ_t, resulting in a new estimate of the optical flow parameters at time t. The latter forms the second output of the algorithm.
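The recursion just described can be summarized schematically as follows; the building blocks are passed in as callables, so this is only an orchestration sketch under our naming, not the paper's implementation:

```python
import numpy as np

def ddn_step(g_t, f_prev, kalman, denoise, estimate_flow, warp):
    """One DDN recursion (cf. Fig. 3).

    g_t                : current noisy frame
    f_prev             : enhanced estimate of the previous frame
    kalman             : object with predict() / update(c_meas) methods
    denoise(g, f_pred) : Bayesian shrinkage of g with priors drawn from f_pred
    estimate_flow(a,b) : momentary affine-parameter estimate between frames
    warp(f, c)         : forward-warping of frame f by affine parameters c
    """
    c_pred = kalman.predict()             # 1) predict affine parameters
    f_pred = warp(f_prev, c_pred)         # 2) warp previous estimate forward
    f_t = denoise(g_t, f_pred)            # 3) shrinkage with dynamic priors
    c_meas = estimate_flow(f_prev, f_t)   # 4) momentary flow estimate
    c_t = kalman.update(c_meas)           # 5) Kalman update
    return f_t, c_t
```

Any concrete denoiser, flow estimator, and warper with these signatures can be plugged in without changing the recursion itself.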

Fig. 3. Block-diagram of the DDN algorithm.

At the beginning of the recursion, the “location” probabilities can be defined to be uniform, which can be viewed as using noninformative priors as to the positions of the signal's coefficients. The parameters c_j and π_j of the probability densities in (4) and (5) are supposed to be either learned through training or estimated directly from the data by means of the maximum-likelihood algorithm as detailed, e.g., in [13].

Finally, we note that, once the parameters c_j and π_j (or estimates thereof) are available, each recursion of the DDN algorithm can be performed with log-linear complexity in the number of image samples. Indeed, the most computationally expensive procedures involved in the estimation according to the block-diagram of Fig. 3 are warping forward, computing the “location” priors according to (12), and the wavelet transformation. While the last two of these procedures require only a few convolution operations, the complexity of warping depends on the type of image interpolation used. Fortunately, there exist computationally efficient methods which allow one to perform this procedure in log-linear time as well [29]. Consequently, the overall complexity of DDN remains log-linear, which implies the possibility of its real-time implementation.

VI. Experimental Results

A. Data Preprocessing

In the experimental part of this study, the performance of the proposed DDN algorithm is tested using a set of real-world synthetic aperture radar (SAR) images of military targets. Before demonstrating specific reconstruction examples, it should be noted that, due to the bandlimitedness of radar transfer functions, as well as because of the coherent nature of SAR image formation, such images are normally contaminated by speckle noise [30]. The statistical properties of this noise have been addressed in numerous studies, with the Rayleigh, K-, generalized gamma, and Nakagami distributions being among the noise models proposed hitherto.

Unfortunately, none of the existing statistical models for speckle noise suggests that the latter can be approximated by white Gaussian noise, and, hence, the results of the preceding sections do not seem to be directly applicable to SAR data. To overcome this difficulty, it is common to exploit the multiplicative nature of speckle noise, which allows one to convert it into additive noise by means of the logarithmic transformation. Subsequently, the log-transformed noise can be rejected via a filtering procedure, followed by the exponential transformation that brings the result back to the original domain.
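This scheme amounts to a log transform, a denoising step, and an exponential transform back; a minimal sketch follows (the small offset eps guarding log(0), and the function name, are our additions):

```python
import numpy as np

def homomorphic_despeckle(image, denoise, eps=1e-6):
    """Homomorphic despeckling: multiplicative speckle -> additive noise via
    the log transform, filtering, then exp back to the original domain."""
    log_img = np.log(np.asarray(image, dtype=float) + eps)  # eps guards log(0)
    return np.exp(denoise(log_img)) - eps
```

With an identity "denoiser" the round trip reproduces the input exactly, which isolates the effect of the chosen filter.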

The procedure explained above is known as homomorphic despeckling [31], and it has long been used to improve the quality of coherent-type imagers, including SAR units [32] and ultrasound scanners [33]. The main deficiency of the algorithms based on this technique, however, results from their assuming the log-transformed speckle noise to be Gaussian and white, while, in practice, this assumption is rarely valid. In particular, the log-transformed speckle noise can be shown to be correlated and to obey a bi-exponential, Fisher–Tippett type distribution [34]. Since the pdf of such a distribution is heavy tailed, the corresponding noise is necessarily of a spiky type. Needless to say, such noise may not be effectively rejected by an algorithm designed under the assumption of noise “whiteness” and “Gaussianity.”
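For concreteness, the homomorphic pipeline described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `denoise` argument stands in for whatever log-domain filter is used (e.g., the wavelet shrinkage of this paper), and `eps` is a hypothetical guard against taking the logarithm of zero.

```python
import numpy as np

def homomorphic_despeckle(image, denoise, eps=1e-12):
    # Log transform: multiplicative speckle becomes additive noise
    log_image = np.log(image + eps)
    # Reject the (now additive) noise with any log-domain filter
    filtered = denoise(log_image)
    # Exponentiate to return to the original (linear) domain
    return np.exp(filtered)
```

With an identity filter, the log/exp pair is (up to `eps`) lossless, which makes the sketch easy to verify before plugging in an actual denoiser.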

Fortunately, the log-transformed speckle noise can be “gaussianized” by means of the two-step procedure proposed in [33]. At the first step of this procedure, the (complex-valued) data images are subjected to blind equalization, which makes the image samples nearly uncorrelated. At the second step, the log-transformed magnitudes of the resulting images are subjected to outlier shrinkage, which suppresses the spiky component of the noise. It can be shown that this preprocessing is capable of transforming the log-transformed speckle noise into nearly white Gaussian noise, while retaining the underlying image virtually intact. Moreover, although in [33] the above method was demonstrated for ultrasound images, it is also applicable to SAR imaging due to the similarity of the image formation mechanisms exploited by these imaging modalities. This fact is further supported by the example of Fig. 4, in which (A1) and (B1) demonstrate a section of the log-domain images of a military target (a T-72 tank) before and after applying the above preprocessing, respectively.

Fig. 4.

Fig. 4

(A1)–(A3) Original image of a T-72 target in the log-domain, its auto-correlation and histogram, respectively. (B1)–(B3) Corresponding preprocessed image, its auto-correlation and histogram, respectively.

The effect of suppressing the “spiky” component of the log-transformed speckle noise can be appreciated by comparing the corresponding histograms of the images,2 which are shown in Fig. 4(A3) and (B3). One can see that the histogram of the preprocessed image has the shape of a Gaussian probability density, as opposed to the “heavy-tailed” behavior of the histogram of the original image. Additionally, the auto-correlation function of the preprocessed image [Fig. 4(B2)] converges to zero considerably faster as compared to that of the original image [Fig. 4(A2)]. This implies that the samples of the preprocessed image are much less correlated than those of the original image.

In Section VI-B, all the images are enhanced in the log-transform domain after they have been preprocessed by the procedure explained above. After the enhancement is completed, the resulting images may be transformed back to the original (i.e., linear) domain, if required by the specific application at hand.

B. Tracking Military Targets

The upper row of Fig. 5 shows a subset of tracking images of a military target, viz., a T-72 tank. The images are shown in the log-transform domain after they have been subjected to the “gaussianization” procedure discussed in the previous subsection. The estimated SNR of these data images was found to be about 5 dB.

Fig. 5.

Fig. 5

(First row) Original images of the T-72 target; (second row) reconstruction by the uniform soft-thresholding; (third row) reconstruction by the “static” Bayesian shrinkage; (fourth row) reconstruction by the DDN method.

As the first step, the data images were enhanced independently of each other by the classical soft-thresholding in the wavelet domain, as initially proposed in [7]. The uniform threshold was set to the universal value σ̃√(2 log N), with N being the number of image samples and σ̃ being the standard deviation of the noise, estimated as mentioned at the end of Section II-B. The results of this wavelet denoising are shown in the second row of Fig. 5. One can see that this method is capable of effectively suppressing the noise. Unfortunately, this appears to come at the expense of substantially oversmoothing the target details (e.g., edges). On the other hand, the Bayesian shrinkage of Section II-B preserves the target's details considerably better, as shown by the third row of Fig. 5. However, in this case, the preservation of the target's details comes at the expense of letting a significant portion of the noise “survive” the shrinkage. Moreover, since this type of Bayesian shrinkage is applied to each of the tracking images independently, all the estimates appear to be equally noisy.
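A minimal sketch of the soft-thresholding rule with the universal threshold is given below. The MAD-based noise estimate is a common stand-in (an assumption here, not the estimator actually used in Section II-B), and the wavelet transform itself is omitted for brevity:

```python
import numpy as np

def soft_threshold(w, t):
    # Shrink wavelet coefficients toward zero by t, zeroing the small ones
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def universal_threshold(detail_coeffs):
    # Noise std via the median absolute deviation of fine-scale coefficients
    sigma = np.median(np.abs(detail_coeffs)) / 0.6745
    # Donoho's universal threshold: sigma * sqrt(2 log N)
    n = detail_coeffs.size
    return sigma * np.sqrt(2.0 * np.log(n))
```

In practice, `soft_threshold` would be applied to the detail coefficients of each subband before the inverse wavelet transform.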

It should be noted that both above-mentioned results were obtained using the same type of orthogonal wavelet analysis, based on the nearly symmetric wavelets of I. Daubechies with four vanishing moments [35, Ch. 6]. The wavelet decomposition level was set to L = 4. In the case of Bayesian shrinkage, the hyperparameters of the signal and noise pdf's were estimated directly from the data by means of the expectation-maximization algorithm, as detailed in [13].

As the next step, the same decomposition and distributional parameters were used to enhance the tracking images by means of the proposed DDN method, whose results are demonstrated in the lowest row of Fig. 5. In this case, the learning parameters A and α in (15) were set to be equal to 3 and 0.85, respectively. The measurement and model noises ut and vt in (18) and (19) were assumed to be mutually independent, white, Gaussian noises with zero means and covariances equal to 10−2 × I6 and 10−3 × I6, respectively (where I6 denotes a 6 × 6 identity matrix). The initial value of the posterior error covariance was set to be equal to 0.5 × I6.
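The Kalman recursion with the covariances quoted above can be sketched as follows. Since (18) and (19) are not reproduced here, the state-transition and measurement matrices F and H are identity placeholders, an assumption made purely for illustration:

```python
import numpy as np

n = 6                   # dimension of the affine motion state
F = np.eye(n)           # assumed state-transition matrix (placeholder)
H = np.eye(n)           # assumed measurement matrix (placeholder)
Q = 1e-3 * np.eye(n)    # model-noise covariance (v_t), as quoted in the text
R = 1e-2 * np.eye(n)    # measurement-noise covariance (u_t)
P0 = 0.5 * np.eye(n)    # initial posterior error covariance

def kalman_step(x, P, z):
    # Prediction
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement z
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(n) - K @ H) @ P_pred
    return x_new, P_new
```

After a few iterations the posterior error covariance contracts, reflecting the accumulation of information about the target's motion across frames.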

Analyzing the last row of Fig. 5, one can see that, in the beginning of convergence, the overall quality of the DDN-enhanced images is virtually identical to that of the images processed by the “static” Bayesian denoising. However, the noise contamination of the DDN estimates gradually weakens as the frame index grows and a constantly increasing amount of information on the target is accumulated by the algorithm. In particular, one can see that, by frame number 25, the noise decreases to a level comparable to that of the soft-thresholded images, while the edge preservation of the DDN method remains similar to that of the “static” Bayesian shrinkage. For quantitative assessment of the above results, the measure of equivalent number of looks (ENL) [32] was used in this paper. This measure is conventionally used in SAR imagery to quantify the contamination of images by speckle noise, and it is defined as the ratio of the squared mean to the variance of a SAR image.3 Note that, in general, higher values of ENL indicate lower levels of noise contamination. Thus, for instance, the average ENL for the soft-thresholding and the “static” Bayesian methods was found to be equal to 9.6 and 7.2, respectively, which implies that the former approach suppresses the noise more effectively than the latter (it has to be kept in mind, however, that ENL is a global criterion and, hence, incapable of assessing the preservation of image details). On the other hand, in the course of convergence, the average ENL obtained by DDN improved from 7.2 to 9.46, thereby closely approaching the smoothness of the soft-thresholded images, while notably better preserving the target's details.
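The ENL figures quoted above follow directly from the definition; a minimal sketch (in practice, the statistics would be computed over a homogeneous image region):

```python
import numpy as np

def enl(region):
    # Equivalent number of looks: squared mean over variance of a
    # homogeneous image region; higher ENL means less residual speckle
    region = np.asarray(region, dtype=float)
    return region.mean() ** 2 / region.var()
```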

VII. Discussion and Conclusion

Whenever reasonable assumptions on the statistical properties of an image of interest can be made, the Bayesian estimation framework often allows one to derive a considerably more informative estimate of the image as compared to the case when the estimation is performed based on observed data alone. This is why Bayesian estimation has long become a preferred method of reconstructing signals and images, especially in the situations when a priori assumptions are necessary to either recover lost information or fuse multiple informational sources [36]. Unfortunately, defining the priors is well-known to be a very delicate step that, when performed incorrectly, could produce rather misleading results [37]. The central idea of the present study has been to show conceptually and experimentally that, in dynamic scenarios, some useful priors can be extracted from previous observations of the dynamic scene that needs to be enhanced. As a result, an image enhancement procedure—referred to as DDN—was introduced as a method for simultaneously enhancing sequences of images and tracking the objects of interest represented by the latter.

The results of image enhancement by means of the DDN method were compared with the results obtained using the standard soft-thresholding procedure of [7] as well as the “static” Bayesian shrinkage detailed in Section II-B. In both cases, the performance of the reference approaches was found to be inferior to that of the proposed method. In particular, in the steady state, the DDN algorithm outperformed the soft-thresholding in terms of edge-preservation, while resulting in substantially more effective noise cancellation as compared with the “static” Bayesian approach.

The above results, however, can only be regarded as preliminary, as a formal proof of the convergence properties of the DDN algorithm still needs to be derived. Specifically, a dependency between the “annealing” parameters A and α in (15) and the posterior error covariance of the Kalman filter should be found. We note that this dependency appears to be central to the convergence properties of the proposed method. Indeed, in the beginning of the recursion, when the Kalman filter has not yet converged and, hence, the error covariance is relatively large, the “location” priors Pj,m1 and Pj,m0 have to be oversmoothed. This is necessary to make the algorithm discard the information coming from the past observations as not sufficiently certain. Subsequently, as the filter converges (implying that the error covariance decreases), the amount of smoothing can be adequately diminished. Thus, in some sense, the degree of smoothing in estimating the “location” priors should grow with the posterior error covariance of the Kalman filter, diminishing as the filter converges.

The wavelet shrinkage procedure used in this paper was based on orthogonal wavelet transforms. While being preferable from many perspectives, the latter have been observed to produce Gibbs-like artifacts when used in denoising applications. On the other hand, this undesirable phenomenon appears to be much less pronounced, when nonorthogonal undecimated wavelet decompositions are used instead [38]. Consequently, modifying the DDN method to take advantage of the above property of undecimated wavelet transforms may be a possible development of the present study.

In this paper, the apparent motion of tracked objects was described by affine optical flows. Needless to say, whenever the motion of a nonrigid object has to be estimated and/or the scenario involves tracking more than one target, the affine model may not be appropriate for the task at hand. In this case, displacement models of higher complexity, such as the local affine [39] or spline-based [40] models, can be considered instead. Moreover, the linear Kalman filter employed in the present study can be replaced either by a more general extended Kalman filter [41], when (18) and (19) are known to be nonlinear, or by a particle filter [42], when the motion distribution is, e.g., multimodal (as would be the case in multiobject tracking). All the above modifications may also be considered as possible developments of the method proposed in this paper.

Acknowledgment

The authors would like to thank Dr. A. Betser for very helpful conversations on Kalman filtering. The authors would also like to thank all the anonymous reviewers, whose input allowed substantial improvement of the quality of this paper.

This work was supported in part by the National Science Foundation, in part by the Air Force Office of Sponsored Research, in part by the Army Research Office, in part by MURI, in part by MRI-HEL, in part by the National Institutes of Health (NAC P41 RR-13218), and in part by the Natural Sciences and Engineering Research Council of Canada under a Discovery grant. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pier Luigi Dragotti.

Biographies


Oleg Michailovich (M'02) was born in Saratov, Russia, in 1972. He received the M.Sc. degree in electrical engineering from the Saratov State University in 1994 and the M.Sc. and Ph.D. degrees in biomedical engineering from The Technion—Israel Institute of Technology, Haifa, in 2003.

He is currently with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. His research interests include the application of image processing to various problems of image reconstruction, segmentation, inverse problems, nonparametric estimations, approximation theory, and multiresolution analysis.


Allen Tannenbaum (M'93) was born in New York in 1953. He received the Ph.D. degree in mathematics from Harvard University, Cambridge, MA, in 1976.

He has held faculty positions at the Weizmann Institute of Science, Rehovot, Israel; McGill University, Montreal, QC, Canada; ETH, Zurich, Switzerland; The Technion—Israel Institute of Technology, Haifa; the Ben-Gurion University of the Negev, Israel; and the University of Minnesota, Minneapolis. He is presently the Julian Hightower Professor of Electrical and Biomedical Engineering, Georgia Institute of Technology, Atlanta, and Emory University, Atlanta. He has done research in image processing, medical imaging, computer vision, robust control, systems theory, robotics, semiconductor process control, operator theory, functional analysis, cryptography, algebraic geometry, and invariant theory.

Footnotes

1

Note that, in this case, the pseudo-inverse involves inversion of only a 6 × 6 matrix.

2

Only homogeneous portions of the backgrounds of the images were used to compute the histograms.

3

For analogous purposes, a square root of ENL is used in ultrasound imagery, where it is known as speckle-SNR.

References

  • [1].Malladi R, Sethian J, Vemuri B. Shape modeling with front propagation: A level set approach. IEEE Trans. Pattern Anal. Mach. Intell. 1995 Feb.17(2):158–175. [Google Scholar]
  • [2].Unal G, Krim H, Yezzi A. Fast incorporation of optical flow into active polygons. IEEE Trans. Image Process. 2005 Jun.14(6):745–759. doi: 10.1109/tip.2005.847286. [DOI] [PubMed] [Google Scholar]
  • [3].Blake A, Yuille A. Active Vision. Mass. Inst. Technol.; Cambridge, MA: 1992. [Google Scholar]
  • [4].Horn BKP, Schunck BG. Determining optical flow. Artif. Intell. 1981 Aug.17:185–203. [Google Scholar]
  • [5].Raol J, Girija G, Singh J. Modeling and Parameter Estimation of Dynamic Systems. IET; London, U.K.: 2004. [Google Scholar]
  • [6].Kornprobst P, Deriche R, Aubert G. Image sequence analysis via partial differential equations. J. Math. Imag. Vis. 1999 Sep.11(1):5–26. [Google Scholar]
  • [7].Donoho D. De-noising by soft-thresholding. IEEE Trans. Inf. Theory. 1995 May;41(5):613–627. [Google Scholar]
  • [8].Muller P, Vidakovic B. Lecture Notes in Statistics. vol. 141. Springer-Verlag; New York: 1999. Bayesian inference in wavelet-based models. [Google Scholar]
  • [9].Sorenson H. Kalman Filtering: Theory and Application. IEEE; Piscataway, NJ: 1985. [Google Scholar]
  • [10].Mallat S. A Wavelet Tour of Signal Processing. Academic; New York: 1998. [Google Scholar]
  • [11].Gruber M. Improving Efficiency by Shrinkage: The James–Stein and Ridge Regression Estimators. Marcel Dekker; New York: 1998. [Google Scholar]
  • [12].Moulin P, Liu J. Analysis of multiresolution image de-noising schemes using generalized Gaussian and complexity priors. IEEE Trans. Inf. Theory. 1999 Apr.45(3):909–919. [Google Scholar]
  • [13].Johnstone M, Silverman BW. Empirical Bayes selection of wavelet threshold. Stanford Univ.; Oxford Univ.; Stanford, CA: Oxford, U.K.: 2004. [Google Scholar]
  • [14].Chipman HA, Kolaczyk E, McCulloch R. Adaptive Bayesian wavelet shrinkage. J. Amer. Statist. Assoc. 1997 Dec.92(440):1413–1421. [Google Scholar]
  • [15].Clyde M, Parmigiani G, Vidakovic B. Multiple shrinkage and subset selection in wavelets. Biometrika. 1998;85:391–402. [Google Scholar]
  • [16].Abramovich F, Sapatinas T. Bayesian approach to wavelet decomposition and shrinkage. In: Muller P, Vidakovic B, editors. Bayesian Inference in Wavelet-Based Models. Springer; New York: pp. 33–50. ch. 3. [Google Scholar]
  • [17].Vidakovic B. Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. J. Amer. Statist. Assoc. 1998 Mar.93(441):173–179. [Google Scholar]
  • [18].Chang G, Yu B, Vetterli M. Spatially adaptive wavelet thresholding with context modeling for image de-noising. IEEE Trans. Image Process. 2000 Sep.9(9):1532–1546. doi: 10.1109/83.862630. [DOI] [PubMed] [Google Scholar]
  • [19].Simoncelli E. Bayesian de-noising of visual images in the wavelet domain. In: Muller P, Vidakovic B, editors. Bayesian Inference in Wavelet-Based Models. Springer; New York: pp. 291–308. ch. 18. [Google Scholar]
  • [20].Pizurica A, Philips W, Lemahieu I, Acheroy M. A versatile wavelet domain noise filtration technique for medical imaging. IEEE Trans. Med. Imag. 2003 Mar.22(3):323–331. doi: 10.1109/TMI.2003.809588. [DOI] [PubMed] [Google Scholar]
  • [21].Scharcanski J, Jung C, Clarke R. Adaptive image de-noising using scale and space consistency. IEEE Trans. Image Process. 2002 Sep.11(9):1092–1100. doi: 10.1109/TIP.2002.802528. [DOI] [PubMed] [Google Scholar]
  • [22].Crouse MS, Nowak RD, Baraniuk RG. Wavelet-based statistical signal processing using hidden Markov models. IEEE Trans. Signal Process. 1998 Apr.46(4):886–902. [Google Scholar]
  • [23].Pizurica A, Philips W, Lemahieu I, Acheroy M. A joint inter- and intra-scale statistical model for Bayesian wavelet based image de-noising. IEEE Trans. Image Process. 2002 May;11(5):545–557. doi: 10.1109/TIP.2002.1006401. [DOI] [PubMed] [Google Scholar]
  • [24].Middleton D, Esposito R. Simultaneous optimum detection and estimation of signals in noise. IEEE Trans. Inf. Theory. 1968 May;14(5):434–444. [Google Scholar]
  • [25].Silverman BW. Density Estimation for Statistics and Data Analysis. CRC; Boca Raton, FL: 1986. [Google Scholar]
  • [26].Cohen A, Daubechies I, Vial P. Wavelets on the interval and fast wavelet transform. Appl. Comput. Harmon. Anal. 1993;1(1):54–81. [Google Scholar]
  • [27].Breiman L, Meisel W, Purcell E. Variable kernel estimates of multivariate densities. Technometrics. 1977;19:135–144. [Google Scholar]
  • [28].Robinson D, Milanfar P. Fast local and global projection-based methods for affine motion estimation. J. Math. Imag. Vis. 2003;18:35–54. [Google Scholar]
  • [29].Unser M, Thevenaz P, Yaroslavsky L. Convolution-based interpolation for fast, high-quality rotation of images. IEEE Trans. Image Process. 1995 Oct.4(10):1371–1381. doi: 10.1109/83.465102. [DOI] [PubMed] [Google Scholar]
  • [30].Goodman JW. Some fundamental properties of speckle. J. Opt. Soc. Amer. 1976 Nov.66(11):1145–1150. [Google Scholar]
  • [31].Jain AK. Fundamentals of Digital Image Processing. Prentice-Hall; Englewood Cliffs, NJ: 1989. [Google Scholar]
  • [32].Guo H, Odegard JE, Lang M, Gopinath RA, Selesnick IW, Burrus CS. Wavelet based speckle reduction with application to SAR based ATD/R. Proc. Int. Conf. Image Processing. 1994;1:75–79. [Google Scholar]
  • [33].Michailovich O, Tannenbaum A. De-speckling of medical ultrasound images. IEEE Trans. Ultrason., Ferroelect., Freq. Contr. 2006 Jan.53(1):64–78. doi: 10.1109/tuffc.2006.1588392. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Arsenault HH, April G. Properties of speckle integrated with a finite aperture and logarithmically transformed. J. Opt. Soc. Amer. 1976 Nov.66(11):1160–1163. [Google Scholar]
  • [35].Daubechies I. Ten Lectures on Wavelets. SIAM; Philadelphia, PA: 1992. [Google Scholar]
  • [36].Maskell S. A Bayesian approach to fusing uncertain, imprecise and conflicting information. Inf. Fusion. 2007 [Google Scholar]
  • [37].Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. CRC; Boca Raton, FL: 2000. [Google Scholar]
  • [38].Gnanadurai D, Sadasivam V. Undecimated wavelet based speckle reduction for SAR images. Pattern Recognit. Lett. 2005 May;25(6):793–800. [Google Scholar]
  • [39].Periaswamy S, Farid H. Elastic registration in the presence of intensity variations. IEEE Trans. Med. Imag. 2003 Jul.22:865–874. doi: 10.1109/TMI.2003.815069. [DOI] [PubMed] [Google Scholar]
  • [40].Kybic J, Thevenaz P, Nirkko A, Unser M. Unwarping of unidirectionally distorted EPI images. IEEE Trans. Med. Imag. 2000 Feb.19(2):80–93. doi: 10.1109/42.836368. [DOI] [PubMed] [Google Scholar]
  • [41].Ljung L. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. Autom. Contr. 1979 Feb.24(1):36–50. [Google Scholar]
  • [42].Ristic B, Arulampalam S, Gordon N. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House; Boston, MA: 2004. [Google Scholar]
