Author manuscript; available in PMC: 2014 Aug 6.
Published in final edited form as: IEEE Trans Ultrason Ferroelectr Freq Control. 2010 Jun;57(6):1347–1357. doi: 10.1109/TUFFC.2010.1554

A Fast Normalized Cross-Correlation Calculation Method for Motion Estimation

Jianwen Luo 1, Elisa E Konofagou 2
PMCID: PMC4123965  NIHMSID: NIHMS581730  PMID: 20529710

Abstract

High-precision motion estimation has become essential in ultrasound-based techniques such as time-domain Doppler and elastography. Normalized cross-correlation (NCC) has been shown to be one of the best motion estimators. However, a significant drawback is its associated computational cost, especially when RF signals are used. In this paper, a method based on sum tables developed elsewhere is adapted for fast NCC calculation in ultrasound-based motion estimation, and the resulting speed enhancement is tested for this specific application. Both the numerator and denominator in the NCC definition are obtained through pre-calculated sum tables to eliminate redundancy of repeated NCC calculations. Unlike a previously reported method, a search region following the principle of motion estimation is applied in the construction of sum tables. Because an exhaustive search and high window overlap are typically used for highest quality imaging, the computational cost of the proposed method is significantly lower than that of the direct method using the NCC definition, without increasing bias and variance characteristics of the motion estimation or sacrificing the spatial resolution. Therefore, high quality, high spatial resolution, and high calculation speed can all be obtained simultaneously using the proposed methodology. The high efficiency of this method was verified using RF signals from a human abdominal aorta in vivo. For the parameters typically used, a real-time, very high frame rate of 310 frames/s was achieved for the motion estimation. The proposed method was also extended to 2-D NCC motion estimation and motion estimation with other algorithms. The technique could thus prove very useful and flexible for real-time motion estimation as well as in other fields such as optical flow and image registration.

I. Introduction

Motion estimation is performed in several ultrasound-based techniques such as blood flow imaging [1]–[4], elastography or elasticity imaging [5], [6], phase-aberration correction [7]–[9], and strain compounding [10]. Many methods have been proposed for motion estimation, including phase-domain methods [11], [12], time-domain or space-domain methods [13], [14], and spline-based methods [15], [16]. Time-domain (1-D) or space-domain (2-D) methods have been widely and frequently used because of their high accuracy, precision, and resolution, and relative simplicity in implementation [3]. Typically, an observation reference signal/image window (i.e., kernel) is defined in one frame of ultrasound data (either B-mode image, RF, or envelope signal) and subsequently compared with different candidate windows (i.e., comparison windows) from another frame of ultrasound data, within a pre-defined search range. A cost function is calculated to quantify the similarity, or matching, between the reference and comparison windows. The motion is estimated from the temporal or spatial shift between the reference window and the best-match comparison window, which gives the maximum (or minimum) cost function value.

Numerous cost functions have been developed for different motion estimation methods. These methods include, but are not limited to, non-normalized cross-correlation (CC), normalized cross-correlation (NCC), zero-mean normalized cross-correlation (ZNCC, i.e., normalized covariance), sum of absolute differences (SAD), and sum of squared differences (SSD) [13], [14]. Numerous efforts have been concentrated on the performance comparison of different methods [13], [14], [17]–[20].

The ultimate objective of motion estimation techniques is to obtain images at high accuracy, high spatial resolution, and low computational cost. However, there exist several trade-offs between accuracy, spatial resolution, and computational cost: 1) compared with B-mode-based motion estimation, RF signal-based motion estimation is more accurate [21] and provides higher spatial resolution, but is associated with higher computational cost because of the higher RF sampling rate; 2) compared with other estimators, the NCC algorithm is considered as one of the most accurate motion estimators [13] but at a higher computational cost because of its complex definition; 3) high window overlap (or, small shift between successive reference windows) in NCC improves the spatial resolution for a given set of acoustic parameters [22], but increases the computational cost because it incurs a larger number of calculations; 4) within a certain range, a larger window size reduces the jitter error of motion estimation [21], [23], [24] but at the expense of increased computational cost. Because of these trade-offs, estimation quality is typically sacrificed to obtain real-time motion estimation, e.g., through the use of B-mode instead of RF frames or alternative estimators and parameters.

The NCC is generally considered to be the gold standard of motion estimation because it compensates for local variation (such as scaling) of the signal energy [13]. However, its higher computational cost is a significant drawback in its real-time application, especially when highly sampled RF signals and an exhaustive search are used. An exhaustive search, or full search, means that the reference window is compared with every possible comparison window within a pre-specified search range.

An intuitive approach to reducing the computational cost in motion estimation is to use a guided, instead of an exhaustive, search [25]–[27]. In a guided search, the estimated motion of neighboring regions is used as a priori information, thus significantly reducing the search region and its computational cost [25]–[27]. Methods with a guided search have been successfully applied to strain imaging of breast tumors [25], [27], where an external, static compression is used and the a priori information is easier to obtain. Motion estimation of cardiovascular tissues has been playing an increasingly important role in the diagnosis of cardiovascular disease [28]–[30]. The distribution of the internal, physiologic motion within cardiovascular tissues is too complex for any a priori information to be used. An exhaustive search is thus required. This more general methodology also acts as a benchmark for evaluating the efficiency of all fast algorithms.

The NCC and ZNCC have been widely used in the fields of template/block matching, object recognition, image registration and image/video compression. As will be shown later [see (1)], the definition of NCC consists of a denominator (i.e., energy terms) and a numerator (i.e., standard CC term). A relatively efficient way of calculating the NCC is to compute the standard CC (numerator) in the frequency domain using the fast Fourier transform (FFT) and then compute the energy terms in the denominator using sum tables (i.e., tables that store cumulative sums, in this case, cumulative sums of the squared signal) [31]. Sum tables have also been used in the calculation of mean and energy terms in ZNCC [32]. However, the calculation of the numerator dominates the computational cost of the NCC. As a result, even though the FFT is used, it remains computationally intensive. Basis functions have been proposed to approximate the template (i.e., reference window) and obtain substantial computational gains over the FFT-based methods [33], [34]. This method relies on selecting a suitable set of basis functions and only provides an approximation of NCC or ZNCC. Tsai and Lin proposed a fast ZNCC method for defect detection which used the sum tables to calculate both the numerator and denominator [35]. In their application, both reference and comparison windowed sub-images were simultaneously shifted pixel by pixel across the images, with low correlation between them indicating the defect. However, the calculation of the ZNCC in their application did not include a search region, as emphasized in this paper, because the reference and comparison windows were always coincident in their locations. In motion estimation, the location of the maximum correlation within a search range indicates the shift or, motion, between the reference and comparison windows. The search range is therefore essential for the purpose of motion estimation.

In this paper, a fast method for NCC with an exhaustive full search is hereby proposed for motion estimation by using the sum-table scheme [31], [35]. This method, referred to as the sum-table method, includes the construction of one sum table for the reference signal energy, one sum table for the comparison signal energy, and several sum tables for the non-normalized CC term between the reference and comparison signals. Both the denominator and numerator of the NCC are calculated using sum tables, unlike the previously reported method for which only the denominator was taken into account [31]. Compared with [35], this method also consists of a search region, as required by the principle of motion estimation. An example of motion estimation using high-frame-rate RF signals from a human abdominal aorta in vivo is used to test the high efficiency of the proposed method. As will be shown in this paper, the proposed method significantly reduces the computational cost of the NCC algorithm, thus allowing for the simultaneous use of RF signals, large window sizes, and high window overlaps for real-time imaging. Therefore, high-quality motion estimation can be achieved at both high spatial resolution and low computational cost.

II. Methods

A. Normalized Cross-Correlation (NCC)

The reference and comparison signals are referred to as f(n) and g(n), respectively, where n is the sample index (1 ≤ n ≤ M, where M is the total number of samples). The NCC between the reference and comparison windows, $R_{NCC}$, is defined as [13]

$$R_{NCC}(u,\tau)=\frac{\sum_{n=u}^{u+W-1} f(n)\,g(n+\tau)}{\sqrt{\sum_{n=u}^{u+W-1} f^{2}(n)\cdot\sum_{n=u}^{u+W-1} g^{2}(n+\tau)}},\quad (\tau_1\le\tau\le\tau_2), \tag{1}$$

where the reference window is located within the interval of [u, u + W − 1], u is the origin of the reference window, W is the window size, as shown in Fig. 1, τ is the shift between the comparison and reference windows, and [τ1, τ2] is the search range determined by the range of physiologic displacements.
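As an illustration of the direct method, the following minimal Python sketch evaluates (1) for one reference window and then performs an exhaustive search over [τ1, τ2]. The function names `ncc_direct` and `estimate_shift` and the test signal are hypothetical and introduced here only for illustration; the 1-based indices of the text are mapped to Python's 0-based lists.

```python
import math

def ncc_direct(f, g, u, W, tau):
    """NCC of Eq. (1) for one reference window origin u (1-based) and one shift tau.

    Sample n of the text is f[n - 1] here. The caller must keep both windows
    inside the signals, because Python would silently wrap negative indices.
    """
    num = ef = eg = 0.0
    for n in range(u, u + W):
        a = f[n - 1]
        b = g[n - 1 + tau]
        num += a * b   # non-normalized CC term (numerator)
        ef += a * a    # energy of the reference window
        eg += b * b    # energy of the comparison window
    return num / math.sqrt(ef * eg)

def estimate_shift(f, g, u, W, tau1, tau2):
    """Exhaustive search: the tau in [tau1, tau2] that maximizes the NCC."""
    return max(range(tau1, tau2 + 1), key=lambda t: ncc_direct(f, g, u, W, t))

# Hypothetical test signal: g is f delayed by two samples.
f = [(4 * k) % 11 - 5 for k in range(40)]
g = [0, 0] + f[:-2]
print(estimate_shift(f, g, u=10, W=16, tau1=-4, tau2=4))  # 2
```

Every call to `ncc_direct` recomputes all three sums from scratch, which is the redundancy that the sum-table method removes.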

Fig. 1.

A schematic diagram of the redundant calculation of (a) the energies of the (upper) reference and (lower) comparison windows, and (b) the non-normalized cross-correlation between the reference and comparison windows. The overlapping regions (i.e., redundant calculations) are indicated by the gray areas.

As shown in (1), the NCC calculation consists of three terms, i.e., the energy of the reference window ($\sum_{n=u}^{u+W-1} f^{2}(n)$) in the denominator, the energy of the comparison window ($\sum_{n=u}^{u+W-1} g^{2}(n+\tau)$) in the denominator, and the standard (i.e., non-normalized) CC between these two windows ($\sum_{n=u}^{u+W-1} f(n)\,g(n+\tau)$) in the numerator. For each reference window, both the non-normalized CC term between the reference and comparison windows and the energy of the comparison window need to be calculated for every search τ within the range of [τ1, τ2]. This calculation is repeated for each reference window (i.e., each value of u) across the entire signal length. Therefore, NCC-based motion estimation with an exhaustive search is extremely time-consuming.

Motion estimation consists of a series of template-matching operations. Over the entire process of motion estimation, the reference window (i.e., the template) varies as it slides across the entire signal length. Typically, the separation between successive reference windows in space, i.e., the window shift (ΔW), is smaller than the window size (W), as shown in Fig. 1. In other words, the reference windows overlap.

B. Fast NCC Calculation Using the Sum-Table Method

The efficient NCC calculation method based on sum tables relies on the fact that most calculations are redundant because of the exhaustive search of the comparison windows and high overlap between the reference windows. When the search is equal to τ and τ + 1, the comparison windows overlap, as shown in Fig. 1(a). Therefore, the energy calculation within the overlapping region becomes redundant. Accordingly, the sum table for the comparison window, $s_{g^2}(u)$, is constructed as follows:

$$s_{g^2}(u)=\begin{cases} g^{2}(u)+s_{g^2}(u-1) & (1\le u\le M)\\ 0 & (u=0)\end{cases}, \tag{2}$$

where M is the total length (in samples) of the signal used in motion estimation.

The energy of the comparison window [i.e., the second term of the denominator in (1)] can then be directly calculated as

$$\sum_{n=u}^{u+W-1} g^{2}(n+\tau)=s_{g^2}(u+W-1+\tau)-s_{g^2}(u-1+\tau). \tag{3}$$

Similarly, the calculations of different reference windows become redundant when there is an overlap between them (i.e., ΔW < W), as shown in Fig. 1(a). Similar to (2), the sum table $s_{f^2}(u)$ is constructed as follows:

$$s_{f^2}(u)=\begin{cases} f^{2}(u)+s_{f^2}(u-1) & (1\le u\le M)\\ 0 & (u=0)\end{cases}. \tag{4}$$

The energy of the reference window [i.e., the first term of the denominator in (1)] can be calculated as

$$\sum_{n=u}^{u+W-1} f^{2}(n)=s_{f^2}(u+W-1)-s_{f^2}(u-1). \tag{5}$$
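The energy tables of (2) and (4) and the two-lookup window energies of (3) and (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the helper names `energy_sum_table` and `window_energy` and the sample values are hypothetical.

```python
def energy_sum_table(x):
    """Cumulative sum of squares with s[0] = 0, as in Eqs. (2) and (4).

    s[u] holds x(1)^2 + ... + x(u)^2 for the 1-based samples of the text.
    """
    s = [0.0] * (len(x) + 1)
    for u in range(1, len(x) + 1):
        s[u] = x[u - 1] ** 2 + s[u - 1]
    return s

def window_energy(s, u, W, tau=0):
    """Energy of the window [u + tau, u + tau + W - 1] from two lookups, Eqs. (3) and (5)."""
    return s[u + W - 1 + tau] - s[u - 1 + tau]

g = [3, -1, 4, 1, -5, 9, 2, -6]
s_g2 = energy_sum_table(g)
# A window of W = 3 samples at u = 2 shifted by tau = 1 covers g(3), g(4), g(5).
print(window_energy(s_g2, u=2, W=3, tau=1))  # 4^2 + 1^2 + (-5)^2 = 42.0
```

Once the table is built, the cost of a window energy no longer depends on W: it is always one subtraction.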

Because successive reference windows usually overlap, the entire calculation of the numerator in (1) is also redundant. For a search τ, as shown in Fig. 1(b), the CC between the reference window at u and the comparison window at u + τ, and the CC between the successive reference window at u + ΔW and the comparison window at u + τ + ΔW share some redundant information, because both reference (and comparison) windows substantially overlap (i.e., ΔW < W). This holds for every search τ. Based on this fact, the following sum table, $s_{f,g}(u,\tau)$, is constructed for each search τ (τ1 ≤ τ ≤ τ2),

$$s_{f,g}(u,\tau)=\begin{cases} f(u)\cdot g(u+\tau)+s_{f,g}(u-1,\tau) & (1\le u\le M)\\ 0 & (u=0)\end{cases}. \tag{6}$$

Therefore, the numerator in NCC given by (1) can be calculated from the sum tables through

$$\sum_{n=u}^{u+W-1} f(n)\,g(n+\tau)=s_{f,g}(u+W-1,\tau)-s_{f,g}(u-1,\tau). \tag{7}$$

In summary, the calculation of the NCC is converted to two sum tables, i.e., $s_{f^2}(u)$ and $s_{g^2}(u)$, for the denominator and τ2 − τ1 + 1 sum tables, i.e., $s_{f,g}(u,\tau)$, for the numerator, as defined by (4), (2), and (6), respectively.
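Putting (2) through (7) together, the whole sum-table procedure might be sketched as below. This is a simplified Python illustration under stated assumptions, not the authors' C++ code: the names `build_tables`, `ncc_from_tables`, and `estimate_motion` are hypothetical, f and g are assumed to have the same length M, and products whose comparison sample g(u + τ) falls outside the signal are assumed to be zero (the text does not specify the boundary handling).

```python
import math

def build_tables(f, g, tau1, tau2):
    """Build s_f2, s_g2 and one s_fg table per shift tau, per Eqs. (4), (2), (6)."""
    M = len(f)
    s_f2 = [0.0] * (M + 1)
    s_g2 = [0.0] * (M + 1)
    for u in range(1, M + 1):
        s_f2[u] = f[u - 1] ** 2 + s_f2[u - 1]
        s_g2[u] = g[u - 1] ** 2 + s_g2[u - 1]
    s_fg = {}
    for tau in range(tau1, tau2 + 1):
        t = [0.0] * (M + 1)
        for u in range(1, M + 1):
            k = u + tau  # 1-based index into g
            # Assumption: out-of-range comparison samples contribute zero.
            prod = f[u - 1] * g[k - 1] if 1 <= k <= M else 0.0
            t[u] = prod + t[u - 1]
        s_fg[tau] = t
    return s_f2, s_g2, s_fg

def ncc_from_tables(tables, u, W, tau):
    """Eq. (1) evaluated with three table differences instead of three sums."""
    s_f2, s_g2, s_fg = tables
    num = s_fg[tau][u + W - 1] - s_fg[tau][u - 1]   # Eq. (7)
    ef = s_f2[u + W - 1] - s_f2[u - 1]              # Eq. (5)
    eg = s_g2[u + W - 1 + tau] - s_g2[u - 1 + tau]  # Eq. (3)
    return num / math.sqrt(ef * eg)

def estimate_motion(f, g, W, dW, tau1, tau2):
    """Exhaustive search for every reference window, spaced dW samples apart."""
    tables = build_tables(f, g, tau1, tau2)
    M = len(f)
    first = 1 - min(tau1, 0)         # keeps u + tau1 >= 1
    last = M - W + 1 - max(tau2, 0)  # keeps u + W - 1 + tau2 <= M
    return [max(range(tau1, tau2 + 1),
                key=lambda t: ncc_from_tables(tables, u, W, t))
            for u in range(first, last + 1, dW)]

# Hypothetical test signal: g is f delayed by two samples.
f = [float((4 * k) % 11 - 5) for k in range(60)]
g = [0.0, 0.0] + f[:-2]
print(estimate_motion(f, g, W=16, dW=4, tau1=-4, tau2=4))  # every entry is 2
```

The tables are built once per pair of signals; afterward each NCC value costs three subtractions plus the normalization, regardless of the window size W.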

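$$R_{NCC}(u,v,\tau_x,\tau_y)=\frac{\sum_{m=u}^{u+W_x-1}\sum_{n=v}^{v+W_y-1} f(m,n)\,g(m+\tau_x,n+\tau_y)}{\sqrt{\sum_{m=u}^{u+W_x-1}\sum_{n=v}^{v+W_y-1} f^{2}(m,n)\cdot\sum_{m=u}^{u+W_x-1}\sum_{n=v}^{v+W_y-1} g^{2}(m+\tau_x,n+\tau_y)}} \tag{8}$$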

C. 2-D Motion Estimation Using the Sum-Table Method

In the previous section, a 1-D kernel in a 1-D search was used to estimate the 1-D motion. To estimate 2-D motion, a 1-D kernel in a 2-D search can be used [36]. The RF signals are interpolated along the lateral direction of the ultrasound beam to perform sub-beam lateral displacement estimation. The expansion of the sum-table method in this case is straightforward. For both original and interpolated beams within the lateral search range, sum tables given by (4) and (6) are calculated.

The sum-table method can be easily extended to a 2-D/3-D kernel in a 2-D/3-D search, where the windows overlapped in two/three directions, thus suggesting high efficiency of the proposed method. The NCC between the 2-D reference window f(m,n) and comparison window g(m,n) is defined as (8), see above, where the reference window is located in the interval of [u, u + Wx − 1] and [v, v + Wy − 1] in x and y directions, Wx and Wy denote the window sizes in x and y directions, τx and τy are the searches along the x and y directions, respectively. Similar to (1), most calculations in (8) are redundant when an exhaustive search and high window overlap are used. The sum-table method for fast 2-D NCC calculation (i.e., motion estimation with a 2-D kernel in a 2-D search) is described and tested in Appendix A.
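The details of the 2-D construction are deferred to Appendix A, which is not reproduced here. As a rough illustration only, a 2-D sum table can be built in the standard summed-area (integral-image) manner, which is assumed, not confirmed, to correspond to the Appendix A scheme. The names `sum_table_2d` and `window_sum_2d` are hypothetical.

```python
def sum_table_2d(x):
    """2-D cumulative sum with a zero border: s[u][v] = sum of x(m, n) for m <= u, n <= v."""
    rows, cols = len(x), len(x[0])
    s = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for u in range(1, rows + 1):
        for v in range(1, cols + 1):
            s[u][v] = (x[u - 1][v - 1] + s[u - 1][v]
                       + s[u][v - 1] - s[u - 1][v - 1])
    return s

def window_sum_2d(s, u, v, Wx, Wy):
    """Sum over [u, u + Wx - 1] x [v, v + Wy - 1] (1-based) from four lookups."""
    return (s[u + Wx - 1][v + Wy - 1] - s[u - 1][v + Wy - 1]
            - s[u + Wx - 1][v - 1] + s[u - 1][v - 1])

f = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
s_f2 = sum_table_2d([[a * a for a in row] for row in f])
# Energy of the 2 x 2 reference window with origin (u, v) = (2, 2):
# 6^2 + 7^2 + 10^2 + 11^2
print(window_sum_2d(s_f2, 2, 2, 2, 2))  # 306.0
```

Under the same assumption, the numerator of (8) would use one such table per search pair (τx, τy), built from the products f(m, n) · g(m + τx, n + τy).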

D. Computational Complexity

Table I lists the numbers of arithmetic operations required for the direct method and the proposed sum-table method for the NCC calculation, respectively. The direct method calculates the NCC by directly using its definition given by (1). Assuming the search range R = τ2 − τ1 + 1 ≫ 1, its computational complexity is approximately O(nWR), where O(·) denotes the order of magnitude, n is the total number of reference windows across the entire signal length, and W is the window size.

TABLE I.

Comparison of Arithmetic Operations Between the Direct Method and the Proposed Sum-Table Method for the NCC Calculation.

| Operation | Direct method | Sum-table method |
| --- | --- | --- |
| Construction of $s_{f^2}$ in (4) | 0 | M multiplications, M additions |
| Construction of $s_{g^2}$ in (2) | 0 | M multiplications, M additions |
| Construction of $s_{f,g}$ in (6) | 0 | MR multiplications, MR additions |
| Calculation of Σf² in (1) or (5) | nW multiplications, nW additions | n subtractions |
| Calculation of Σg² in (1) or (3) | nWR multiplications, nWR additions | nR subtractions |
| Calculation of Σf · g in (1) or (7) | nWR multiplications, nWR additions | nR subtractions |
| Normalization in (1) | nR multiplications, nR square roots, nR divisions | nR multiplications, nR square roots, nR divisions |

W = window size; ΔW = window shift; R = search range; n = number of reference windows.

The computational complexity of the sum-table method is mainly determined by the construction of sum tables in (2), (4), and (6). Assuming R ≫ 1, the computational complexity is approximately O(MR), where M is the total signal length. Therefore, the complexity of the sum table construction mainly depends on the signal length (M) and the search range (R), and is independent of the window size (W).

Assuming M ≫ W and M ≫ R, the number of windows is approximated by n = M/ΔW. Therefore, the complexity of the direct method is approximately O(M(W/ΔW)R). Compared with the complexity of the sum-table method, i.e., O(MR), the direct method is significantly more computationally expensive in the case of high overlap (i.e., W/ΔW ≫ 1). In other words, higher overlaps are associated with higher computational savings of the sum-table method.
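To make the comparison concrete, the entries of Table I can simply be added up for one parameter set. The short sketch below does this for M = 2592, W = 128, ΔW = 32, and R = 9, using the approximation n = M/ΔW. The function name `op_counts` is hypothetical, and weighting every operation type equally is an assumption made only for illustration, so the resulting ratio should not be read as a predicted speed-up.

```python
def op_counts(M, W, dW, R):
    """Total arithmetic operations from Table I, all operation types weighted equally."""
    n = M // dW  # approximate number of reference windows, n = M / dW
    # Direct method: sum f^2 (2nW) + sum g^2 (2nWR) + sum f*g (2nWR) + normalization (3nR)
    direct = 2 * n * W + 4 * n * W * R + 3 * n * R
    # Sum-table method: table construction (2M + 2M + 2MR)
    # + lookups (n + nR + nR subtractions) + normalization (3nR)
    table = 4 * M + 2 * M * R + n + 2 * n * R + 3 * n * R
    return n, direct, table

n, direct, table = op_counts(M=2592, W=128, dW=32, R=9)
print(n, direct, table)  # 81 396171 60750
```

The direct count grows with W through the nW and nWR terms, whereas the sum-table count contains no W at all, in line with the complexity analysis.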

E. In Vivo Performance Evaluation

The proposed method was compared with the direct calculation method [i.e., using (1) without sum tables] using in vivo RF signals. The RF frames were acquired from a human abdominal aorta using a Sonix RP system (Ultrasonix Medical Corp., Burnaby, BC, Canada) with a phased array (3.3 MHz) at a frame rate of 194 Hz [37], [38]. Each frame consisted of 32 beams with 2592 samples per beam (i.e., 10-cm depth) using a sampling frequency of 20 MHz.

C++ (Visual Studio .NET 2003, Microsoft Corp., Seattle, WA) was used to implement both the direct and proposed sum-table methods; 64-bit, double-precision floating-point numbers were used for all computations. The computational time of both methods was compared at different window sizes (W), window shifts (ΔW), and search ranges (R = τ2 − τ1 + 1). The average and standard deviation (SD) of computational time (in milliseconds) for the motion estimation of each frame were calculated from a total of 323 frames, with 32 beams per frame. The data processing was performed on a PC workstation (Intel Pentium D CPU, 3.4 GHz, 2 GB RAM).

At the high frame rate used, a search range of nine samples in the axial direction (i.e., −4 to 4 samples, or −0.15 to 0.15 mm displacement range) was deemed sufficient to cover the maximum inter-frame displacement experienced by the aorta over a cardiac cycle. The search range was also varied between 5, 17, 33, and 41 samples to investigate its effect on the computational cost. The window size used had 64, 128, or 256 samples, i.e., 2.46, 4.93, or 9.86 mm in length, respectively. The window shift was varied to include 16, 32, 64, or 128 samples.

Using the RF signals of a human left-ventricle in a parasternal long-axis view [39], a preliminary comparison was also performed on the same PC workstation. The RF signals had 2848 samples (or, 11 cm at a sampling frequency of 20 MHz) and there were 64 beams in each frame, acquired at the high frame rate of 481 Hz. At a window size of 128 samples (i.e., 4.9 mm) and window shift of 32 samples (i.e., 75% overlap), an axial search range of ± 4 samples, a lateral search range of ± 1 beam and an interpolation factor of 10 were used.

III. Results

Fig. 2 shows the axial displacement estimation using the sum-table method (W = 128 samples, ΔW = 32 samples, and R = 9 samples). Using these parameters, the computational time was equal to 3.2 ± 0.1 ms/frame, resulting in a frame rate of motion estimation of approximately 310 frames/s. The sequence of displacement images [Figs. 2(a)–(c)] depicts the pulse-wave propagation from the rightmost (proximal) to the leftmost (distal) side of the image [37]–[40]. Although not shown here, it should be noted that the direct and sum-table methods yielded identical estimates, because they were mathematically equivalent, i.e., both computed (1).

Fig. 2.

The estimated pulse wave imaging (PWI) displacements (motions) of a human abdominal aorta overlaid onto the B-mode images [37], [38]. (a), (b), and (c) are the image frames at consecutive time points, with 10 ms (i.e., 2 frames) between images [40]. The time elapsed after the ECG R-wave peak is indicated above each image. The color scale of the displacement is [−3, 3] samples, equivalent to [−0.12, 0.12] mm or a velocity range of [−2.2, 2.2] cm/s. The positive and negative displacements represent motion toward and away from the phased array on the top of all images, respectively. Only the displacements on the anterior aortic wall and the peri-aortic tissue are shown for better visualization. The thick arrows indicate the propagation of the pulse wavefront. With the parameters used (W = 128 samples, ΔW = 32 samples, and R = 9 samples), the frame rate of motion estimation is approximately 310 frames/s.

Fig. 3 shows the computational time of the direct and sum-table methods at different window sizes. As shown, the sum-table method reduces the computational time of the direct method by 53 to 90% depending on the parameters used. The computational time of the direct method linearly increases with the window size when the window shift is fixed. However, the computational time of the sum-table method remains virtually the same at different window sizes for both small and large search ranges (Fig. 3). The computational savings of the sum-table method over the direct method are more pronounced at larger window sizes (i.e., higher window overlaps), in accordance with the computational complexity analysis.

Fig. 3.

The computational time (ms) and percentage of computational time of the sum-table relative to the direct method at different window sizes (W): (a) search range R = 5 samples; (b) search range R = 41 samples (ΔW = 32 samples). Error bars represent 10 × standard deviation (SD). *P < 0.001.

Fig. 4 compares the computational times of the direct and sum-table methods at different search ranges. The sum-table method reduces the computational cost of the direct method by 73 to 81% at the window overlap of 75% and the search range used (5 to 33 samples). The computational time of both methods increases with the search range, in agreement with the computational complexity.

Fig. 4.

The computational time (ms) and percentage of computational time of the sum-table relative to the direct method at different search ranges (R) (W = 128 samples, ΔW = 32 samples). Error bars represent 10 × SD. *P < 0.001.

Fig. 5 shows the computational time of both the direct and sum-table methods at different window shifts for a fixed window size. As shown, the computational time of the direct method steadily decreases with the window shift, because the number of reference windows used across the entire signal length decreases (from 152 to 19). Nevertheless, the sum-table method was faster than the direct method in all cases.

Fig. 5.

The computational time (ms) and percentage of computational time of the sum-table relative to the direct method at different window shifts (ΔW) (W = 128 samples, R = 41 samples). Error bars represent 10 × SD. *P < 0.001.

The computational cost gain of the sum-table method decreases from 88% to 47% when the window shift increases from 16 to 128 samples (Fig. 5). In other words, the efficiency of the sum-table method decreases with the window shift. This is mainly because of the reduced window overlap when the window shift increases, and thus is in agreement with the complexity analysis. However, even when there is no overlap (i.e., maximum window shift, W = ΔW = 128 samples), the computational cost of the sum-table method is still substantially lower than that of the direct method (Fig. 5). This is because the comparison window slides sample-by-sample within the search range while still overlapping with successive comparison windows, as shown in Fig. 1(a).

In the motion estimation of the human left ventricle using a 1-D kernel in a 2-D search, the computational time of the direct and sum-table methods was equal to 0.63 ± 0.01 and 0.13 ± 0.01 s/frame, respectively. Therefore, the sum-table method achieved computational savings of 79%, thus ensuring quasi-real-time 2-D estimation at a frame rate higher than 7 frames/s. The frame rate was further increased to 30 frames/s by using a more powerful PC workstation (Intel Core 2 Quad Q9650 CPU, 3.00 GHz, 8 GB RAM) and parallel computation (4 parallel threads). To the authors’ best knowledge, this is the fastest reported speed for 2-D motion estimation using RF signals with an exhaustive search.

IV. Discussion

Motion estimation with an exhaustive search can be computationally intensive. The proposed method used pre-calculated sum tables to avoid repeating redundant computations in the definition of the NCC given by (1). After construction of the sum tables, the NCC can be efficiently calculated through table lookup. For the parameters typically used (W = 128 samples, ΔW = 32 samples, R = 9 samples), the frame rate of the motion estimation in a human abdominal aorta can be increased from 70 frames/s in the direct method to 310 frames/s in the proposed method. The frame rate of the motion estimation is higher than the frame rate of the data acquisition (i.e., 194 frames/s, as given in Section II-E), thus guaranteeing real-time data processing.

By assuming that NCC needs to be calculated at three search points to obtain the sub-sample precision using interpolation on the NCC function [41], the guided search [25]–[27] could potentially reduce the computational cost of the direct method to a minimum ratio of 3/R. The sum-table method is faster than the method with a guided search, when the search range is small, e.g., R = 5 and R = 9 samples (Fig. 4). Such small search ranges are typically used in motion estimation in cardiovascular applications, where data are acquired at very high frame rates to obtain high-quality estimation, e.g., in the cases of [37]–[40] and the case presented in this paper. The comparison of the sum-table method and the guided search with a larger search range remains to be further investigated.

The reduced computational cost of the sum-table method incurs, however, larger RAM requirements, because the sum tables are pre-calculated and stored in RAM. In the in vivo data verification with a 1-D kernel in a 1-D search, the RAM needed for the sum tables was only 142 (= 2592 × (5 + 2) × 8/1024), 223, 385, and 709 KB at 64-bit (i.e., 8-byte) precision and a search range of 5, 9, 17, and 33 samples, respectively. Using a 1-D kernel in a 2-D search, the RAM required increased linearly with the interpolation factor and search range in the lateral direction, and was significantly higher (e.g., on the order of 1 MB) than that with a 1-D kernel in a 1-D search.
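The memory figures quoted above follow from storing R + 2 tables of M values each (R numerator tables plus the two energy tables). A small sketch of that arithmetic, with the hypothetical helper name `sum_table_ram_kb`:

```python
def sum_table_ram_kb(M, R, bytes_per_value=8):
    """RAM (in KB) for R numerator tables plus two energy tables of M values each."""
    return M * (R + 2) * bytes_per_value / 1024

for R in (5, 9, 17, 33):
    print(R, round(sum_table_ram_kb(2592, R)))  # 142, 223, 385, and 709 KB
```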

The computational cost of the proposed method was significantly lower than the direct method, without sacrificing the high-quality motion estimation that can be obtained at high window overlaps (i.e., high spatial resolution), large window sizes (i.e., high signal-to-noise ratio), and high frame rates. In vivo data were used to verify the high efficiency of the proposed method, while maintaining the high precision of motion estimation by using RF signals [21]. The computational savings of this method are more significant at higher window overlaps, thus guaranteeing the high resolution of motion estimation with specific acoustic parameters [22]. In addition, because the computational cost of this method is independent of the window size, a larger window size (within the spatial resolution limit) can be used to reduce the jitter errors of motion estimation [21], [23], [24], without the tradeoff of increased computational cost of conventional methods. It should be noted that a larger window size may include larger intra-window deformation and can thus reduce the spatial resolution of motion estimation [22]. Therefore, the largest possible window size that can be used depends on the tradeoff between accuracy and spatial resolution [42], and will thus have to be adapted to the specific application. If needed, a large window can be used in the proposed method to improve the estimation quality without increasing the computational time.

Dynamic programming has also been previously proposed to reduce the computational cost of NCC in 3-D motion estimation [43]. In that method, intermediate row sums were stored in a circular buffer. The calculation of the CC of sliding windows was obtained by adding in the new row sums and subtracting out the previous ones, similar to the approach of box filtering in the field of template matching [44]–[50]. That method also took advantage of the overlap between successive windows in the axial direction. By using dynamic programming combined with signal downsampling (i.e., reducing the number of samples in the RF signals) and two-path processing (i.e., coarse-to-fine estimation), a reduction in the computational time by a factor of 6 was obtained [43]. However, dynamic programming was implemented in only one direction (i.e., the axial direction), although it was applied to 3-D motion estimation. Extension of the dynamic programming method [43], or the approach of box filtering [44]–[50], in more than one direction has not been reported in the literature and, if possible, may introduce higher programming complexity and overhead time. The sum-table method proposed in this paper is deemed more straightforward and more easily extendable to 2-D and 3-D cases, by taking advantage of high window overlaps in all directions. In Appendix A, an example of 2-D motion estimation has shown a reduction in the computational cost by a factor of about 110 (i.e., computational savings of 99%), demonstrating thus the high efficiency and advantage of the proposed method.

In addition to its expansion to 2-D/3-D, the sum-table method can be extended to other algorithms, such as non-normalized CC, ZNCC, SSD, and SAD. Fig. 6 compares the computational time of the direct and sum-table methods for different algorithms in the 1-D case using RF signals of the human abdominal aorta. As shown, the sum-table method performs faster than the direct method for all algorithms. The computational efficiency of the sum-table methods over the direct methods is more significant in the ZNCC algorithms (i.e., computational savings of 85%), because the signal means, energies, and CC terms are all efficiently calculated from the sum tables.
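The text does not spell out how the other estimators are obtained from sum tables. One plausible route, given here purely as an assumption, uses the identities SSD = Σf² + Σg² − 2Σf·g and ZNCC = (Σf·g − ΣfΣg/W) / sqrt((Σf² − (Σf)²/W)(Σg² − (Σg)²/W)), together with a per-τ sum table of |f(u) − g(u + τ)| for the SAD. The names `cumsum`, `window`, and `estimators_from_tables` are hypothetical, and out-of-range comparison samples are assumed to be zero.

```python
import math

def cumsum(values):
    """Sum table with s[0] = 0, so that a window sum is s[hi] - s[lo - 1]."""
    s = [0.0]
    for v in values:
        s.append(s[-1] + v)
    return s

def window(s, u, W):
    """Sum over the 1-based window [u, u + W - 1] from two lookups."""
    return s[u + W - 1] - s[u - 1]

def estimators_from_tables(f, g, u, W, tau):
    """Return (CC, SSD, SAD, ZNCC) for window origin u (1-based) and shift tau.

    For clarity the tables are rebuilt on every call; in practice they would be
    built once per pair of signals and per tau, then reused for every u.
    """
    M = len(f)
    # gs(n) = g(n + tau); samples outside the signal are assumed to be zero.
    gs = [g[k + tau] if 0 <= k + tau < M else 0.0 for k in range(M)]
    s_f, s_g = cumsum(f), cumsum(gs)
    s_f2 = cumsum(a * a for a in f)
    s_g2 = cumsum(b * b for b in gs)
    s_fg = cumsum(a * b for a, b in zip(f, gs))
    s_ad = cumsum(abs(a - b) for a, b in zip(f, gs))
    F, G = window(s_f, u, W), window(s_g, u, W)
    F2, G2 = window(s_f2, u, W), window(s_g2, u, W)
    cc = window(s_fg, u, W)
    ssd = F2 + G2 - 2.0 * cc
    sad = window(s_ad, u, W)
    zncc = (cc - F * G / W) / math.sqrt((F2 - F * F / W) * (G2 - G * G / W))
    return cc, ssd, sad, zncc

# Hypothetical test signal: g is f delayed by two samples.
f = [float((4 * k) % 11 - 5) for k in range(40)]
g = [0.0, 0.0] + f[:-2]
print(estimators_from_tables(f, g, u=10, W=16, tau=2))
```

At the true shift (τ = 2 in this example) the SSD and SAD vanish and the ZNCC reaches 1, as expected for a perfect match.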

Fig. 6.

The computational time (ms) for different estimation algorithms and percentage of computational time of the sum-table relative to the direct method (W = 128 samples, ΔW = 32 samples, R = 41 samples). Error bars represent 10 × SD. *P < 0.001.

As shown in Fig. 6, the ZNCC is the slowest method because its definition is more complex than the other methods. The CC method is the fastest for both direct and sum-table methods. The SAD was unexpectedly found to be slower than the CC. This could be explained by the fact that the computational cost between addition and multiplication is similar in some new computer architectures (e.g., a pipeline processor) [51], and, compared with CC, the SAD method needs an additional relational test operation and an assignment operation. The efficiency of the methods thus depends on the workstation used. The SAD can be implemented in parallel by using Intel CPU instruction sets, as previously shown [52], [53], but was typically limited to a window size of 4, 8, or 16 samples and 8-bit data precision [54]. In contrast, the sum-table method can still work with an arbitrary window size, data precision, and motion estimator, and hence has a more general impact in its applications. It is also worth mentioning that the computational costs of different motion estimators are on the same order of magnitude with the use of the sum-table method because of its high efficiency (Fig. 6).

V. Conclusion

A time-efficient sum-table method was implemented in the field of ultrasound-based motion estimation to rapidly calculate the normalized cross-correlation (NCC). By taking advantage of the exhaustive search and high overlap between windows required for high-quality imaging, this method can avoid redundant calculation in motion estimation and thereby significantly reduce the computational cost. This method allows for the use of RF signals, together with large window sizes and high window overlaps, for high-quality motion estimation at low computational cost, without the multiple associated tradeoffs in conventional methods between accuracy, spatial resolution, and computational cost. This method was also extended to 2-D motion estimation and motion estimation with methods other than NCC, achieving lower computational time in all cases. The proposed method could thus prove very useful in real-time motion estimation in medical ultrasound as well as other medical imaging modalities or computational fields.

Acknowledgments

This study was supported in part by the Wallace H. Coulter Foundation (WHCFCU02650301) and National Institutes of Health (R01EB006042).

Biographies

Jianwen Luo (S’02–M’06) was born in Fujian Province, China, in 1978. He received the B.S., M.S., and Ph.D. (with honors) degrees in biomedical engineering from Tsinghua University, Beijing, China, in 2000, 2002, and 2005, respectively.

His research interests include ultrasound imaging and biomedical signal processing. He joined the Ultrasound and Elasticity Imaging Laboratory as a postdoctoral research scientist in 2005, and is currently an associate research scientist in the Department of Biomedical Engineering at Columbia University, New York. He is researching high-resolution cardiovascular imaging, including myocardial elastography, electromechanical wave imaging, and pulse wave imaging. He is a member of IEEE, Sigma Xi, and the American Institute of Ultrasound in Medicine, and is an editorial board member of the Journal of Ultrasound in Medicine.

Elisa E. Konofagou received her B.S. degree in chemical physics from Université de Pierre et Marie Curie, Paris VI in Paris, France, and her M.S. degree in biomedical engineering from Imperial College of Physics, Engineering and Medicine in London, UK, in 1992 and 1993, respectively. In 1999, Dr. Konofagou received her Ph.D. degree in biomedical engineering from the University of Houston, Houston, TX, for her work on multidimensional elastography for breast cancer diagnosis at the University of Texas Medical School, Houston, TX, then pursued her postdoctoral work in elasticity-based monitoring of ultrasound therapy at Brigham and Women’s Hospital, Harvard Medical School, Boston, MA.

Dr. Konofagou is currently an associate professor of biomedical engineering and radiology and Director of the Ultrasound and Elasticity Imaging Laboratory at Columbia University, New York. Her main interests are in the development of novel elasticity imaging techniques and therapeutic ultrasound methods, such as myocardial elastography, breast elastography, ligament elastography, harmonic motion imaging, and ultrasound-induced brain drug delivery, with several clinical collaborations in the Columbia Presbyterian Medicine Center, New York. Dr. Konofagou is a technical committee member of the Acoustical Society of America and a technical standards committee member of the American Institute of Ultrasound in Medicine. She has served on peer-review committees for the National Institutes of Health, the National Aeronautics and Space Administration, and the National Science Foundation. She also serves as an associate editor for the journal Medical Physics and as an editorial board member of the Ultrasound in Medicine and Biology journal, and is recipient of several awards, including from the Acoustical Society of America, the American Heart Association, the American Institute of Ultrasound in Medicine, the National Institutes of Health, the National Science Foundation, the Radiology Society of North America, and the Wallace H. Coulter Foundation. She is also a member of the IEEE Ultrasonics, Ferroelectrics and Frequency Control Society, the International Society of Therapeutic Ultrasound, the Acoustical Society of America, the American Institute of Ultrasound in Medicine, and the American Heart Association.

Appendix A. Fast 2-D NCC Calculation Using the Sum-Table Method

The sum tables are defined as follows:

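$$
\begin{aligned}
s_{f}^{2}(u,v) &= \sum_{m=0}^{u}\sum_{n=0}^{v} f^{2}(m,n),\\
s_{g}^{2}(u,v) &= \sum_{m=0}^{u}\sum_{n=0}^{v} g^{2}(m,n),\\
s_{f,g}(u,v,\tau_x,\tau_y) &= \sum_{m=0}^{u}\sum_{n=0}^{v} f(m,n)\cdot g(m+\tau_x,\,n+\tau_y).
\end{aligned}
\tag{A1}
$$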

The sum tables can be constructed efficiently in two steps. First, the summation is performed in the u direction for each v. Then, the summation is performed in the v direction for each u. An example for constructing the sum table of f²(u,v) is given as follows:

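$$
\begin{aligned}
s_{f,\mathrm{temp}}^{2}(u,v) &= f^{2}(u,v) + s_{f,\mathrm{temp}}^{2}(u-1,v),\\
s_{f}^{2}(u,v) &= s_{f,\mathrm{temp}}^{2}(u,v) + s_{f}^{2}(u,v-1).
\end{aligned}
\tag{A2}
$$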
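The two-step construction in (A2) can be sketched as follows. This is a minimal illustration assuming the data are stored as a list of rows indexed as `values[u][v]` and that table entries with a negative index are taken as zero; the function name is hypothetical. To obtain the table of squared samples, pass the squared values.

```python
def sum_table_2d(values):
    """Build s with s[u][v] = sum of values[m][n] over 0 <= m <= u, 0 <= n <= v."""
    rows, cols = len(values), len(values[0])
    temp = [[0] * cols for _ in range(rows)]
    table = [[0] * cols for _ in range(rows)]

    # Step 1: running sum along u for each fixed v (first line of A2).
    for v in range(cols):
        for u in range(rows):
            above = temp[u - 1][v] if u > 0 else 0
            temp[u][v] = values[u][v] + above

    # Step 2: running sum of the intermediate table along v for each fixed u
    # (second line of A2).
    for u in range(rows):
        for v in range(cols):
            left = table[u][v - 1] if v > 0 else 0
            table[u][v] = temp[u][v] + left

    return table
```

Each entry is produced with one addition per step, so the whole table costs two additions per sample, independent of the window size used later.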

Therefore, the three terms in (8) can be efficiently computed from the sum tables given by (A1) as

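$$
\sum_{m=u}^{u+W_x-1}\sum_{n=v}^{v+W_y-1} f^{2}(m,n) = s_{f}^{2}(u+W_x-1,\,v+W_y-1) - s_{f}^{2}(u-1,\,v+W_y-1) - s_{f}^{2}(u+W_x-1,\,v-1) + s_{f}^{2}(u-1,\,v-1),
\tag{A3}
$$

$$
\sum_{m=u}^{u+W_x-1}\sum_{n=v}^{v+W_y-1} g^{2}(m+\tau_x,\,n+\tau_y) = s_{g}^{2}(u+W_x-1+\tau_x,\,v+W_y-1+\tau_y) - s_{g}^{2}(u-1+\tau_x,\,v+W_y-1+\tau_y) - s_{g}^{2}(u+W_x-1+\tau_x,\,v-1+\tau_y) + s_{g}^{2}(u-1+\tau_x,\,v-1+\tau_y),
\tag{A4}
$$

$$
\sum_{m=u}^{u+W_x-1}\sum_{n=v}^{v+W_y-1} f(m,n)\,g(m+\tau_x,\,n+\tau_y) = s_{f,g}(u+W_x-1,\,v+W_y-1,\,\tau_x,\tau_y) - s_{f,g}(u-1,\,v+W_y-1,\,\tau_x,\tau_y) - s_{f,g}(u+W_x-1,\,v-1,\,\tau_x,\tau_y) + s_{f,g}(u-1,\,v-1,\,\tau_x,\tau_y).
\tag{A5}
$$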
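The four-corner lookup used in (A3) to (A5) can be sketched as below for the cross term. The sketch makes two assumptions that the text does not specify: table entries with a negative index are treated as zero, and samples of g that fall outside the image after shifting are treated as zero. The function names are hypothetical, and the table here is accumulated along v first and then along u, which yields the same table as the two-step construction described above.

```python
from itertools import accumulate


def sum_table_2d(values):
    """s[u][v] = sum of values[m][n] over 0 <= m <= u, 0 <= n <= v."""
    table = []
    for row in values:
        running = list(accumulate(row))              # cumulative along v
        prev = table[-1] if table else [0] * len(row)
        table.append([a + b for a, b in zip(running, prev)])  # then along u
    return table


def cross_sum_table(f, g, tau_x, tau_y):
    """Sum table of f(m, n) * g(m + tau_x, n + tau_y) for one 2-D lag.

    Assumes f and g have the same shape; out-of-range g samples count as 0.
    """
    rows, cols = len(f), len(f[0])

    def g_at(m, n):
        return g[m][n] if 0 <= m < rows and 0 <= n < cols else 0

    products = [[f[m][n] * g_at(m + tau_x, n + tau_y) for n in range(cols)]
                for m in range(rows)]
    return sum_table_2d(products)


def window_sum_2d(s, u, v, w_x, w_y):
    """Sum over the W_x-by-W_y window whose first sample is (u, v).

    Three additions/subtractions on four table entries, whatever the window size.
    """
    def at(i, j):
        return s[i][j] if i >= 0 and j >= 0 else 0

    return (at(u + w_x - 1, v + w_y - 1) - at(u - 1, v + w_y - 1)
            - at(u + w_x - 1, v - 1) + at(u - 1, v - 1))
```

With `s = cross_sum_table(f, g, tau_x, tau_y)`, the call `window_sum_2d(s, u, v, w_x, w_y)` plays the role of the right-hand side of (A5); applying `window_sum_2d` to the table of squared samples corresponds to (A3).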

A preliminary test was also performed on the same PC workstation (Intel Pentium D CPU, 3.4 GHz, 2 GB RAM) using the high-resolution (30 MHz), high-frame-rate (1000 Hz) B-mode data of a mouse left ventricle in a short-axis view acquired from a Vevo 770 system (VisualSonics Inc., Toronto, ON, Canada). Each B-mode image consisted of 432 × 192 pixels (axially × laterally) in a region of interest of 12 × 12 mm². A 2-D kernel of 64 samples (i.e., 1.78 mm) in the axial direction and 32 samples (i.e., 2 mm) in the lateral direction and a search range of ± 2 pixels in the axial direction (i.e., ± 0.056 mm) and ± 1 pixel in the lateral direction (i.e., ± 0.063 mm) were used. Fig. 7 shows the estimated 2-D displacements in the systolic phase. The anterior, posterior, septal, and lateral walls of the left ventricle all move toward the center of the blood cavity. The computational time of the sum-table method was 42 ± 1 ms for each frame, approximately 110 times lower than that of the direct method (4.68 ± 0.01 s). Therefore, a frame rate of 24 Hz was achieved for the 2-D motion estimation, with significant computational savings of 99% because of the high overlaps required in both the axial and lateral directions. Compared with the 1-D case, the 2-D/3-D extension of the sum-table method takes advantage of the window overlap in more than one direction. Therefore, the computational savings of the 2-D case of the sum-table method are much more significant than those of the 1-D case.
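The quoted figures are mutually consistent, as the following worked arithmetic suggests. It uses only the mean values reported above and assumes uniform pixel spacing across the 12 × 12 mm region of interest.

```latex
\begin{aligned}
\text{axial pixel size} &= \frac{12\ \text{mm}}{432} \approx 0.0278\ \text{mm}, &
64 \times 0.0278\ \text{mm} &\approx 1.78\ \text{mm},\\
\text{lateral pixel size} &= \frac{12\ \text{mm}}{192} = 0.0625\ \text{mm}, &
32 \times 0.0625\ \text{mm} &= 2\ \text{mm},\\
\text{speed-up} &= \frac{4.68\ \text{s}}{0.042\ \text{s}} \approx 111, &
\text{frame rate} &= \frac{1}{0.042\ \text{s}} \approx 23.8\ \text{Hz},\\
\text{savings} &= 1 - \frac{0.042}{4.68} \approx 0.991 \approx 99\%.
\end{aligned}
```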

Fig. 7.

The estimated (a) axial and (b) lateral displacements of a mouse left ventricle overlaid onto the B-mode images. The images were taken from the systolic phase (25 ms after the R-wave of the ECG). The positive (red) displacements represent upward or rightward motion, while negative (blue) displacements represent downward or leftward motion. Only the displacements of the left ventricle are shown. (ANT: anterior wall; LAT: lateral wall; LV: left ventricle; POS: posterior wall; RV: right ventricle; SEP: interventricular septum)

Contributor Information

Jianwen Luo, Department of Biomedical Engineering, Columbia University, New York, NY.

Elisa E. Konofagou, Email: ek2191@columbia.edu, Department of Biomedical Engineering, Columbia University, New York, NY. Also with the Department of Radiology, Columbia University, New York, NY.

References

  • 1. Bonnefous O, Pesque P. Time domain formulation of pulse-Doppler ultrasound and blood velocity estimation by cross-correlation. Ultrason Imaging. 1986;8(2):73–85. doi: 10.1177/016173468600800201.
  • 2. Trahey GE, Allison JW, Vonramm OT. Angle independent ultrasonic detection of blood-flow. IEEE Trans Biomed Eng. 1987;34(12):965–967. doi: 10.1109/tbme.1987.325938.
  • 3. Embree PM, Obrien WD. Volumetric blood flow via time-domain correlation—Experimental verification. IEEE Trans Ultrason Ferroelectr Freq Control. 1990;37(3):176–189. doi: 10.1109/58.55307.
  • 4. Bohs LN, Friemel BH, McDermott BA, Trahey GE. A real-time system for quantifying and displaying 2-dimensional velocities using ultrasound. Ultrasound Med Biol. 1993;19(9):751–761. doi: 10.1016/0301-5629(93)90092-3.
  • 5. Ophir J, Cespedes I, Ponnekanti H, Yazdi Y, Li X. Elastography—A quantitative method for imaging the elasticity of biological tissues. Ultrason Imaging. 1991;13(2):111–134. doi: 10.1177/016173469101300201.
  • 6. O’Donnell M, Skovoroda AR, Shapo BM, Emelianov SY. Internal displacement and strain imaging using ultrasonic speckle tracking. IEEE Trans Ultrason Ferroelectr Freq Control. 1994;41(3):314–325.
  • 7. Ng GC, Worrell SS, Freiburger PD, Trahey GE. A comparative-evaluation of several algorithms for phase aberration correction. IEEE Trans Ultrason Ferroelectr Freq Control. 1994;41(5):631–643.
  • 8. Sumino Y, Waag RC. Measurements of ultrasonic pulse arrival time differences produced by abdominal-wall specimens. J Acoust Soc Am. 1991;90(6):2924–2930. doi: 10.1121/1.401766.
  • 9. Flax SW, Odonnell M. Phase-aberration correction using signals from point reflectors and diffuse scatterers—Basic principles. IEEE Trans Ultrason Ferroelectr Freq Control. 1988;35(6):758–767. doi: 10.1109/58.9333.
  • 10. Li PC, Chen MJ. Strain compounding: A new approach for speckle reduction. IEEE Trans Ultrason Ferroelectr Freq Control. 2002;49(1):39–46. doi: 10.1109/58.981382.
  • 11. Loupas T, Peterson RB, Gill RW. Experimental evaluation of velocity and power estimation for ultrasound blood-flow imaging, by means of a 2-dimensional autocorrelation approach. IEEE Trans Ultrason Ferroelectr Freq Control. 1995;42(4):689–699.
  • 12. Kasai C. Real-time two-dimensional blood-flow imaging using an autocorrelation technique. IEEE Trans Ultrason Ferroelectr Freq Control. 1986;33(1):94.
  • 13. Viola F, Walker WF. A comparison of the performance of time-delay estimators in medical ultrasound. IEEE Trans Ultrason Ferroelectr Freq Control. 2003;50(4):392–401. doi: 10.1109/tuffc.2003.1197962.
  • 14. Langeland S, D’hooge J, Torp H, Bijnens B, Suetens P. Comparison of time-domain displacement estimators for two-dimensional RF tracking. Ultrasound Med Biol. 2003;29(8):1177–1186. doi: 10.1016/s0301-5629(03)00972-4.
  • 15. Viola F, Walker WF. A spline-based algorithm for continuous time-delay estimation using sampled data. IEEE Trans Ultrason Ferroelectr Freq Control. 2005;52(1):80–93. doi: 10.1109/tuffc.2005.1397352.
  • 16. Pinton GF, Trahey GE. Continuous delay estimation with polynomial splines. IEEE Trans Ultrason Ferroelectr Freq Control. 2006;53(11):2026–2035. doi: 10.1109/tuffc.2006.143.
  • 17. Friemel BH, Bohs LN, Trahey GE. Relative performance of two-dimensional speckle-tracking techniques: Normalized correlation, non-normalized correlation and sum-absolute-difference. Proc. IEEE Ultrasonics Symp; 1995; pp. 1481–1484.
  • 18. Eder A, Arnold T, Kargel C. Performance evaluation of displacement estimators for real-time ultrasonic strain and blood flow imaging with improved spatial resolution. IEEE Trans Instrum Meas. 2007;56(4):1275–1284.
  • 19. Jacovitti G, Scarano G. Discrete-time techniques for time-delay estimation. IEEE Trans Signal Process. 1993;41(2):525–533.
  • 20. Fertner A, Sjolund A. Comparison of various time-delay estimation methods by computer-simulation. IEEE Trans Acoust Speech Signal Process. 1986;34(5):1329–1330.
  • 21. Walker WF, Trahey GE. A fundamental limit on the performance of correlation-based phase correction and flow estimation techniques. IEEE Trans Ultrason Ferroelectr Freq Control. 1994;41(5):644–654.
  • 22. Righetti R, Ophir J, Ktonas P. Axial resolution in elastography. Ultrasound Med Biol. 2002;28(1):101–113. doi: 10.1016/s0301-5629(01)00495-1.
  • 23. Walker WF, Trahey GE. A fundamental limit on delay estimation using partially correlated speckle signals. IEEE Trans Ultrason Ferroelectr Freq Control. 1995;42(2):301–308.
  • 24. Varghese T, Ophir J. A theoretical framework for performance characterization of elastography: The strain filter. IEEE Trans Ultrason Ferroelectr Freq Control. 1997;44(1):164–172. doi: 10.1109/58.585212.
  • 25. Jiang J, Hall TJ. A parallelizable real-time motion tracking algorithm with applications to ultrasonic strain imaging. Phys Med Biol. 2007;52(13):3773–3790. doi: 10.1088/0031-9155/52/13/008.
  • 26. Chen LJ, Treece GM, Lindop JE, Gee AH, Prager RW. A quality-guided displacement tracking algorithm for ultrasonic elasticity imaging. Med Image Anal. 2009;13(2):286–296. doi: 10.1016/j.media.2008.10.007.
  • 27. Zhu YN, Hall TJ. A modified block matching method for real-time freehand strain imaging. Ultrason Imaging. 2002;24(3):161–176. doi: 10.1177/016173460202400303.
  • 28. D’hooge J, Bijnens B, Thoen J, Van de Werf F, Sutherland GR, Suetens P. Echocardiographic strain and strain-rate imaging: A new tool to study regional myocardial function. IEEE Trans Med Imaging. 2002;21(9):1022–1030. doi: 10.1109/TMI.2002.804440.
  • 29. de Korte CL, Pasterkamp G, van der Steen AFW, Woutman HA, Bom N. Characterization of plaque components with intravascular ultrasound elastography in human femoral and coronary arteries in vitro. Circulation. 2000;102(6):617–623. doi: 10.1161/01.cir.102.6.617.
  • 30. Konofagou EE, D’hooge J, Ophir J. Myocardial elastography—A feasibility study in vivo. Ultrasound Med Biol. 2002;28(4):475–482. doi: 10.1016/s0301-5629(02)00488-x.
  • 31. Lewis JP. Fast template matching. Vision Interface, Canadian Image Processing and Pattern Recognition Society; Quebec City, Canada. May 15–19, 1995; pp. 120–123.
  • 32. Chang MC, Fuh CS, Chen HY. Fast search algorithms for industrial inspection. Int J Pattern Recognit Artif Intell. 2001;15(4):675–690.
  • 33. Briechle K, Hanebeck UW. Template matching using fast normalized cross correlation. Proc SPIE. 2001;4387:95–102.
  • 34. Hii AJH, Hann CE, Chase JG, Van Houten EEW. Fast normalized cross correlation for motion tracking using basis functions. Comput Methods Programs Biomed. 2006;82(2):144–156. doi: 10.1016/j.cmpb.2006.02.007.
  • 35. Tsai DM, Lin CT. Fast normalized cross correlation for defect detection. Pattern Recognit Lett. 2003;24(15):2625–2631.
  • 36. Konofagou E, Ophir J. A new elastographic method for estimation and imaging of lateral displacements, lateral strains, corrected axial strains and Poisson’s ratios in tissues. Ultrasound Med Biol. 1998;24(8):1183–1199. doi: 10.1016/s0301-5629(98)00109-4.
  • 37. Luo J, Lee W-N, Wan S, Konofagou EE. Pulse wave imaging of human abdominal aortas in vivo. Proc. IEEE Ultrasonics Symp; 2008; pp. 859–862.
  • 38. Vappou J, Luo J, Konofagou EE. Pulse wave imaging for noninvasive and quantitative measurement of arterial stiffness in vivo. Am J Hypertens. 2010;23(4):393–398. doi: 10.1038/ajh.2009.272.
  • 39. Wang S, Lee WN, Provost J, Luo J, Konofagou EE. A composite high-frame-rate system for clinical cardiovascular imaging. IEEE Trans Ultrason Ferroelectr Freq Control. 2008;55(10):2221–2233. doi: 10.1109/TUFFC.921.
  • 40. Luo J, Fujikura K, Tyrie LS, Tilson MD, Konofagou EE. Pulse wave imaging of normal and aneurysmal abdominal aortas in vivo. IEEE Trans Med Imaging. 2009;28(4):477–486. doi: 10.1109/TMI.2008.928179.
  • 41. Céspedes I, Huang Y, Ophir J, Spratt S. Methods for estimation of subsample time delays of digitized echo signals. Ultrason Imaging. 1995;17(2):142–171. doi: 10.1177/016173469501700204.
  • 42. Srinivasan S, Righetti R, Ophir J. Trade-offs between the axial resolution and the signal-to-noise ratio in elastography. Ultrasound Med Biol. 2003;29(6):847–866. doi: 10.1016/s0301-5629(03)00037-1.
  • 43. Chen X, Xie H, Erkamp R, Kim K, Jia C, Rubin JM, O’Donnell M. 3-D correlation-based speckle tracking. Ultrason Imaging. 2005;27(1):21–36. doi: 10.1177/016173460502700102.
  • 44. McDonnell MJ. Box-filtering techniques. Comput Graph Image Process. 1981;17(1):65–70.
  • 45. Sun C. A fast stereo matching method. Digital Image Computing: Techniques and Applications; 1997; pp. 95–100.
  • 46. Sun CM. Fast stereo matching using rectangular subregioning and 3D maximum-surface techniques. Int J Comput Vis. 2002;47(1–3):99–117.
  • 47. Sadykhov RK, Lamovsky DV. Fast cross correlation algorithm for optical flow estimation. Proc. IEEE Signal Processing Symp; 2006; pp. 322–325.
  • 48. Di Stefano L, Marchionni M, Mattoccia S. A fast area-based stereo matching algorithm. Image Vis Comput. 2004;22(12):983–1005.
  • 49. Kanade T, Yoshida A, Oda K, Kano H, Tanaka M. A stereo machine for video-rate dense depth mapping and its new applications. Proc. IEEE Comp. Vision and Pattern Recognition Conf; 1996; pp. 192–202.
  • 50. Muhlmann K, Maier D, Hesser J, Manner R. Calculating dense disparity maps from color stereo images, an efficient implementation. Int J Comput Vis. 2002;47(1–3):79–88.
  • 51. Wu QX, McNeill SJ, Pairman D. Correlation and relaxation labelling: An experimental investigation on fast algorithms. Int J Remote Sens. 1997;18(3):651–662.
  • 52. Li Y, Garson CD, Xu Y, Beyers RJ, Epstein FH, French BA, Hossack JA. Quantification and MRI validation of regional contractile dysfunction in mice post myocardial infarction using high resolution ultrasound. Ultrasound Med Biol. 2007;33:894–904. doi: 10.1016/j.ultrasmedbio.2006.12.008.
  • 53. Tran TH, Cho HM, Cho SB. Performance enhancement of motion estimation using SSE2 technology. Proc World Acad Sci Eng Technol. 2008;30:168–171.
  • 54. Shahbahrami A, Juurlink B, Vassiliadis S. Limitations of special-purpose instructions for similarity measurements in media SIMD extensions. Proc. Int. Conf. Compilers, Architecture and Synthesis for Embedded Systems; 2006; pp. 293–303.
