Author manuscript; available in PMC: 2012 May 10.
Published in final edited form as: J Comput Phys. 2011 May 10;230(10):3656–3667. doi: 10.1016/j.jcp.2011.02.016

Reduced-Rank Approximations to the Far-Field Transform in the Gridded Fast Multipole Method

Andrew J Hesford a,*, Robert C Waag a,b
PMCID: PMC3086302  NIHMSID: NIHMS284101  PMID: 21552350

Abstract

The fast multipole method (FMM) has been shown to have a reduced computational dependence on the size of finest-level groups of elements when the elements are positioned on a regular grid and FFT convolution is used to represent neighboring interactions. However, transformations between plane-wave expansions used for FMM interactions and pressure distributions used for neighboring interactions remain significant contributors to the cost of FMM computations when finest-level groups are large. The transformation operators, which are forward and inverse Fourier transforms with the wave space confined to the unit sphere, are smooth and well approximated using reduced-rank decompositions that further reduce the computational dependence of the FMM on finest-level group size. The adaptive cross approximation (ACA) is selected to represent the forward and adjoint far-field transformation operators required by the FMM. However, the actual error of the ACA is found to be greater than that predicted using traditional estimates, and the ACA generally performs worse than the approximation resulting from a truncated singular-value decomposition (SVD). To overcome these issues while avoiding the cost of a full-scale SVD, the ACA is employed with more stringent accuracy demands and recompressed using a reduced, truncated SVD. The results show a greatly reduced approximation error that performs comparably to the full-scale truncated SVD without degrading the asymptotic computational efficiency associated with ACA matrix assembly.

Keywords: Fast solvers, iterative methods, acoustic scattering, fast multipole method, adaptive cross approximation, moment methods

1. Introduction

Fast scattering solutions have a broad range of applications. Estimation and correction of aberration requires simulation of propagation of ultrasound pulses through aberrating media [1–3]. Iterative inverse scattering algorithms such as the distorted Born iterative method [4–7] and the eigenfunction method [8–10] require repeated computation of the forward scattering problem for successive guesses of a reconstructed medium.

The fast multipole method (FMM) [11–14] satisfies these needs by providing efficient solutions of integral equations that describe electromagnetic or acoustic scattering in two or three dimensions. The FMM recursively subdivides scattering media into groups of elements and computes the product of a test vector and the Green’s function matrix in a time that is O(N) when the medium is composed of N volume-filling elements. Redundant information in the FMM is discarded by representing interactions using band-limited wave expansions.

Representation of scattering media as sets of regular, gridded arrangements of elements provides a number of advantages. In inverse scattering applications, the scattering structure is not known a priori, rendering more sophisticated geometric meshes useless. Furthermore, scattering models developed with other techniques, such as magnetic resonance or B-scan imaging, are often inherently defined on a regular grid of voxels. Regular grids also allow the FMM to take advantage of FFT convolution to represent near-field interactions [15]. This technique, which exploits concepts originally developed for variants of the conjugate gradient-fast Fourier transform (CG-FFT) method [16–19], reduces the dependence of the FMM on the size of groups of scattering elements at the finest level. This is an advantageous hybrid because the FMM offers better asymptotic scaling than CG-FFT methods. Furthermore, unlike CG-FFT methods, the FMM can handle scattering by sparse arrangements of scatterers without the need to mesh the entire volume enclosing the scatterers.

Even with FFT convolution of near-field interactions, the FMM still requires transformations between the plane-wave expansions used in far-field interactions and the pressure distributions used for near-field interactions. When finest-level groups have edge lengths that approach a wavelength and contain hundreds or thousands of scattering elements, these transformations become the costliest portion of the FMM. Reducing the complexity of applying these transformations further relaxes the restriction on the size of finest-level FMM groups and provides additional opportunities to optimize the forward solution.

The adaptive cross approximation (ACA) [20–22] was developed to efficiently compute approximate, reduced-rank decompositions of matrices representing integral equations with smooth kernels. The method is purely algebraic and does not require knowledge of the full matrix to construct the approximation. The ACA has been applied to scattering problems by applying a recursive grouping of the scattering region in the same fashion as the FMM and using ACA matrices to represent interactions between well-separated groups at the coarsest possible subdivision. Near-field interactions are represented, as in the fast multipole method, using dense matrices. However, because the ACA is purely algebraic and fails to directly exploit the integral scattering equation and its Green’s function, approximate interaction matrices for mid-frequency scattering often have excessively high rank. The FMM, in contrast, employs an explicit decomposition of the Green’s function to better control the bandwidth of far-field expansions, and hence the rank of the scattering operator, to reduce the amount of computation required for a certain accuracy requirement. In addition, the cost of building a hierarchical ACA scattering matrix scales with the square of the number of elements in the worst case [21]. Nevertheless, when used in concert with the FMM, the ACA can be an important part of efficient forward solution methods.

Previous studies have illustrated the efficiency of the FMM when combined with regular arrangements of scattering elements and FFT convolution of near-field interactions [15, 23]. That work is extended in this paper by employing the adaptive cross approximation to represent the far-field transformations that convert between far-field plane-wave expansions and local pressure distributions in the finest-level groups of the FMM. Because these operations are essentially Fourier transforms restricted to the unit sphere in wave space, the transformations are smooth and well approximated by the ACA. The effective rank of the approximate far-field transform for an FMM group scales with the bandwidth of the plane-wave expansions associated with the group, which in turn scales approximately with the radius of a sphere enclosing the group. In contrast, the number of samples of the local pressure distribution scales with the volume of the group for a fixed sampling density. As a result, as the number of samples of local pressure distributions within finest-level groups is increased, low-rank approximations to far-field transforms become increasingly compressed. However, trials of the ACA show that it underestimates approximation error when using traditional convergence tests. To overcome this limitation, the far-field transformation matrices are approximated to lower error tolerances with ACA and then “recompressed” [24] using a reduced singular-value decomposition that provides error and rank characteristics comparable to the optimum reduced-rank approximation provided by truncated singular-value decomposition.

2. Theory

The scattering formulation employed herein is identical to that developed in Ref. [15], but is summarized below for convenient reference. Of interest is the solution of three-dimensional scattering of acoustic waves by a bounded, penetrable, inhomogeneous scattering region with arbitrarily variable sound speed c(r), attenuation α(r), and density ρ(r), where r is a three-dimensional coordinate vector. The region is embedded in an infinite, homogeneous background medium and is characterized by a complex wave number [10, 25, 26]

$$k(r) = \frac{\omega}{c(r)} + i\alpha(r), \tag{1}$$

where ω is the radian frequency of the time-harmonic pressure fields. A time dependence of $e^{-i\omega t}$ is assumed and suppressed to yield the acoustic wave equation [27]

$$\rho(r)\,\nabla\cdot\left[\rho^{-1}(r)\,\nabla p(r)\right] + k^2(r)\,p(r) = S(r), \tag{2}$$

in which p(r) is the total acoustic pressure and S is an arbitrary acoustic source distribution.
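As a small numerical illustration of Eq. (1), the sketch below evaluates the complex wave number for two media. The frequency is a hypothetical value (the text does not state one), the attenuation is assumed to be an amplitude attenuation that grows linearly with frequency, and the function name `wave_number` is introduced here only for illustration. The material values are those quoted for water and fat in Table 1.

```python
import math

NP_PER_DB = math.log(10.0) / 20.0  # 1 dB = ln(10)/20 Np for amplitude attenuation


def wave_number(freq_hz, sound_speed, absorption_db_cm_mhz):
    """Complex wave number k = omega/c + i*alpha of Eq. (1), in rad/m."""
    omega = 2.0 * math.pi * freq_hz
    # Assumed linear frequency dependence: dB/(cm MHz) -> dB/m at freq_hz.
    alpha_db_per_m = absorption_db_cm_mhz * (freq_hz / 1.0e6) * 100.0
    return complex(omega / sound_speed, alpha_db_per_m * NP_PER_DB)


FREQ_HZ = 2.5e6  # hypothetical value; the text does not state a frequency
k_background = wave_number(FREQ_HZ, 1509.0, 0.0)   # lossless water
k_fat = wave_number(FREQ_HZ, 1478.0, 0.52)         # fat-mimicking material
print(k_background, k_fat)
```

With the assumed $e^{-i\omega t}$ time dependence, a positive imaginary part of k corresponds to a wave that decays as it propagates.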

To simplify Eq. (2), the change of variable $p(r) = \rho_n^{1/2}(r) f(r)$ is made, where ρn = ρ/ρ0 is the ratio of the variable density to the homogeneous background density ρ0. Under this change of variable, Eq. (2) becomes the Helmholtz equation for the pseudo-pressure f(r) [28, 29]:

$$\nabla^2 f(r) + k_0^2 f(r) = \rho_n^{-1/2}(r)\,S(r) - O(r)\,f(r). \tag{3}$$

The object contrast or scattering potential is

$$O(r) = k^2(r) - k_0^2 - \rho_n^{1/2}(r)\,\nabla^2 \rho_n^{-1/2}(r). \tag{4}$$

The wave number k0 corresponds to the homogeneous background with sound speed c0 and attenuation α0.
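The step from Eq. (2) to the pseudo-pressure equation can be checked directly. The following is a sketch of the algebra, writing $a = \rho_n^{1/2}$ (a shorthand introduced here) so that $p = af$ and, because ρ0 is constant, $\rho^{-1}\nabla\rho = 2a^{-1}\nabla a$:

```latex
\begin{align*}
\rho\,\nabla\cdot\left[\rho^{-1}\nabla p\right]
  &= \nabla^2 p - 2a^{-1}\,\nabla a\cdot\nabla p \\
  &= a\,\nabla^2 f + f\left[\nabla^2 a - 2a^{-1}\,\nabla a\cdot\nabla a\right] \\
  &= a\left[\nabla^2 f - \left(a\,\nabla^2 a^{-1}\right) f\right],
\end{align*}
% using  a \nabla^2 a^{-1} = 2a^{-2}\,\nabla a\cdot\nabla a - a^{-1}\nabla^2 a .
```

Substituting this result into Eq. (2) and dividing by a gives $\nabla^2 f + [k^2 - \rho_n^{1/2}\nabla^2\rho_n^{-1/2}] f = \rho_n^{-1/2} S$. Moving every term except $k_0^2 f$ to the right-hand side yields the Helmholtz equation with the scattering potential defined above.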

When ρ is continuous, the Helmholtz equation (3) is readily inverted by using the homogeneous Green’s function

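$$g_0(r, r') = \frac{e^{i k_0 |r - r'|}}{4\pi |r - r'|} \tag{5}$$

to yield an integral equation of scattering

$$f(r) - \int_D dr'\, g_0(r, r')\,O(r')\,f(r') = f^i(r), \tag{6}$$

in which the incident field $f^i(r)$ satisfies the contrast-free Helmholtz equation

$$\left[\nabla^2 + k_0^2\right] f^i(r) = \rho_n^{-1/2}(r)\,S(r). \tag{7}$$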

The integral expression in Eq. (6) is the scattered pseudo-pressure and is equal to the scattered pressure wherever ρn = 1.

It is important to note that, when ρn is discontinuous, solutions to Eq. (6) may correspond with acoustic pressures that are discontinuous or have a discontinuous derivative. This violates the physical boundary conditions governing wave propagation. These boundary conditions require that the pressure and its normal derivative remain continuous across media interfaces. However, approximations of the term $\rho_n^{1/2}\nabla^2\rho_n^{-1/2}$ appearing in Eq. (4) using finite differences on rectilinear grids with nonzero spacing inherently approximate discontinuous density profiles as $C^1$-continuous functions. Therefore, the underlying issue is whether discontinuous density variations can be suitably approximated by continuous density variations. For problems in which the approximation is unsuitable, it may be necessary to modify the scattering formulation to correctly model the behavior as done in, e.g., Ref. [30].

3. Methods

For numerical solution of Eq. (6), the scattering domain D is subdivided into a collection {cj: 0 ≤ j < N } of N disjoint, cubic cells. Each cell cj has a volume Δj and is associated with a constant contrast value Oj and a constant field value fj such that

$$O(r) = \sum_{j=0}^{N-1} O_j\,\chi_j(r), \tag{8a}$$
$$f(r) = \sum_{j=0}^{N-1} f_j\,\chi_j(r), \tag{8b}$$

where χj is an arbitrary basis function with support cj. As in Ref. [15], χj is here taken to be the characteristic function, or pulse basis function, of cj. Together with Galerkin testing, the expansions in Eq. (8) allow representation of Eq. (6) as a discrete matrix equation

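$$\bar{f} - G\tilde{f} = f^i, \tag{9}$$

where the column vectors $\bar{f} = [\Delta_j f_j]$ and $\tilde{f} = [O_j f_j]$; the column vector $f^i = [f_j^i]$ contains entries

$$f_j^i = \int_D dr\, \chi_j(r)\, f^i(r) \tag{10}$$

and the Green’s matrix G = [Gij] contains entries

$$G_{ij} = \int_D dr\, \chi_i(r) \int_D dr'\, g_0(r, r')\, \chi_j(r'). \tag{11}$$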

Iterative solution methods such as the generalized minimal residual method (GMRES) [31] or the stabilized biconjugate gradient method (BiCG-STAB) [32] may be employed in concert with the fast multipole method to solve Eq. (9) efficiently without the need to explicitly construct the Green’s matrix (11) [13]. Such solutions are known to proceed in O(N) time and with O(N) storage.

3.1. The Fast Multipole Method

The FMM and its implementation are discussed in detail in Ref. [13]. At the heart of the method, interactions between source elements and target elements are represented as the composition of five operations:

  1. Fields radiated by sources are converted by a far-field transform into outgoing wave expansions.

  2. Neighboring sources are recursively grouped and their expansions are aggregated into outgoing expansions of successively higher bandwidth.

  3. Outgoing expansions centered on a source group are translated into incoming wave expansions centered on a target group.

  4. Incoming expansions are recursively distributed to target elements within groups.

  5. Incoming expansions are converted by an adjoint far-field transform into fields observed at each target element.

In the Fourier domain, the outgoing and incoming wave expansions are plane-wave distributions. The aggregation, distribution and translation operators for such expansions are diagonal and efficiently applied. The increasing bandwidth of successively aggregated outgoing expansions is offset by a reduction in the number of field expansions that must be computed. Likewise, the increasing number of successively distributed incoming expansions is offset by the decreasing bandwidth of each expansion.

Translation of outgoing wave expansions to incoming expansions requires that the source and target group be sufficiently separated. Therefore, application of the FMM to Eq. (9) results in an expression of the form

$$\bar{f} - \left[G_F + G_N\right]\tilde{f} = f^i, \tag{12}$$

where GF represents far-field interactions that may be computed using the FMM and GN represents near-field interactions that are computed directly using Eq. (11) among neighboring finest-level groups. In typical FMM implementations, the finest-level groups are kept small to reduce the impact of high-cost near-field interactions. However, as described in Ref. [15], FFT convolution may be employed when scattering elements are defined on a regular grid to reduce significantly the cost of near-field interactions. In such cases, it is the cost of evaluating far-field transforms (and their adjoints) of each finest-level group that dominates the cost of the FMM for large group sizes.

Far-field transforms represent outgoing plane-wave expansions for an FMM group as a function of the local pressure variation defined using basis functions within the group; adjoint far-field transforms represent an induced local pressure variation on testing functions within an FMM group as a function of incoming plane-wave expansions from radiation sources far from the group. For the described scattering problem, an arbitrary basis or testing function χj is associated with corresponding outgoing or incoming plane-wave expansion given, respectively, by [13]

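$$F_j(c_J, \hat{s}) = \int_D dr\, e^{i k_0 \hat{s}\cdot(c_J - r)}\, \chi_j(r), \tag{13a}$$
$$R_j(c_J, \hat{s}) = \int_D dr\, e^{i k_0 \hat{s}\cdot(r - c_J)}\, \chi_j(r). \tag{13b}$$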

The outgoing expansion (13a) describes the far-field radiation pattern, in a direction ŝ relative to an arbitrary center cJ, of a unit pseudo-pressure applied to cell cj. In a similar fashion, the incoming expansion (13b) performs a “focusing” onto the cell cj of a plane wave incident from a direction ŝ relative to an arbitrary center cJ.

Consider a group J = {cj: 0 ≤ j < M} of M scattering elements with center cJ . Each cell cj has an associated contrast coefficient Oj and a pseudo-pressure coefficient fj. The outgoing plane-wave expansion βJ of the group is given by the far-field transform

$$\beta_J(\hat{s}) = k_0 \sum_{j=0}^{M-1} F_j(c_J, \hat{s})\, O_j f_j. \tag{14a}$$

The function βJ describes the complex amplitudes of plane waves traveling outward in directions ŝ from cJ . Conversely, a set of plane waves converging on the group J from directions ŝ relative to cJ and with a complex amplitude profile γJ induces a pressure on cell cj described by the adjoint far-field transform

$$f_j = \int_\Omega d\hat{s}\, R_j(c_J, \hat{s})\, \gamma_J(\hat{s}), \tag{14b}$$

in which Ω is the unit sphere. When the plane-wave coefficients βJ and γJ are sampled at discrete locations on the unit sphere, the integral in Eq. (14b) becomes a sum and the transforms (14) may be represented as matrix equations. The far-field transform is represented by a matrix, F, whose entries are $F_{ij} = k_0 F_j(c_J, \hat{s}_i)$. The adjoint far-field transform is represented by a matrix, R, whose entries are $R_{ij} = R_i(c_J, \hat{s}_j) = F_{ji}^*$. Storage and multiplication of these matrices must be made more efficient when large FMM groups are desired.
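To make the matrix form concrete, the following sketch assembles a sampled far-field transform for one cubic group. It is not the authors' implementation: each cell integral in Eq. (13a) is replaced by a one-point midpoint rule, directions are sampled at Gauss-Legendre nodes in the polar cosine and uniformly in azimuth, and the function name `far_field_matrix` and all parameters are hypothetical.

```python
import numpy as np


def far_field_matrix(k0, cells_per_edge, cell_edge, n_theta, n_phi):
    """Sampled far-field transform F for one cubic group centered at the origin.

    Assumptions of this sketch: a one-point midpoint rule replaces the
    integral over each cell, and directions lie on a product grid of
    Gauss-Legendre nodes in cos(theta) and uniform samples in phi.
    """
    # Cell centers r_j relative to the group center c_J = 0.
    offs = (np.arange(cells_per_edge) - (cells_per_edge - 1) / 2.0) * cell_edge
    x, y, z = np.meshgrid(offs, offs, offs, indexing="ij")
    r = np.column_stack([x.ravel(), y.ravel(), z.ravel()])       # (n, 3)

    # Unit direction vectors s_i on the sphere.
    cos_t, _ = np.polynomial.legendre.leggauss(n_theta)
    sin_t = np.sqrt(1.0 - cos_t**2)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi
    s = np.column_stack([
        np.outer(sin_t, np.cos(phi)).ravel(),
        np.outer(sin_t, np.sin(phi)).ravel(),
        np.repeat(cos_t, n_phi),
    ])                                                            # (m, 3)

    volume = cell_edge**3
    # F_ij = k0 * F_j(c_J, s_i), with F_j ~ volume * exp(i k0 s_i . (c_J - r_j)).
    return k0 * volume * np.exp(-1j * k0 * (s @ r.T))             # (m, n)


k0 = 2.0 * np.pi                 # lengths measured in background wavelengths
F = far_field_matrix(k0, cells_per_edge=5, cell_edge=0.1, n_theta=6, n_phi=12)
R = F.conj().T                   # adjoint transform, R_ij = conj(F_ji)
print(F.shape, R.shape)          # (72, 125) (125, 72)
```

Even in this small case the matrix has more columns (cells) than rows (directions), and the imbalance grows with group size, which is what makes a reduced-rank representation attractive.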

3.2. Far-Field Transforms and the Adaptive Cross Approximation

The excess bandwidth formula [13] prescribes an estimate of the harmonic bandwidth of the far-field pattern of fields radiated by groups of sources. If the group of sources is confined to a sphere with radius a in wavelengths, then the bandwidth L of the field radiated by the group is

$$L \approx ka + 1.8\, d_0^{2/3} (ka)^{1/3}, \tag{15}$$

in which d0 = − log ε for a desired tolerance ε. Given an approximate bandwidth L, representations of plane-wave expansions of outgoing and incoming fields should be sampled at O(L) polar locations and O(L) azimuthal locations. In a cubic, gridded arrangement of N cubic scatterers, $a = O(N^{1/3})$, so the total number of samples of plane-wave expansions is $O(N^{2/3})$. Thus, the far-field transform (14a) may be represented by a matrix $F \in \mathbb{C}^{m \times n}$, where $m = O(N^{2/3})$ and $n = O(N)$. Similarly, the adjoint far-field transform (14b) may be represented as a matrix $R = F^\dagger \in \mathbb{C}^{n \times m}$, where $(\cdot)^\dagger$ represents the conjugate transpose of a matrix. Because the bandwidth of plane-wave expansions is approximately limited according to Eq. (15), reduced-rank approximate decompositions of the form

$$F \approx U V^\dagger, \tag{16}$$

with $U \in \mathbb{C}^{m \times k}$ and $V \in \mathbb{C}^{n \times k}$, have a rank $k = O(L) = O(N^{1/3})$. If such decompositions exist, the cost of applying a far-field transform or its adjoint will be reduced from $O(N^{5/3})$ to $O(N^{4/3})$.
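The scaling argument above can be explored with a few lines of arithmetic. This sketch rests on several assumptions that are not taken from the text: the logarithm in d0 is base 10, a cubic group is enclosed by the sphere through its corners, the sphere is sampled at L polar by 2L azimuthal points, and the rank of the decomposition is taken equal to L. The helper names are hypothetical.

```python
import math


def excess_bandwidth(ka, eps):
    """Bandwidth estimate L of Eq. (15), assuming d0 = -log10(eps)."""
    d0 = -math.log10(eps)
    return ka + 1.8 * d0 ** (2.0 / 3.0) * ka ** (1.0 / 3.0)


def group_estimates(cells_per_edge, cell_edge_wavelengths, eps):
    """Rough operation counts for applying F densely or in low-rank form."""
    n = cells_per_edge**3                          # samples of local pressure
    radius = math.sqrt(3.0) * cells_per_edge * cell_edge_wavelengths / 2.0
    ka = 2.0 * math.pi * radius                    # radius is in wavelengths
    L = math.ceil(excess_bandwidth(ka, eps))
    m = 2 * L * L                                  # assumed L x 2L direction grid
    dense = m * n                                  # cost of a dense product
    low_rank = L * (m + n)                         # cost with assumed rank k = L
    return n, L, m, dense, low_rank


for cells in (5, 10, 20):
    print(cells, group_estimates(cells, 0.1, 1e-4))
```

For a group of 10 × 10 × 10 cells with edge length 0.1λ and ε = 10⁻⁴, these assumptions give L = 14, m = 392 and n = 1000, so the low-rank product needs roughly one twentieth of the operations of the dense product.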

The ACA [20–22] provides a method for computing an approximation of the form in Eq. (16). The ACA is an algebraic method that does not require full knowledge of the matrix being approximated and does not require replacement of the kernel of the underlying integral operator, as is done with the FMM. Conceptually, the ACA works by alternately constructing columns to populate the column matrix U and the row matrix V. Columns for U and V are selected as the most significant of the remaining columns and rows, respectively, of the matrix F to be approximated; for this purpose, “most significant” means the column or row of F that contributes the largest element of the column of U or V that is currently being analyzed, neglecting elements contributed by previously analyzed columns or rows.

Algorithm 1 describes the ACA procedure for computing the columns $u_k$ of the column matrix U and the columns $v_k$ of the row matrix V. In the listing, $F_{i_k,:}$ represents the $i_k$-th row of F, while $F_{:,j_k}$ represents the $j_k$-th column of F. The ACA requires computation time that is $O(k^2[m + n])$ if the maximum rank of the decomposition is k. As implemented, the algorithm approximates the matrix F as the product $F \approx UV^T$. However, to maintain agreement with Eq. (16), the conjugate of the matrix V should be considered, such that $F \approx UV^\dagger$. The convergence criterion in Line 9 of Algorithm 1 is based on an estimate of the approximation error [21]. The choice of ε should reflect the fact that the actual approximation error, which is not revealed by the ACA algorithm, may be larger than that suggested by the convergence criterion.

Algorithm 1.

The Adaptive Cross Approximation

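1: $i_1 = 1$
2: $\|F^{(0)}\| = 0$
3: for $k = 1$ to $\min\{m, n\}$ do
4:   $v_k = F_{i_k,:} - \sum_{l=1}^{k-1} (u_l)_{i_k} v_l$
5:   $j_k = \arg\max_{j \notin \{j_1, \ldots, j_{k-1}\}} |(v_k)_j|$
6:   $v_k = v_k / (v_k)_{j_k}$
7:   $u_k = F_{:,j_k} - \sum_{l=1}^{k-1} (v_l)_{j_k} u_l$
8:   $\|F^{(k)}\|^2 = \|F^{(k-1)}\|^2 + \|u_k\|^2 \|v_k\|^2 + 2\sum_{l=1}^{k-1} u_l^T u_k \cdot v_l^T v_k$
9:   if $\|u_k\| \cdot \|v_k\| \le \varepsilon \|F^{(k)}\|$ then
10:     break
11:   end if
12:   $i_{k+1} = \arg\max_{i \notin \{i_1, \ldots, i_k\}} |(u_k)_i|$
13: end for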
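The listing can be turned into a short runnable sketch. The version below is an illustration rather than the authors' code: it uses zero-based indices, accesses F only through hypothetical `get_row` and `get_col` callbacks, stops early if a residual row vanishes, and writes the norm update of Line 8 with conjugated inner products so that the running sum is the exact squared Frobenius norm for complex data.

```python
import numpy as np


def aca(get_row, get_col, m, n, eps):
    """Partially pivoted ACA in the spirit of Algorithm 1.

    get_row(i) and get_col(j) return one row or column of F, so the full
    matrix is never formed here. Returns U (m x k) and V (n x k) such
    that F ~ U @ V.T (plain transpose, as in the listing).
    """
    us, vs = [], []
    row_free = np.ones(m, dtype=bool)      # rows not yet used as pivots
    col_free = np.ones(n, dtype=bool)      # columns not yet used as pivots
    norm2 = 0.0                            # running value of ||F^(k)||^2
    i = 0                                  # first pivot row (Line 1)
    for _ in range(min(m, n)):
        row_free[i] = False
        v = np.array(get_row(i), dtype=complex)
        for u_l, v_l in zip(us, vs):
            v -= u_l[i] * v_l                              # Line 4
        j = int(np.argmax(np.where(col_free, np.abs(v), -1.0)))  # Line 5
        if v[j] == 0:
            break                          # residual row vanished
        col_free[j] = False
        v = v / v[j]                                       # Line 6
        u = np.array(get_col(j), dtype=complex)
        for u_l, v_l in zip(us, vs):
            u -= v_l[j] * u_l                              # Line 7
        # Line 8, using conjugated inner products for complex data.
        cross = sum((np.vdot(u_l, u) * np.vdot(v_l, v)).real
                    for u_l, v_l in zip(us, vs))
        norm2 += np.vdot(u, u).real * np.vdot(v, v).real + 2.0 * cross
        us.append(u)
        vs.append(v)
        est = np.linalg.norm(u) * np.linalg.norm(v)
        if est <= eps * np.sqrt(max(norm2, 0.0)):          # Line 9
            break
        i = int(np.argmax(np.where(row_free, np.abs(u), -1.0)))  # Line 12
    k = len(us)
    U = np.array(us, dtype=complex).reshape(k, m).T
    V = np.array(vs, dtype=complex).reshape(k, n).T
    return U, V


# Smooth, Fourier-like test kernel standing in for a far-field transform.
x = np.linspace(0.0, 3.0, 60)
y = np.linspace(0.0, 3.0, 80)
F = np.exp(1j * np.outer(x, y))
U, V = aca(lambda i: F[i, :], lambda j: F[:, j], 60, 80, eps=1e-6)
rel_err = np.linalg.norm(F - U @ V.T) / np.linalg.norm(F)
print(U.shape[1], rel_err)
```

The printed relative error is the true Frobenius error, which the algorithm itself never sees. Comparing it with ε is a simple way to observe the gap between the convergence estimate and the actual error.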

3.3. Recompression of Adaptive Cross Matrices

The SVD of an arbitrary matrix F ∈ ℂm×n is given by

$$F = U \Sigma V^\dagger, \tag{17}$$

where $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ are unitary matrices and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix. The diagonal elements of Σ are called the singular values of the matrix F. By convention and to ensure uniqueness, the diagonal entries of Σ are generally assumed to be non-negative and arranged in order of decreasing magnitude. The truncated SVD is an approximation of rank k to the matrix F given by

$$F \approx F_k = \tilde{U} \tilde{\Sigma} \tilde{V}^\dagger, \tag{18}$$

in which $\tilde{U} \in \mathbb{C}^{m \times k}$ and $\tilde{V} \in \mathbb{C}^{n \times k}$ are matrices with orthonormal columns consisting, respectively, of the first k columns of U and V, and $\tilde{\Sigma} \in \mathbb{R}^{k \times k}$ is a diagonal matrix whose entries are the largest k singular values of F. Of all matrices of rank k, the truncated SVD satisfies

$$F_k = \arg\min_M \left\{ \|F - M\|_F : \operatorname{rank} M = k \right\}, \tag{19}$$

where $\|\cdot\|_F$ is the Frobenius norm. Thus, adaptive cross approximations to far-field transform matrices will generally have greater error, in the sense of the Frobenius norm, than truncated singular-value decompositions of the transforms. Despite the lower error of truncated SVD approximations, the cubic scaling of the cost of computing an SVD is much worse than the $O(k^2[m + n])$ cost associated with computing a rank-k adaptive cross approximation to an m × n matrix when k ≪ min{m, n}.
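A truncated SVD and its Frobenius error can be checked numerically in a few lines. In this sketch the cutoff is taken relative to the largest singular value, which is an assumption, and the test matrix is a smooth Fourier-like kernel standing in for a far-field transform; `truncated_svd` is a name introduced here.

```python
import numpy as np


def truncated_svd(F, eps):
    """Keep singular values above eps * s_max (relative cutoff: an assumption).

    Returns the retained factors and the discarded singular values.
    """
    U, s, Vh = np.linalg.svd(F, full_matrices=False)
    k = int(np.count_nonzero(s > eps * s[0]))
    return U[:, :k], s[:k], Vh[:k, :].conj().T, s[k:]


x = np.linspace(0.0, 3.0, 60)
y = np.linspace(0.0, 3.0, 80)
F = np.exp(1j * np.outer(x, y))            # smooth, Fourier-like test kernel
Uk, sk, Vk, tail = truncated_svd(F, eps=1e-3)
Fk = (Uk * sk) @ Vk.conj().T               # truncated product of Eq. (18)
err = np.linalg.norm(F - Fk)               # Frobenius norm by default
print(len(sk), err, np.sqrt(np.sum(tail**2)))
```

The last two printed values agree: the Frobenius error of the truncated SVD equals the root sum of squares of the discarded singular values, which is the minimum attainable by any approximation of that rank.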

Recompression of ACA factorizations was first developed to eliminate redundant information by orthogonalizing the non-orthogonal row and column matrices resulting from the ACA [24]. If $F_k = UV^\dagger$ is an adaptive cross approximation of rank k to a matrix $F \in \mathbb{C}^{m \times n}$, the row and column matrices can each be orthogonalized using QR decompositions:

$$U = Q_u R_u, \tag{20a}$$
$$V = Q_v R_v. \tag{20b}$$

If the SVD of the k × k matrix product $R_u R_v^\dagger$ is given by

$$R_u R_v^\dagger = \hat{U} \hat{\Sigma} \hat{V}^\dagger, \tag{21}$$

then the matrix $F_k$ is given by the product

$$F_k = Q_u \hat{U} \hat{\Sigma} \left[Q_v \hat{V}\right]^\dagger, \tag{22}$$

which is an SVD of $F_k$ because the matrices $Q_u \hat{U}$ and $Q_v \hat{V}$ have orthonormal columns. This SVD is less costly to compute than a regular SVD of the matrix $F_k$ because the k × k matrix $R_u R_v^\dagger$ has fewer elements than the m × n matrix $F_k$ when k < min{m, n}.

The redundancy resulting from the non-orthogonality of the row and column matrices computed using the ACA means that an approximation of fixed rank tends to have an error that is higher than estimated. Conversely, for a fixed approximation error, the ACA tends to overestimate the required rank of the decomposition. If an ACA factorization $F_k$ of rank k is known to approximate a matrix F such that

$$\|F - F_k\|_F < \varepsilon_A \tag{23}$$

for some error tolerance εA, the reduced SVD in Eq. (21) may be truncated to k′ < k such that

$$\left\| R_u R_v^\dagger - \hat{U}_{k'} \hat{\Sigma}_{k'} \hat{V}_{k'}^\dagger \right\|_F < \varepsilon_S, \tag{24}$$

where $\hat{U}_{k'}$, $\hat{\Sigma}_{k'}$ and $\hat{V}_{k'}$ represent, respectively, the truncation of $\hat{U}$, $\hat{\Sigma}$ and $\hat{V}$ to rank k′ as described in Eq. (18). By the triangle inequality and the invariance of the Frobenius norm under transformation by unitary matrices, the truncated approximation satisfies

$$\left\| F - Q_u \hat{U}_{k'} \hat{\Sigma}_{k'} \left[Q_v \hat{V}_{k'}\right]^\dagger \right\|_F < \varepsilon_A + \varepsilon_S. \tag{25}$$

Therefore, the rank of an adaptive cross approximation to a far-field transform can be reduced while ensuring a desired error tolerance ε by computing an initial approximation to a tolerance εA < ε and then truncating a reduced SVD of the approximation to a tolerance εS suitably less than ε (e.g., εS < ε − εA). This technique avoids the higher cost of computing a full SVD of the far-field transform while providing a rank and approximation error comparable to that of the truncated SVD.
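The recompression steps can be sketched as follows. The function `recompress` is a hypothetical name, and the demonstration does not run the ACA: as a stand-in, it builds a rank-20 factorization of a smooth test kernel with deliberately non-orthogonal factors, then verifies the combined error bound numerically.

```python
import numpy as np


def recompress(U, V, eps_s):
    """Recompress F_k = U @ V^H by QR factorizations and a small SVD.

    Returns A, s, B with F_k ~ (A * s) @ B^H, truncated to the smallest
    rank whose discarded singular values have Frobenius norm below eps_s.
    """
    Qu, Ru = np.linalg.qr(U)                        # U = Qu Ru
    Qv, Rv = np.linalg.qr(V)                        # V = Qv Rv
    Uh, s, Vhh = np.linalg.svd(Ru @ Rv.conj().T)    # SVD of a k x k matrix
    # tail[i] is the Frobenius norm of the discarded values s[i:].
    tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
    kp = max(int(np.count_nonzero(tail >= eps_s)), 1)
    return Qu @ Uh[:, :kp], s[:kp], Qv @ Vhh[:kp, :].conj().T


rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 60)
y = np.linspace(0.0, 3.0, 80)
F = np.exp(1j * np.outer(x, y))                     # smooth test kernel

# Stand-in for an ACA result: rank-20 factors mixed by a random matrix M
# so that neither U nor V has orthogonal columns.
k = 20
Us, ss, Vsh = np.linalg.svd(F, full_matrices=False)
M = rng.standard_normal((k, k))
U = (Us[:, :k] * ss[:k]) @ M
V = Vsh[:k, :].conj().T @ np.linalg.inv(M).conj().T

eps_a = np.linalg.norm(F - U @ V.conj().T)          # error of the stand-in
eps_s = 1e-4
A, s, B = recompress(U, V, eps_s)
err = np.linalg.norm(F - (A * s) @ B.conj().T)
print(len(s), err, eps_a + eps_s)
```

Only the k × k product is passed to the SVD, so the cost of the recompression is dominated by the two QR factorizations of the tall factors.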

4. Numerical Results

Numerical studies were conducted on a scattering geometry designed to emulate a tissue-mimicking laboratory phantom. The geometry consisted of twelve scattering spheres embedded in a contrasting reference sphere of radius 24 mm with a sound speed of 1570 m/s, an absorption coefficient of 0.3 dB/(cm MHz), and a density of 970 kg/m³. This reference sphere was immersed in an infinite, lossless background (water) with a sound speed of 1509 m/s and a density of 997 kg/m³. The interior spheres were each assigned properties to mimic one of the three human tissue types listed in Table 1. The tissue assignments and positions of the interior spheres, relative to the center of the reference sphere, are listed in Table 2. A diagram depicting the sphere arrangement is shown in Fig. 1.

Table 1.

Material properties of spheres designed to mimic human tissue.

| Tissue | Sound speed (m/s) | Absorption (dB/(cm MHz)) | Density (kg/m³) |
|---|---|---|---|
| Water | 1509.0 | 0.00 | 997.0 |
| Reference | 1570.0 | 0.30 | 970.0 |
| Fat | 1478.0 | 0.52 | 950.0 |
| Muscle | 1547.0 | 0.91 | 1050.0 |
| Skin | 1613.0 | 1.61 | 1120.0 |

Table 2.

Characteristics of the spheres in the tissue-mimicking phantom.

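| Radius (mm) | Center x (mm) | Center y (mm) | Center z (mm) | Tissue |
|---|---|---|---|---|
| 4.0 | 0.0 | 0.0 | 0.0 | Fat |
| 5.0 | 14.0 | 2.0 | 4.0 | Skin |
| 5.0 | 5.0 | −10.0 | −4.0 | Fat |
| 3.0 | 17.0 | −7.0 | 0.0 | Fat |
| 7.8 | −10.0 | 10.0 | 7.2 | Muscle |
| 7.8 | 5.0 | 12.0 | −7.2 | Muscle |
| 5.0 | 14.0 | 12.0 | 3.0 | Muscle |
| 5.0 | −5.0 | −18.0 | −3.0 | Skin |
| 2.5 | 7.5 | −18.0 | −2.0 | Skin |
| 1.5 | −4.0 | 20.0 | 0.0 | Skin |
| 2.5 | −18.0 | 4.0 | 2.0 | Skin |
| 9.1 | −12.5 | −5.0 | −5.2 | Muscle |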

Figure 1.


An arrangement of twelve tissue-mimicking spheres used for numerical experiments. The enclosing reference sphere is not shown. The spheres are colored according to their composition as listed in Table 2. Red, green and blue represent skin, muscle, and fat, respectively.

For numerical modeling, the phantom is represented on a cubic domain whose center coincides with the center of the reference sphere and that has edge length 16λ, where λ is the wavelength in water. The domain is subdivided into elements with edge length 0.1λ, resulting in a 160 × 160 × 160 computational grid. The FMM hierarchy is arranged so that each finest-level group contains the same number of scattering elements placed in the same relative cubic arrangement. A single far-field transform and its adjoint can thus be reused for all groups. When the number of elements in a single group does not divide the total number of elements, the domain is extended so that all groups are completely filled.
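The grid dimensions and the padding needed for a given group size follow from simple arithmetic, sketched below. The function name `padded_grid` is hypothetical, and the padded size is stated per dimension.

```python
def padded_grid(domain_edge_wl, cell_edge_wl, group_cells):
    """Cells per dimension, groups per dimension, and padded cells per dimension."""
    cells = round(domain_edge_wl / cell_edge_wl)
    groups = -(-cells // group_cells)          # ceiling division on integers
    return cells, groups, groups * group_cells


print(padded_grid(16, 0.1, 20))   # (160, 8, 160): 20 divides 160, no padding
print(padded_grid(16, 0.1, 30))   # (160, 6, 180): grid padded to 180 cells
```

A group size that does not divide 160 therefore enlarges the computational domain, which adds unknowns that carry no contrast but still take part in each matrix-vector product.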

4.1. Error and Rank of the Approximate Far-Field Transform

Studies of the error and rank of approximations to the far-field transform depend only on two aspects of the scattering problem: the number of scattering elements in a finest-level group (which may be unique in each dimension) and the acoustic size of each element. Because the previously described phantom is modeled by cubic elements with an edge length of 0.1λ, the behavior of far-field transforms for groups of cells of this size is of particular interest. However, forward solutions are commonly modeled using unknowns with edge lengths between 0.1λ and 0.2λ. Therefore, studies of far-field transforms involving cells with edge lengths of 0.2λ are also presented as a worst-case scenario. As in the phantom model, the FMM groups are assumed to be cubic so that the total number of elements completely determines the arrangement of elements within the group.

Based on the excess bandwidth formula, the approximate rank of the far-field transform is expected to scale as $O(N^{1/3})$, where N is the total number of elements in a group. The ranks of approximations for various error tolerances produced by the ACA, the recompressed adaptive cross approximation (RACA), and truncated SVD are shown in Fig. 2 as a function of the number of elements in a finest-level group. The $O(N^{1/3})$ scaling of the approximate rank was observed for all desired tolerances when the edge length of the scattering elements was fixed at 0.1λ. With element edge lengths of 0.2λ, the rank appeared to scale worse than $O(N^{1/3})$, but better than the $O(N^{2/3})$ scaling describing the number of samples of plane-wave expansions. In this case, the second term of Eq. (15) was a significant contributor to the bandwidth for the group sizes under investigation. The bandwidth, and hence the approximate rank, is expected to grow like $O(N^{1/3})$ for group sizes larger than those investigated.

Figure 2.


Comparison of the predicted rank for approximations to the far-field transform using the ACA, the RACA, and truncated SVD for groups containing a variable number of cells of edge length 0.1λ or 0.2λ. The desired tolerance of the approximations is shown along the right axis. For the 0.1λ edge length, the RACA and truncated SVD ranks coincided exactly. RACA was not investigated for edge lengths of 0.2λ.

The tolerances reported in Fig. 2 were only estimates of the true error (in the sense of the Frobenius norm) of approximate far-field transforms. For ACA, the reported tolerance was the convergence parameter ε. For the truncated SVD, the rank was selected to eliminate all singular values less than ε. In the RACA, the ACA was performed with a convergence parameter 0.01ε, and the reduced SVD was truncated with a tolerance ε. The actual error of reduced-rank approximations to the far-field transform, with the rank determined as in Fig. 2, is shown in Fig. 3 as a function of the total number of unknowns in the finest-level group. Whether the edge length of the scattering elements was 0.1λ or 0.2λ, the truncated SVD resulted in an actual error very close to the desired error. For elements with edge lengths of 0.1λ, the RACA yielded an error very close to that for a truncated SVD; the ACA error was between 1 and 1.5 orders of magnitude higher. When the element edge length was increased to 0.2λ, the error associated with the ACA increased further. For these larger elements, the RACA is expected to perform similarly to truncated SVD provided that the initial ACA decomposition is performed with sufficiently low error. An increase in element size results in, ceteris paribus, an increased actual error for an adaptive-cross approximation. Therefore, the previously mentioned convergence parameter 0.01ε for the initial ACA might need to be further reduced to obtain a desired RACA tolerance of ε. The determination of optimum convergence parameters for larger element sizes will require further investigation.

Figure 3.


Actual error, in the sense of the Frobenius norm, between the actual far-field transform and the rank-limited approximations produced by the ACA, the RACA, and truncated SVD for cubic groups with a variable number of cells of edge length 0.1λ or 0.2λ.

4.2. Error in the Scattering Operator

Studies of error in the entire scattering operator contributed by approximate far-field transforms are problem dependent, but provide insight into the effect of the approximate transforms in a realistic setting. An ideal, but impractical, study would compare numerical solutions obtained with approximate scattering operators to known, analytical solutions. One important limitation of such an analysis is the lack of exact solutions for geometries more complicated than very basic structures. Another important limitation is that, even if the exact solution were known, the iterative solution process could compound the error inherent in the scattering operator in complicated ways. For the purposes of this study, the measure of error was derived from the normalized residual. For a system of equations Ax = b, the normalized residual associated with an approximate solution is

$$r_A = \frac{b - Ax}{\|b\|}. \tag{26}$$

Because the scattering operators under consideration are distinguished by the method used to compute far-field transforms and the number of elements in a finest-level FMM group, residuals will here be denoted $r_{T,M}$, where T is one of “full”, “ACA”, or “RACA” to denote the full-rank, ACA, or recompressed ACA representation of the far-field transforms, respectively. Truncated SVD representations of the far-field transforms were not investigated because truncated SVD and RACA performed similarly. The index M represents the number of elements per dimension per finest-level group, with the total number of elements in the group being $N = M^3$.

The first measure of the error contributed by reduced-rank approximations to full scattering operators is shown in Fig. 4 as the difference $\|r_{T,M} - r_{\mathrm{full},M}\|$ as a function of the size $N = M^3$ of the finest-level groups. The ACA was used to approximate far-field transforms with a convergence parameter $\varepsilon = 10^{-4}$ or $\varepsilon = 10^{-6}$. The RACA, with a tolerance of $10^{-4}$, was obtained from an initial ACA with a convergence parameter $\varepsilon = 10^{-6}$ and a reduced, truncated SVD such that all singular values less than $10^{-4}$ were suppressed. In general, the errors plotted in Fig. 4 follow the same trends as the errors plotted in Fig. 3. The approximate solution was the same in all cases, and was chosen so that $\|r_{\mathrm{full},20}\| < 10^{-6}$. Thus, the approximate solution can be assumed to be in the neighborhood of the actual solution.
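The residual-difference measure can be illustrated on a toy problem. In the sketch below, small random matrices stand in for the full and approximate scattering operators; they are purely hypothetical and unrelated to the phantom. Because the same trial solution is used for both operators, the difference of residuals isolates the effect of the operator perturbation.

```python
import numpy as np


def normalized_residual(A, x, b):
    """Normalized residual (b - A x) / ||b|| for a trial solution x."""
    return (b - A @ x) / np.linalg.norm(b)


rng = np.random.default_rng(1)
n = 50
A_full = np.eye(n) - 0.1 * rng.standard_normal((n, n))   # stand-in operator
A_approx = A_full + 1e-5 * rng.standard_normal((n, n))   # perturbed stand-in
b = rng.standard_normal(n)
x = np.linalg.solve(A_full, b)             # one trial solution for both cases

r_full = normalized_residual(A_full, x, b)
r_approx = normalized_residual(A_approx, x, b)
diff = np.linalg.norm(r_approx - r_full)
print(np.linalg.norm(r_full), diff)
```

For a shared trial solution, the difference equals the norm of the operator perturbation applied to x, divided by the norm of b, so it measures operator error independently of how well x solves either system.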

Figure 4.


Residual error of the scattering operator using ACA or RACA for far-field transforms for groups containing a variable number of elements, relative to the residual computed using full-rank far-field transforms.

An additional measure of error is shown in Fig. 5 as the norm $\|r_{T,M} - r_{\text{full},30}\|$ as a function of the size $N = M^3$ of finest-level groups. The residual $r_{\text{full},30}$ was chosen as a reference because the solution accuracy should be limited by the accuracy of integration in the far-field transform. By minimizing the number and influence of FMM interactions, $r_{\text{full},30}$ should be closest to the true residual for the examined range of group sizes. As the figure shows, both the RACA with a tolerance of $10^{-4}$ and the ACA with a convergence parameter $\varepsilon = 10^{-6}$ very closely follow the behavior of scattering operators with full-rank solutions. For the ACA with a convergence parameter $\varepsilon = 10^{-4}$, error in the approximate far-field transform limits the accuracy of the solution.
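Both residual-based measures are norms of differences between vectors of the form given in Eq. (26), evaluated with the same approximate solution and different operators. The sketch below illustrates this with small dense matrices standing in for the scattering operators; in practice the products would be computed by the FMM rather than by explicit matrices. The function names and the test matrices are hypothetical.

```python
import numpy as np

def normalized_residual(A, x, b):
    """Normalized residual r = (b - A x) / ||b|| for a fixed approximate x."""
    return (b - A @ x) / np.linalg.norm(b)

def residual_difference(A_approx, A_ref, x, b):
    """Norm of the difference between residuals from two operators.

    Since x and b are shared, this equals ||(A_ref - A_approx) x|| / ||b||.
    """
    return np.linalg.norm(normalized_residual(A_approx, x, b)
                          - normalized_residual(A_ref, x, b))

if __name__ == "__main__":
    # Stand-in operators: a reference matrix and a slightly perturbed copy.
    rng = np.random.default_rng(0)
    A_ref = np.eye(8) + 0.1 * rng.standard_normal((8, 8))
    A_approx = A_ref + 1e-4 * rng.standard_normal((8, 8))
    b = rng.standard_normal(8)
    x = np.linalg.solve(A_ref, b)   # plays the role of the approximate solution
    print(residual_difference(A_approx, A_ref, x, b))
```

Because the right-hand side cancels in the difference, the measure isolates the effect of the operator error acting on the chosen approximate solution.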

Figure 5.

Residual error of the scattering operator using ACA, RACA and full-rank far-field transforms for groups with a variable number of elements with edge length 0.1λ, relative to the residual of the scattering operator using full far-field transforms with finest-level groups containing 30 × 30 × 30 elements.

4.3. Computational Complexity

Perhaps the primary motivation for selecting reduced-rank approximations to far-field transforms in the FMM is the availability of acceptably accurate solutions in less time, rather than the availability of highly accurate solutions. Thus, the time required to compute a matrix-vector product (MVP) using the FMM with approximate far-field transforms is of great interest. The CPU time required to compute an MVP using both ACA and full-rank far-field matrices is shown in Fig. 6. Also shown are the CPU times required for evaluations of forward and adjoint far-field transforms only and the CPU time required for evaluations of near-field interactions only. The time required to compute near-field interactions is the same whether full-rank or approximate far-field transforms are used.

Figure 6.

Computing time for a single matrix-vector product (MVP) using FMM with a reduced-rank, approximate far-field transform using ACA with a tolerance of $10^{-4}$, compared to the computing time for a single MVP when a full, dense far-field transform is employed. Also shown is the time for evaluating only forward and adjoint far-field transforms using full matrices or the ACA. The time for computing near-field interactions (NN) is identical in both implementations.

As noted above, the scattering domain is extended when the 160 × 160 × 160 grid is not uniformly divided by the size of finest-level FMM groups. Therefore, the number of unknowns in the problem is not constant, and the times required to compute matrix-vector products are influenced by this variation. The times reported in Fig. 6 have been normalized through division by the actual number of scattering elements and multiplication by 4,096,000. This eliminates from consideration the overall linear scaling of the FMM.
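The normalization can be illustrated with a short calculation. The sketch below assumes that the domain is extended to the next whole multiple of the group size in each dimension; that rounding rule, and the function names, are assumptions made here for illustration and are not taken from the text.

```python
import math

REFERENCE_ELEMENTS = 160 ** 3  # 4,096,000 elements in the nominal grid

def padded_elements(m, grid=160):
    """Number of elements after extending the grid to a multiple of m.

    Assumes each dimension is rounded up to a whole number of groups.
    """
    groups_per_dim = math.ceil(grid / m)
    return (groups_per_dim * m) ** 3

def normalized_time(measured_seconds, m, grid=160):
    """Scale a measured MVP time to the nominal 160 x 160 x 160 problem."""
    return measured_seconds / padded_elements(m, grid) * REFERENCE_ELEMENTS

if __name__ == "__main__":
    for m in (5, 10, 20, 30):
        print(m, padded_elements(m), normalized_time(1.0, m))
```

Under this assumption, a group size of 30 elements per dimension extends the grid to 180 elements per dimension, so a measured time would be scaled by 4,096,000 / 5,832,000 before plotting.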

The cost of evaluating a single, full-rank far-field transform of $N$ elements is $O(N^{5/3})$, while the cost of evaluating a reduced-rank transform is $O(N^{4/3})$. However, when the overall size of the scattering problem is fixed, the total number of groups varies like $O(1/N)$. Therefore, the total cost for evaluating full-rank far-field transforms is expected to scale like $O(N^{2/3})$, while the total cost for evaluating reduced-rank transforms scales like $O(N^{1/3})$. This behavior is confirmed by the plots in Fig. 6. Furthermore, the reduced computational dependence of the FMM on finest-level group size shifted the optimum group size from 5 elements per group per dimension to 10 elements per group per dimension and reduced the MVP time by approximately 33%.
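The scaling argument can be written out explicitly. In the sketch below, $N_{\text{tot}}$ (the fixed total number of scattering elements) and $G$ (the number of finest-level groups) are symbols introduced here for illustration:

$$
\begin{aligned}
G &= \frac{N_{\text{tot}}}{N} = O(1/N), \\
G \cdot O(N^{5/3}) &= O(N_{\text{tot}}\, N^{2/3}) && \text{(full-rank transforms)}, \\
G \cdot O(N^{4/3}) &= O(N_{\text{tot}}\, N^{1/3}) && \text{(reduced-rank transforms)}.
\end{aligned}
$$

Up to constant factors, the ratio of the two totals therefore grows like $N^{1/3} = M$, the number of elements per dimension in a finest-level group.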

5. Conclusion

An implementation of the FMM was presented that employs reduced-rank approximations to far-field transforms to reduce the dependence of the FMM on the size of finest-level groups of scatterers. While the ACA was found to underestimate error for a given convergence parameter and generally perform worse than truncated SVD, the recompressed ACA maintains the performance advantage of the ACA while providing error behavior comparable to the truncated SVD. The resulting reduced computational dependence facilitates the selection of an optimum group size to improve overall computation time by eliminating excessive FMM calculations.

Acknowledgments

Professor Weng C. Chew, of the University of Illinois and the University of Hong Kong, is thanked for the use of his ScaleME parallel FMM library [13, 33]. Jason C. Tillett and Jeffrey P. Astheimer, both of the University of Rochester, are thanked for helpful discussions, suggestions, and comments about material in this paper. This research was funded in part by NIH Grants EB 009692 and EB 010069 and the University of Rochester Diagnostic Ultrasound Research Laboratory Industrial Associates.

References

  • 1. Tabei M, Mast TD, Waag RC. Simulation of ultrasonic focus aberration and correction through human tissue. J Acoust Soc Am. 2003;113:1166–1176. doi: 10.1121/1.1531986.
  • 2. Lacefield JC, Pilkington WC, Waag RC. Distributed aberrators for emulation of pulse distortion by abdominal wall. Acoust Res Lett Online. 2002;3:47–52.
  • 3. Mast TD, Hinkelman LM, Metlay LA, Orr MJ, Waag RC. Simulation of ultrasonic pulse propagation, distortion, and attenuation in the human chest wall. J Acoust Soc Am. 1999;106:3665–3677. doi: 10.1121/1.428209.
  • 4. Devaney AJ, Oristaglio ML. Inversion procedure for inverse scattering within the distorted-wave Born approximation. Phys Rev Lett. 1983;51:237–240.
  • 5. Chew WC, Wang YM. Reconstruction of two-dimensional permittivity distribution using the distorted Born iterative method. IEEE Trans Med Imag. 1990;9:218–225. doi: 10.1109/42.56334.
  • 6. Lavarello R, Oelze M. Density imaging using inverse scattering. J Acoust Soc Am. 2009;125:793–802. doi: 10.1121/1.3050249.
  • 7. Hesford AJ, Chew WC. Fast inverse scattering solutions using the distorted Born iterative method and the multilevel fast multipole algorithm. J Acoust Soc Am. 2010;128(2):679–690. doi: 10.1121/1.3458856.
  • 8. Mast TD, Nachman AI, Waag RC. Focussing and imaging using eigenfunctions of the scattering operator. J Acoust Soc Am. 1997;102:715–726. doi: 10.1121/1.419898.
  • 9. Lin F, Nachman AI, Waag RC. Quantitative imaging using a time-domain eigenfunction method. J Acoust Soc Am. 2000;108:899–912. doi: 10.1121/1.1285919.
  • 10. Waag RC, Lin F, Varslot TK, Astheimer JP. An eigenfunction method for reconstruction of large-scale and high-contrast objects. IEEE Trans Ultrason, Ferroelectr Freq Control. 2007;54:1316–1332. doi: 10.1109/tuffc.2007.392.
  • 11. Greengard L, Rokhlin V. A fast algorithm for particle simulations. J Comput Phys. 1987;73:325–348.
  • 12. Rokhlin V. Rapid solution of integral equations of scattering theory in two dimensions. J Comput Phys. 1990;86:414–439.
  • 13. Chew WC, Jin J, Michielssen E, Song J, editors. Fast and Efficient Algorithms in Computational Electromagnetics. Artech House; Boston: 2001.
  • 14. Michielssen E, Jin JM. Guest editorial for the special issue on large and multiscale computational electromagnetics. IEEE Trans Antennas Propag. 2008;56(8):2146–2149.
  • 15. Hesford AJ, Waag RC. The fast multipole method and Fourier convolution for the solution of acoustic scattering on regular volumetric grids. J Comput Phys. 2010;229:8199–8210. doi: 10.1016/j.jcp.2010.07.025.
  • 16. Zhai H, Chen Q, Yuan Q, Sawaya K, Liang C. Analysis of large-scale periodic array antennas by CG-FFT combined with equivalent sub-array preconditioner. IEICE Trans Commun. 2006;E89-B:922–928.
  • 17. Cui TJ, Chew WC, Aydiner AA, Zhang YH. Fast-forward solvers for the low-frequency detection of buried dielectric objects. IEEE Trans Geosci Remote Sens. 2003;41:2026–2036.
  • 18. Xu XM, Liu QH. The BCGS-FFT method for electromagnetic scattering from inhomogeneous objects in a planarly layered medium. IEEE Antennas Wireless Propag Lett. 2002;1:77–80.
  • 19. Cui TJ, Chew WC. Fast algorithm for electromagnetic scattering by buried 3-D dielectric objects of large size. IEEE Trans Geosci Remote Sens. 1999;37:2597–2608.
  • 20. Bebendorf M. Approximation of boundary element matrices. Numer Math. 2000;86:565–589.
  • 21. Zhao K, Vouvakis MN, Lee JF. The adaptive cross approximation algorithm for accelerated method of moments computations of EMC problems. IEEE Trans Electromagn Compat. 2005;47:763–773.
  • 22. Shaeffer J. Direct solve of electrically large integral equations for problem sizes to 1 M unknowns. IEEE Trans Antennas Propag. 2008;56(8):2306–2313.
  • 23. De Zaeytijd J, Bogaert I, Franchois A. An efficient hybrid MLFMA-FFT solver for the volume integral equation in case of sparse 3D inhomogeneous dielectric scatterers. J Comput Phys. 2008;227:7052–7068.
  • 24. Bebendorf M, Kunis S. Recompression techniques for adaptive cross approximation. Journal of Integral Equations and Applications. 2009;21(3):331–357.
  • 25. Nachman AI, Smith JF, III, Waag RC. An equation for acoustic propagation in inhomogeneous media with relaxation losses. J Acoust Soc Am. 1990;88(3):1584–1594.
  • 26. Duck FA. Physical Properties of Tissue: A Comprehensive Reference Book. Academic Press; London: 1990.
  • 27. Morse PM, Ingard KU. Theoretical Acoustics. McGraw-Hill; New York: 1968.
  • 28. Johnson SA, Stenger F, Wilcox C, Ball J, Berggren MJ. Wave equations and inverse solutions for soft tissue. Acoustic Imaging. 1982;11:409–424.
  • 29. Pourjavid S, Tretiak OJ. Numerical solution of the direct scattering problem through the transformed wave equation. J Acoust Soc Am. 1992;91:639–645. doi: 10.1121/1.402524.
  • 30. Martin PA. Acoustic scattering by inhomogeneous obstacles. SIAM J Appl Math. 2003;64(1):297–308.
  • 31. Saad Y, Schultz MH. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J Sci and Stat Comput. 1986;7:856–869.
  • 32. van der Vorst HA. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J Sci and Stat Comput. 1992;13(2):631–644.
  • 33. Velamparambil S, Chew WC. Analysis and performance of a distributed memory multilevel fast multipole algorithm. IEEE Trans Antennas Propag. 2005;53:2719–2727.
