Med. Phys. 2013 Jan 28;40(2):023301. doi: 10.1118/1.4774361

Accelerating image reconstruction in three-dimensional optoacoustic tomography on graphics processing units

Kun Wang 1, Chao Huang 1, Yu-Jiun Kao 2, Cheng-Ying Chou 2, Alexander A Oraevsky 3, Mark A Anastasio 4,a)
PMCID: PMC3581128  PMID: 23387778

Abstract

Purpose: Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional imaging models. One important reason is that 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming techniques.

Methods: Parallelization strategies are proposed to accelerate a filtered backprojection (FBP) algorithm and two different pairs of projection/backprojection operations that correspond to two different numerical imaging models. The algorithms are designed to fully exploit the parallel computing power of graphics processing units (GPUs). In order to evaluate the parallelization strategies for the projection/backprojection pairs, an iterative image reconstruction algorithm is implemented. Computer simulation and experimental studies are conducted to investigate the computational efficiency and numerical accuracy of the developed algorithms.

Results: The GPU implementations improve the computational efficiency by factors of 1000, 125, and 250 for the FBP algorithm and the two pairs of projection/backprojection operators, respectively. Accurate images are reconstructed by use of the FBP and iterative image reconstruction algorithms from both computer-simulated and experimental data.

Conclusions: Parallelization strategies for 3D OAT image reconstruction are proposed for the first time. These GPU-based implementations significantly reduce the computational time for 3D image reconstruction, complementing our earlier work on 3D OAT iterative image reconstruction.

Keywords: Optoacoustic tomography, photoacoustic tomography, thermoacoustic tomography, graphics processing unit (GPU), compute unified device architecture (CUDA)

INTRODUCTION

Optoacoustic tomography (OAT), also known as photoacoustic computed tomography, is an emerging imaging modality that has great potential for a wide range of biomedical imaging applications.1, 2, 3, 4 In OAT, a short laser pulse is employed to irradiate biological tissues. When the biological tissues absorb the optical energy, acoustic wave fields can be generated via the thermoacoustic effect. The acoustic wave fields propagate outward in three-dimensional (3D) space and are measured by use of ultrasonic transducers that are distributed outside the object. The goal of OAT is to obtain an estimate of the absorbed energy density map within the object from the measured acoustic signals. To accomplish this, an image reconstruction algorithm is required.

A variety of analytic image reconstruction algorithms have been proposed.5, 6, 7, 8 These algorithms generally assume an idealized transducer model and an acoustically homogeneous medium. Also, since they are based on discretization of continuous reconstruction formulae, these algorithms require the acoustic pressure to be densely sampled over a surface that encloses the object to obtain an accurate reconstruction. To overcome these limitations, iterative image reconstruction algorithms have been proposed.9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 Although the optoacoustic wave intrinsically propagates in 3D space, when applied to experimental data, most studies have employed two-dimensional (2D) imaging models by making certain assumptions on the transducer responses and/or the object structures.10, 13, 15, 16, 19, 25 An important reason is that the computation required for 3D OAT image reconstruction is excessively burdensome. Therefore, acceleration of 3D image reconstruction will facilitate algorithm development and many applications, including real-time 3D photoacoustic computed tomography (PACT).26, 27

A graphics processing unit (GPU) card is a specialized device specifically designed for parallel computations.28 Compute unified device architecture (CUDA) is an extension of the C/FORTRAN language that provides a convenient programming platform to exploit the parallel computational power of GPUs.29 The CUDA-based parallel programming technique has been successfully applied to accelerate image reconstruction in mature imaging modalities such as x-ray computed tomography (CT)30, 31, 32 and magnetic resonance imaging (MRI).33 In OAT, however, only a few works on the utilization of GPUs to accelerate image reconstruction have been reported.20, 34 For example, the k-Wave toolbox employs the NVIDIA CUDA Fast Fourier Transform library (cuFFT) to accelerate the computation of the 3D FFT.34 Also, a GPU-based sparse matrix-vector multiplication strategy has been applied to 3D OAT image reconstruction for the case in which the system matrix is sparse and can be stored in memory.20 However, there remains an important need to develop efficient implementations of OAT reconstruction algorithms for general applications in which the system matrix is too large to be stored.

In this work, we propose parallelization strategies, for use with GPUs, to accelerate 3D image reconstruction in OAT. Both filtered backprojection (FBP) and iterative image reconstruction algorithms are investigated. For use with iterative image reconstruction algorithms, we focus on the parallelization of projection and backprojection operators. Specifically, we develop two pairs of projection/backprojection operators that correspond to two distinct discrete-to-discrete (D-D) imaging models employed in OAT, namely, the interpolation-based and the spherical-voxel-based D-D imaging models. Note that our implementations of the backprojection operators compute the exact adjoint of the forward operators, and therefore the projector pairs are “matched.”35

The remainder of the paper is organized as follows. In Sec. 2, we briefly review OAT imaging models in their continuous and discrete forms. We propose GPU-based parallelization strategies in Sec. 3. Numerical studies and results are described in Secs. 4 and 5, respectively. Finally, a brief discussion and summary of the proposed algorithms are provided in Sec. 6.

BACKGROUND

Continuous-to-continuous (C-C) imaging models and analytic image reconstruction algorithms

A C-C OAT imaging model neglects sampling effects and provides a mapping from the absorbed energy density function A(r) to the induced acoustic pressure function p(r^s, t). Here, t is the temporal coordinate, and r ∈ V and r^s ∈ S denote locations within the object support V and on the measurement surface S, respectively. A canonical OAT C-C imaging model can be expressed as1, 14, 36

$$p(\mathbf{r}^{s},t)=\frac{\beta}{4\pi C_p}\int_{V}d\mathbf{r}\,A(\mathbf{r})\,\frac{d}{dt}\frac{\delta\!\big(t-\frac{|\mathbf{r}^{s}-\mathbf{r}|}{c_0}\big)}{|\mathbf{r}^{s}-\mathbf{r}|}\equiv\mathcal{H}^{\rm CC}A,\qquad(1)$$

where δ(t) is the Dirac delta function, and β, c_0, and C_p denote the thermal coefficient of volume expansion, the (constant) speed of sound, and the specific heat capacity of the medium at constant pressure, respectively. We introduce the operator notation $\mathcal{H}^{\rm CC}$ to denote this C-C mapping.

Alternatively, Eq. 1 can be reformulated as the well-known spherical Radon transform (SRT)12, 37

$$g(\mathbf{r}^{s},t)=\int_{V}d\mathbf{r}\,A(\mathbf{r})\,\delta\big(c_0t-|\mathbf{r}^{s}-\mathbf{r}|\big),\qquad(2)$$

where the function g(rs, t) is related to p(rs, t) as

$$p(\mathbf{r}^{s},t)=\frac{\beta}{4\pi C_p}\frac{\partial}{\partial t}\frac{g(\mathbf{r}^{s},t)}{t}.\qquad(3)$$

The SRT model provides an intuitive interpretation of each value of g(r^s, t) as a surface integral of A(r) over a sphere centered at r^s with radius c_0t.

Based on C-C imaging models, a variety of analytic image reconstruction algorithms have been developed.5, 6, 7, 8 For the case of a spherical measurement geometry, a FBP algorithm in its continuous form is given by6

$$A(\mathbf{r})=\frac{C_p}{2\pi\beta c_0^{2}R^{s}}\int_{S}d\mathbf{r}^{s}\left[\frac{2\,p(\mathbf{r}^{s},t)}{|\mathbf{r}-\mathbf{r}^{s}|}+\frac{1}{c_0}\frac{\partial p(\mathbf{r}^{s},t)}{\partial t}\right]_{t=\frac{|\mathbf{r}-\mathbf{r}^{s}|}{c_0}},\qquad(4)$$

where R^s denotes the radius of the measurement surface S.

Discrete-to-discrete imaging models and iterative image reconstruction algorithms

When sampling effects are considered, an OAT system is properly described as a continuous-to-discrete (C-D) imaging model14, 22, 35, 38, 39

$$[\mathbf{u}]_{qK+k}=h^{e}(t)*_{t}\frac{1}{S_q}\int_{S_q}d\mathbf{r}^{s}\,p(\mathbf{r}^{s},t)\bigg|_{t=k\Delta t},\qquad q=0,1,\ldots,Q-1,\;\;k=0,1,\ldots,K-1,\qquad(5)$$

where Q and K denote the total numbers of transducers (indexed by q) and time samples (indexed by k), respectively. S_q is the surface area of the qth transducer, which is assumed to be a subset of S; h^e(t) denotes the acousto-electric impulse response (EIR) of each transducer, which, without loss of generality, is assumed to be identical for all transducers; "*_t" denotes linear convolution with respect to the time coordinate; and Δt is the temporal sampling interval. The vector u represents the lexicographically ordered measured voltage signals, whose (qK + k)th element is denoted by [u]_{qK+k}.

In order to apply iterative image reconstruction algorithms, a D-D imaging model is required, which necessitates the discretization of A(r). The following N-dimensional representation of the object function can be employed35, 38

$$A(\mathbf{r})\approx\sum_{n=0}^{N-1}[\boldsymbol{\alpha}]_{n}\,\psi_{n}(\mathbf{r}),\qquad(6)$$

where α is a coefficient vector whose nth element is denoted by [α]n and ψn(r) is the expansion function. On substitution from Eq. 6 into Eq. 5, where p(rs, t) is defined by Eq. 1, one obtains a D-D mapping from α to u, expressed as

$$\mathbf{u}\approx\mathbf{H}\boldsymbol{\alpha},\qquad(7)$$

where each element of the matrix H is defined as

$$[\mathbf{H}]_{qK+k,\,n}=h^{e}*_{t}\frac{1}{S_q}\int_{S_q}d\mathbf{r}^{s}\,\big[\mathcal{H}^{\rm CC}\psi_{n}\big]\bigg|_{t=k\Delta t}.\qquad(8)$$

Here, H is the D-D imaging operator, also known as the system matrix or the projection operator. Note that the "≈" in Eq. 7 is due to the use of the finite-dimensional representation of the object function [i.e., Eq. 6]. No additional approximations have been introduced.

Below we describe two types of D-D imaging models that have been employed in OAT:14, 22, 39 the interpolation-based imaging model and the spherical-voxel-based imaging model. The quantities u, H, and α (or ψn) in the two models will be distinguished by the subscripts (or superscripts) “int” and “sph,” respectively.

Interpolation-based D-D imaging model

The interpolation-based D-D imaging model defines the coefficient vector as samples of the object function on the nodes of a uniform Cartesian grid

$$[\boldsymbol{\alpha}_{\rm int}]_{n}=\int_{V}d\mathbf{r}\,\delta(\mathbf{r}-\mathbf{r}_{n})A(\mathbf{r}),\qquad n=0,1,\ldots,N-1,\qquad(9)$$

where rn = (xn, yn, zn)T specifies the location of the nth node of the uniform Cartesian grid. The definition of the expansion function depends on the choice of interpolation method.19 If a trilinear interpolation method is employed, the expansion function can be expressed as40

$$\psi_{n}^{\rm int}(\mathbf{r})=\begin{cases}\Big(1-\frac{|x-x_n|}{\Delta_s}\Big)\Big(1-\frac{|y-y_n|}{\Delta_s}\Big)\Big(1-\frac{|z-z_n|}{\Delta_s}\Big), & \text{if } |x-x_n|,\,|y-y_n|,\,|z-z_n|\le\Delta_s,\\[4pt] 0, & \text{otherwise},\end{cases}\qquad(10)$$

where Δs is the distance between two neighboring grid points.

In principle, the interpolation-based D-D imaging model can be constructed by substitution from Eqs. 9 and 10 into Eq. 8. In practice, however, implementation of the surface integral over S_q is difficult for the choice of expansion functions in Eq. 10. Also, implementations of the temporal convolution and of $\mathcal{H}^{\rm CC}\psi_n^{\rm int}$ usually require extra discretization procedures. Therefore, utilization of the interpolation-based D-D model commonly assumes the transducers to be point-like. In this case, the implementation of H_int is decomposed into a three-step operation

$$\mathbf{u}_{\rm int}=\mathbf{H}_{\rm int}\boldsymbol{\alpha}_{\rm int}\approx\mathbf{H}^{e}\mathbf{D}\mathbf{G}\boldsymbol{\alpha}_{\rm int},\qquad(11)$$

where G, D, and H^e are discrete approximations of the SRT [Eq. 2], the differential operator [Eq. 3], and the operator that implements a temporal convolution with the EIR, respectively. We implemented G in a way12, 41, 42 that is similar to the "ray-driven" implementation of the Radon transform in x-ray CT,40 i.e., for each data sample, we accumulated the contributions from the voxels that resided on the spherical shell specified by the data sample. By use of Eqs. 2, 6, 9, and 10, one obtains

$$[\mathbf{G}\boldsymbol{\alpha}_{\rm int}]_{qK+k}=\Delta_s^{2}\sum_{n=0}^{N-1}[\boldsymbol{\alpha}_{\rm int}]_{n}\sum_{i=0}^{N_i-1}\sum_{j=0}^{N_j-1}\psi_{n}^{\rm int}(\mathbf{r}_{k,i,j})\approx[\mathbf{g}]_{qK+k},\qquad(12)$$

where $[\mathbf{g}]_{qK+k}\equiv g(\mathbf{r}_q^{s},t)|_{t=k\Delta t}$, with $\mathbf{r}_q^{s}$ specifying the location of the qth point-like transducer, and N_i and N_j denote the numbers of divisions over the two angular coordinates of the local spherical coordinate system shown in Fig. 1b. A derivation of Eq. 12 is provided in the Appendix. The differential operator in Eq. 3 is approximated as

$$[\mathbf{D}\mathbf{g}]_{qK+k}=\frac{\beta}{8\pi C_p(\Delta t)^{2}}\left(\frac{[\mathbf{g}]_{qK+k+1}}{k+1}-\frac{[\mathbf{g}]_{qK+k-1}}{k-1}\right)\approx[\mathbf{p}_{\rm int}]_{qK+k},\qquad(13)$$

where [p int ]qK+kp(rqs,t)|t=kΔt. Finally, the continuous temporal convolution is approximated by a discrete linear convolution as43

$$[\mathbf{H}^{e}\mathbf{p}_{\rm int}]_{qK+k}=\sum_{\kappa=0}^{K-1}[\mathbf{h}^{e}]_{k-1-\kappa}\,[\mathbf{p}_{\rm int}]_{qK+\kappa}\approx[\mathbf{u}_{\rm int}]_{qK+k},\qquad(14)$$

where $[\mathbf{h}^{e}]_{k}=\Delta t\,h^{e}(t)|_{t=k\Delta t}$.
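Because D and H^e act only along the K time samples of each transducer record, they are computationally inexpensive. A minimal CPU sketch of the two operators for one transducer might read as follows; the array names and the boundary handling (samples for which Eq. 13 references an unavailable or singular neighbor are zeroed) are our own assumptions rather than the authors' code.

```c
#include <math.h>

#define PI_F 3.14159265358979f

/* Sketch of the difference operator D [Eq. (13)] for one transducer.
   Samples k = 0, 1, and K-1, where Eq. (13) would reference g[k-1]/(k-1)
   with k-1 <= 0 or g[k+1] outside the record, are zeroed; this boundary
   handling is an assumption. */
void apply_D(const float *g, float *p, int K,
             float beta, float Cp, float dt)
{
    float c = beta / (8.0f * PI_F * Cp * dt * dt);
    p[0] = p[1] = p[K - 1] = 0.0f;
    for (int k = 2; k < K - 1; ++k)
        p[k] = c * (g[k + 1] / (k + 1) - g[k - 1] / (k - 1));
}

/* Sketch of the discrete convolution with the sampled EIR [Eq. (14)];
   he is taken as zero outside the index range [0, K-1]. */
void apply_He(const float *p, const float *he, float *u, int K)
{
    for (int k = 0; k < K; ++k) {
        float sum = 0.0f;
        for (int kap = 0; kap < K; ++kap) {
            int idx = k - 1 - kap;          /* kernel index from Eq. (14) */
            if (idx >= 0 && idx < K)
                sum += he[idx] * p[kap];
        }
        u[k] = sum;
    }
}
```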

Figure 1. (a) Schematic of the 3D OAT scanning geometry. (b) Schematic of the local coordinate system for the implementation of the interpolation-based D-D imaging model.

Spherical-voxel-based D-D imaging model

The spherical-voxel-based imaging model is also widely employed in OAT.9, 11, 14, 22, 44 It employs the expansion functions

$$\psi_{n}^{\rm sph}(\mathbf{r})=\begin{cases}1, & \text{if } |\mathbf{r}-\mathbf{r}_{n}|\le\Delta_s/2,\\ 0, & \text{otherwise},\end{cases}\qquad(15)$$

where r_n is defined as in Eq. 9. The nth expansion function ψ_n^sph(r) is a uniform sphere inscribed in the nth cubic voxel of a Cartesian grid. The nth component of the coefficient vector α_sph is defined as

$$[\boldsymbol{\alpha}_{\rm sph}]_{n}=\frac{V_{\rm cube}}{V_{\rm sph}}\int_{V}d\mathbf{r}\,\psi_{n}^{\rm sph}(\mathbf{r})\,A(\mathbf{r}),\qquad(16)$$

where Vcube and Vsph are the volumes of a cubic voxel of dimension Δs and of a spherical voxel of radius Δs/2, respectively.

Unlike in the interpolation-based imaging model, with the expansion functions defined in Eq. 15, the surface integral over S_q and $\mathcal{H}^{\rm CC}\psi_n^{\rm sph}$ in Eq. 8 can be converted to a temporal convolution and calculated analytically.14, 22 To avoid utilizing excessively high sampling rates to mitigate aliasing, the spherical-voxel-based imaging model can be conveniently implemented in the temporal-frequency domain as22

$$[\tilde{\mathbf{u}}_{\rm sph}]_{qL+l}=\tilde{p}_{0}(f)\sum_{n=0}^{N-1}[\boldsymbol{\alpha}_{\rm sph}]_{n}\frac{1}{S_q}\tilde{h}_{q}^{s}(\mathbf{r}_{n},f)\bigg|_{f=l\Delta f},\qquad l=0,1,\ldots,L-1,\qquad(17)$$

where Δf is the frequency sampling interval, and L denotes the total number of temporal-frequency samples indexed by l. A derivation of Eq. 17 can be found in Ref. 22. The function $\tilde{h}_q^{s}(\mathbf{r}_n,f)$ represents the temporal Fourier transform of the spatial impulse response (SIR) of the qth transducer for a source located at r_n, expressed as

$$\tilde{h}_{q}^{s}(\mathbf{r}_{n},f)=\int_{S_q}d\mathbf{r}^{s}\,\frac{\exp\!\big(-\hat{\jmath}\,2\pi f|\mathbf{r}^{s}-\mathbf{r}_{n}|/c_0\big)}{2\pi|\mathbf{r}^{s}-\mathbf{r}_{n}|}.\qquad(18)$$

Also, $\tilde{p}_0(f)$ is defined as

$$\tilde{p}_{0}(f)=\hat{\jmath}\,\frac{\beta c_0^{3}}{C_p f}\left[\frac{\Delta_s}{2c_0}\cos\!\Big(\frac{\pi f\Delta_s}{c_0}\Big)-\frac{1}{2\pi f}\sin\!\Big(\frac{\pi f\Delta_s}{c_0}\Big)\right]\tilde{h}^{e}(f),\qquad(19)$$

where $\tilde{h}^{e}(f)$ is the EIR in the temporal-frequency domain. In summary, the imaging model can be expressed in matrix form as

$$\tilde{\mathbf{u}}_{\rm sph}=\mathbf{H}_{\rm sph}\boldsymbol{\alpha}_{\rm sph}.\qquad(20)$$

Adjoints of the system matrices

Iterative image reconstruction algorithms employ numerical implementations of the projection operator, i.e., the system matrix H, as well as its adjoint, denoted by H†.45 The adjoint is also referred to as the backprojection operator. Note that for most practical applications, H and H† are too large to be stored in the random access memory of currently available computers. Therefore, in practice, the actions of H and H† are almost always calculated on the fly. The same strategy was adopted in this work.

According to the definition of the adjoint operator,35, 43 $\mathbf{H}_{\rm int}^{\dagger}=\mathbf{G}^{\dagger}\mathbf{D}^{\dagger}(\mathbf{H}^{e})^{\dagger}$, where

$$[(\mathbf{H}^{e})^{\dagger}\mathbf{u}_{\rm int}]_{qK+k}=\sum_{\kappa=0}^{K-1}[\mathbf{h}^{e}]_{\kappa-1-k}\,[\mathbf{u}_{\rm int}]_{qK+\kappa}\equiv[\mathbf{p}_{\rm int}^{\dagger}]_{qK+k},\qquad(21)$$

$$[\mathbf{D}^{\dagger}\mathbf{p}_{\rm int}^{\dagger}]_{qK+k}=\frac{\beta}{8\pi C_p(\Delta t)^{2}k}\Big([\mathbf{p}_{\rm int}^{\dagger}]_{qK+k-1}-[\mathbf{p}_{\rm int}^{\dagger}]_{qK+k+1}\Big)\equiv[\mathbf{g}^{\dagger}]_{qK+k},\qquad(22)$$

and

$$[\mathbf{G}^{\dagger}\mathbf{g}^{\dagger}]_{n}=\Delta_s^{2}\sum_{q=0}^{Q-1}\sum_{k=0}^{K-1}[\mathbf{g}^{\dagger}]_{qK+k}\sum_{i=0}^{N_i-1}\sum_{j=0}^{N_j-1}\psi_{n}^{\rm int}(\mathbf{r}_{k,i,j})\equiv[\boldsymbol{\alpha}_{\rm int}^{\dagger}]_{n}.\qquad(23)$$

It can also be verified that the adjoint operator $\mathbf{H}_{\rm sph}^{\dagger}$ is given by

$$[\mathbf{H}_{\rm sph}^{\dagger}\tilde{\mathbf{u}}_{\rm sph}]_{n}=\sum_{q=0}^{Q-1}\sum_{l=0}^{L-1}[\tilde{\mathbf{u}}_{\rm sph}]_{qL+l}\,\tilde{p}_{0}^{*}(f)\,\tilde{h}_{q}^{*}(\mathbf{r}_{n},f)\bigg|_{f=l\Delta f},\qquad(24)$$

where the superscript “*” denotes the complex conjugate. Unlike the unmatched backprojection operators46 that are obtained by discretization of the continuous adjoint operator, utilization of the exact adjoint operator facilitates the convergence of iterative image reconstruction algorithms.
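The matched property can be verified numerically through the inner-product identity ⟨Hα, u⟩ = ⟨α, H†u⟩, which holds up to floating-point rounding for an exact adjoint. A minimal sketch of such a check is given below; the function-pointer signatures and the tolerance are our own assumptions.

```c
#include <stdlib.h>
#include <math.h>

/* Verify <H a, u> == <a, H'u> for randomly drawn a and u. forward()
   and adjoint() stand for any on-the-fly implementation of H and its
   adjoint; their signatures are assumptions for this sketch. */
int check_adjoint(void (*forward)(const double *, double *),
                  void (*adjoint)(const double *, double *),
                  int n_image, int n_data)
{
    double *a  = (double *)malloc(n_image * sizeof(double));
    double *u  = (double *)malloc(n_data  * sizeof(double));
    double *Ha = (double *)malloc(n_data  * sizeof(double));
    double *Au = (double *)malloc(n_image * sizeof(double));

    for (int i = 0; i < n_image; ++i) a[i] = rand() / (double)RAND_MAX;
    for (int i = 0; i < n_data;  ++i) u[i] = rand() / (double)RAND_MAX;

    forward(a, Ha);   /* Ha = H a  */
    adjoint(u, Au);   /* Au = H'u  */

    double lhs = 0.0, rhs = 0.0;
    for (int i = 0; i < n_data;  ++i) lhs += Ha[i] * u[i];
    for (int i = 0; i < n_image; ++i) rhs += a[i] * Au[i];

    int matched = fabs(lhs - rhs) <= 1e-10 * (fabs(lhs) + fabs(rhs));
    free(a); free(u); free(Ha); free(Au);
    return matched;   /* 1 if the pair is matched to rounding error */
}
```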

GPU architecture and CUDA programming

The key features of GPU architecture and the basics of CUDA programming are briefly summarized in this section. We refer the readers to Refs. 28, 29 for additional details.

A GPU card contains multiple streaming multiprocessors. Each streaming multiprocessor is configured with multiple processor cores. For example, the Tesla C1060 possesses 30 streaming multiprocessors with 8 processor cores on each; and the Tesla C2050 possesses 14 streaming multiprocessors with 32 processor cores on each.28 The processor cores in each multiprocessor execute the same instruction on different pieces of data, which is referred to as “single instruction, multiple data” (SIMD) model of parallel programming. In order to fully exploit the computing power of GPUs, one of the major challenges is to design a parallelization strategy fitting in the SIMD framework such that the largest number of processor cores can execute the computation simultaneously.29

A GPU card has six types of memory that have varying capacities and different access rules and efficiencies: (1) Registers are assigned to each thread and have the fastest access. (2) Shared memory is assigned to each block and can be accessed efficiently by all threads in the block if designed appropriately. (3) Constant memory is read-only and can be accessed efficiently by all threads. (4) Texture memory is also read-only and is optimized for interpolation operations. (5) Global memory has the slowest access, taking hundreds of times more clock cycles than basic arithmetic operations. (6) Local memory is assigned to each thread but is as slow to access as global memory. Therefore, an efficient GPU-based implementation in general requires limiting the number of global- and local-memory accesses.

CUDA is a platform and programming model developed by NVIDIA that includes a collection of functions and keywords to exploit the parallel computing power of GPUs.29 A CUDA parallel program is composed of a host program and kernels. The host program is executed by central processing units (CPUs) and launches the kernels, which are custom-designed functions executed by GPUs. A general parallel programming strategy is to launch multiple instances of a kernel and to run them concurrently on GPUs. In CUDA, each instance of the kernel is called a thread and processes only a portion of the data. A hierarchy of threads is employed: threads are grouped into blocks, and blocks are grouped into a grid. Therefore, each thread is specified by a multi-index containing a block index and a thread index within the block.
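As a generic illustration of this hierarchy (not taken from the reconstruction code), a kernel in which every thread handles one vector element, with its global index assembled from the block and thread indices, might look as follows:

```cuda
#include <cuda_runtime.h>

/* Each thread scales one element of x; its global index is assembled
   from the block index and the thread index within the block. */
__global__ void scale(float *x, float w, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= w;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    /* Grid of ceil(n/256) blocks, 256 threads per block. */
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return 0;
}
```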

GPU-ACCELERATED RECONSTRUCTION ALGORITHMS

In this section, we propose GPU-based parallelization strategies for the FBP algorithm and the projection/backprojection operations corresponding to the interpolation-based and the spherical-voxel-based D-D imaging models.

Measurement geometry

We employed the spherical measurement geometry shown in Fig. 1a. The measurement sphere was of radius R^s, centered at the origin of the Cartesian coordinate system (or the equivalent spherical coordinate system). The polar angle θ^s ∈ [0, π] was divided equally with interval Δθ^s = π/N_r, starting from θ^s_min. Each polar angle specifies a ring on the sphere that is parallel to the plane z = 0, resulting in N_r rings. On each ring, N_v ultrasonic transducers were assumed to be uniformly distributed with azimuth-angle interval Δϕ^s = 2π/N_v. Hereafter, each azimuth angle will be referred to as a tomographic view. At each view, we assumed that N_t temporal samples were acquired, the first of which corresponded to the time instance t_min. For implementations in the temporal-frequency domain, we assumed that N_f temporal-frequency samples were available, the first of which corresponded to f_min. The region to be reconstructed was a rectangular cuboid whose edges were parallel to the axes of the coordinate system and whose left-bottom-back vertex was located at (x_min, y_min, z_min). The numbers of voxels along the three coordinates will be denoted by N_x, N_y, and N_z, respectively, for a total of N = N_xN_yN_z voxels. We also assumed the cuboid was contained in another sphere of radius R that was concentric with the measurement sphere, as shown in Fig. 1b.

Implementation of the FBP algorithm

Central processing unit (CPU)-based implementations of continuous FBP formulae have been described in Refs. 5, 6, 7, 8. Though the discretization methods vary, in general three approximations have to be employed. First, the first-order derivative term ∂p(r^s, t)/∂t has to be approximated by a finite-difference scheme of a certain order.47 Second, the measurement sphere has to be divided into small patches, and the surface integral has to be approximated by a summation of the area of each patch weighted by the effective value of the integrand on that patch. Finally, the value of the integrand at an arbitrary time instance t = |r^s − r|/c_0 has to be approximated by an interpolation method.

In this study, we approximated the surface integral by use of the trapezoidal rule. As described earlier, the spherical surface was divided into N_rN_v patches. For the transducer indexed by q, located at r_q^s = (R^s, θ_q^s, ϕ_q^s), the area of the patch was approximated by (R^s)²Δθ^sΔϕ^s sin θ_q^s. The value at the time instance t = |r_q^s − r_n|/c_0 was approximated by linear interpolation from its two neighboring samples as

$$p(\mathbf{r}_q^{s},t)\big|_{t=\frac{|\mathbf{r}_q^{s}-\mathbf{r}_n|}{c_0}}\approx(k+1-\tilde{k})\,[\mathbf{p}]_{qK+k}+(\tilde{k}-k)\,[\mathbf{p}]_{qK+k+1},\qquad(25)$$

where $\tilde{k}=\big(|\mathbf{r}_q^{s}-\mathbf{r}_n|/c_0-t_{\min}\big)/\Delta t$, and k is the integer part of $\tilde{k}$. Here, p is a vector of lexicographically ordered samples of the pressure function p(r^s, t), which is estimated from the measured voltage data vector u. Also, the first-order derivative term was approximated by

$$\frac{\partial}{\partial t}p(\mathbf{r}_q^{s},t)\bigg|_{t=\frac{|\mathbf{r}_q^{s}-\mathbf{r}_n|}{c_0}}\approx\frac{1}{\Delta t}\big([\mathbf{p}]_{qK+k+1}-[\mathbf{p}]_{qK+k}\big).\qquad(26)$$

By use of these three numerical approximations, the discretized FBP formula was expressed as

$$[\hat{\boldsymbol{\alpha}}_{\rm fbp}]_{n}=\frac{C_pR^{s}\Delta\theta^{s}\Delta\phi^{s}}{\pi\beta c_0^{3}\Delta t}\sum_{n_r=0}^{N_r-1}\sin\theta_q^{s}\sum_{n_v=0}^{N_v-1}\left\{\left[1.5-\frac{k+t_{\min}/\Delta t}{\tilde{k}+t_{\min}/\Delta t}\right][\mathbf{p}]_{qK+k+1}+\left[\frac{k+1+t_{\min}/\Delta t}{\tilde{k}+t_{\min}/\Delta t}-1.5\right][\mathbf{p}]_{qK+k}\right\}.\qquad(27)$$

Unlike the implementations of FBP formulas in x-ray cone beam CT,31, 32 we combined the filter and the linear interpolation. This reduced the number of visits to the global memory in the GPU implementation described below.

We implemented the FBP formula in a way that is similar to the "pixel-driven" implementation in x-ray CT,32 i.e., we assigned each thread to execute the two accumulative summations in Eq. 27 for one voxel. We bound the pressure data p to texture memory because it is cached and has a faster access rate. Therefore, our implementation only requires accessing texture memory twice and global memory once. The pseudocode is provided in Algorithms I and II for the host part and the device part, respectively. Note that the pseudocode is not necessarily optimal, because the performance of the code can depend on the dimensions of p and α̂_fbp. For example, we set the block size to (N_z, 1, 1) because, for our applications, N_z was larger than N_x and N_y and smaller than the maximum number of threads that a block can support (i.e., 1024 for the NVIDIA Tesla C2050). If the values of N_x, N_y, and N_z change, the dimensions of the grid and blocks may need to be redesigned. However, the general SIMD parallelization strategy remains.


Algorithm I. Implementation of the FBP algorithm (on host).

Input: p
Output: α̂_fbp
 1: w = C_pR^sΔθ^sΔϕ^s/(πβc_0³Δt) {Precalculate the common coefficient}
 2: T_p ← p {Bind data to texture memory}
 3: K_fbp <<<(N_y, N_x), (N_z, 1, 1)>>> (w, D_α̂_fbp)
 4: α̂_fbp ← D_α̂_fbp {Copy data from global memory to host}


Algorithm II. Implementation of kernel K_fbp <<<(N_y, N_x), (N_z, 1, 1)>>>.

Input: w, T_p, D_α̂_fbp
Output: D_α̂_fbp
 1: x = (blockIdx.y)Δs + x_min; y = (blockIdx.x)Δs + y_min; z = (threadIdx.x)Δs + z_min
 2: Σ = 0
 3: for n_r = 0 to N_r − 1 do
 4:   θ^s = n_rΔθ^s + θ^s_min; z^s = R^s cos θ^s; r^s = R^s sin θ^s; w′ = w sin θ^s
 5:   for n_v = 0 to N_v − 1 do
 6:     ϕ^s = n_vΔϕ^s + ϕ^s_min; x^s = r^s cos ϕ^s; y^s = r^s sin ϕ^s
 7:     t̄ = ((x − x^s)² + (y − y^s)² + (z − z^s)²)^{1/2}
 8:     t_n = (t̄/c_0 − t_min)/Δt; n_t = floor(t_n)
 9:     Σ += w′{[((n_t + 1)Δt + t_min)/(t_nΔt + t_min) − 1.5] T_p[n_r][n_v][n_t] + [1.5 − (n_tΔt + t_min)/(t_nΔt + t_min)] T_p[n_r][n_v][n_t + 1]} {Fetch data from texture memory}
 10:   end for
 11: end for
 12: D_α̂_fbp[blockIdx.y][blockIdx.x][threadIdx.x] = Σ
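A CUDA C rendering of Algorithm II might take the following form; the use of a texture object with point sampling indexed as (n_t, n_v, n_r), the flattened output layout, and all identifiers are our own assumptions rather than the authors' code:

```cuda
/* Sketch of the K_fbp kernel. The pressure samples are assumed bound
   to a 3D point-mode texture indexed (nt, nv, nr); the combined
   filtering/interpolation weights follow Eq. (27). */
__global__ void K_fbp(cudaTextureObject_t t_p, float w, float *d_alpha,
                      float xmin, float ymin, float zmin, float ds,
                      float Rs, float dth, float dph, float th0, float ph0,
                      float c0, float tmin, float dt, int Nr, int Nv)
{
    float x = blockIdx.y * ds + xmin;
    float y = blockIdx.x * ds + ymin;
    float z = threadIdx.x * ds + zmin;
    float sum = 0.0f;

    for (int nr = 0; nr < Nr; ++nr) {
        float th = nr * dth + th0;
        float zs = Rs * cosf(th), rs = Rs * sinf(th);
        float wp = w * sinf(th);
        for (int nv = 0; nv < Nv; ++nv) {
            float ph = nv * dph + ph0;
            float xs = rs * cosf(ph), ys = rs * sinf(ph);
            float d  = sqrtf((x - xs) * (x - xs) + (y - ys) * (y - ys)
                             + (z - zs) * (z - zs));
            float tn = (d / c0 - tmin) / dt;   /* fractional sample index */
            int   nt = (int)floorf(tn);
            float a  = ((nt + 1) * dt + tmin) / (tn * dt + tmin);
            float b  = (nt * dt + tmin) / (tn * dt + tmin);
            sum += wp * ((a - 1.5f) * tex3D<float>(t_p, nt,     nv, nr)
                       + (1.5f - b) * tex3D<float>(t_p, nt + 1, nv, nr));
        }
    }
    /* Assumed x-fastest-in-z output layout matching Algorithm II. */
    d_alpha[(blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x
            + threadIdx.x] = sum;
}
```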

Implementation of H_int and H_int^†

The forward projection operation Hintαint is composed of three consecutive operations g = Gαint, pint = Dg, and uint = Hepint that are defined in Eqs. 12, 13, 14, respectively. Both the difference operator D and the one-dimensional (1D) convolution He have low computational complexities while the SRT operator G is computationally burdensome. Hence, we developed the GPU-based implementation of G while leaving D and He to be implemented by CPUs.

The SRT in OAT shares many features with the Radon transform in x-ray CT. Thus, our GPU-based implementation is closely related to implementations of the Radon transform that have been optimized for x-ray CT.30, 31, 32 The surface integral was approximated according to the trapezoidal rule. First, the integral surface was divided into small patches, as described in the Appendix. Second, each patch was assigned an effective value of the object function by trilinear interpolation. The trilinear interpolation was calculated by use of the texture memory of GPUs, which is specifically designed for interpolation. Finally, GPU threads accumulated the areas of the patches weighted by the effective values of the object function and wrote the final results to global memory. The pseudocode for the implementation of G is provided in Algorithms III and IV for the host part and the device part, respectively. Note that we employed the "one-level" strategy,32 i.e., each thread calculates one data sample. Higher level strategies have been proposed to improve performance by assigning each block to calculate multiple data samples,32 which, however, caused many threads to idle in OAT, mainly because the amount of computation required per data sample varies greatly among samples for the SRT.


Algorithm III. Implementation of g = Gα_int (on host).

Input: α_int
Output: g
 1: T_α_int ← α_int {Bind data to texture memory}
 2: K_srt <<<(N_v, N_t), (N_r, 1, 1)>>> (D_g)
 3: g ← D_g {Copy data from global memory to host}


Algorithm IV. Implementation of kernel K_srt <<<(N_v, N_t), (N_r, 1, 1)>>>.

Input: D_g, T_α_int
Output: D_g
 1: t̄ = (blockIdx.y)c_0Δt + c_0t_min; θ^s = (threadIdx.x)Δθ^s + θ^s_min; ϕ^s = (blockIdx.x)Δϕ^s + ϕ^s_min
 2: θ_max = arccos(((R^s)² + t̄² − R²)/(2t̄R^s))
 3: Σ = 0; θ = θ_max
 4: while θ > 0 do
 5:   z = t̄ cos θ; r = t̄ sin θ; ϕ = 0
 6:   while ϕ < 2π do
 7:     x = r cos ϕ; y = r sin ϕ
 8:     x′ = −x sin θ^s − (z − R^s)cos θ^s; y′ = y; z′ = x cos θ^s − (z − R^s)sin θ^s {Convert to the global coordinate system}
 9:     x_n = (x′ − x_min)/Δs; y_n = (y′ − y_min)/Δs; z_n = (z′ − z_min)/Δs
 10:    Σ += tex3D(x_n, y_n, z_n) {Trilinear interpolation}
 11:    ϕ = ϕ + Δs/r
 12:   end while
 13:   θ = θ − Δs/t̄
 14: end while
 15: D_g[threadIdx.x][blockIdx.x][blockIdx.y] = ΣΔs²
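The binding step "T_α_int ← α_int", which the pseudocode leaves implicit, can be realized with the CUDA runtime roughly as follows; the texture-object route and all names are our own choices rather than the authors' code:

```cuda
#include <cuda_runtime.h>

/* Host-side sketch: copy the image volume into a 3D CUDA array and
   create a texture object with hardware trilinear filtering, as
   assumed by K_srt. */
cudaTextureObject_t bind_volume(const float *h_alpha, int Nx, int Ny, int Nz)
{
    cudaArray_t arr;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent ext = make_cudaExtent(Nx, Ny, Nz);
    cudaMalloc3DArray(&arr, &desc, ext);

    cudaMemcpy3DParms cp = {0};
    cp.srcPtr = make_cudaPitchedPtr((void *)h_alpha,
                                    Nx * sizeof(float), Nx, Ny);
    cp.dstArray = arr;
    cp.extent = ext;
    cp.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&cp);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc tex = {};
    tex.filterMode = cudaFilterModeLinear;  /* hardware trilinear interpolation */
    tex.addressMode[0] = cudaAddressModeClamp;
    tex.addressMode[1] = cudaAddressModeClamp;
    tex.addressMode[2] = cudaAddressModeClamp;
    tex.readMode = cudaReadModeElementType;

    cudaTextureObject_t t_alpha;
    cudaCreateTextureObject(&t_alpha, &res, &tex, NULL);
    return t_alpha;
}
```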

Implementation of the backprojection operator H_int^† was very similar to that of H_int. The operators D^† and (H^e)^† were calculated on CPUs, while G^† was calculated by use of GPUs. The pseudocode is provided in Algorithms V and VI. We made use of the CUDA function "atomicAdd" to add weights to global memory from each thread.


Algorithm V. Implementation of α_int^† = G^†g (on host).

Input: g
Output: α_int^†
 1: T_g ← g {Bind data to texture memory}
 2: K_srtT <<<(N_v, N_t), (N_r, 1, 1)>>> (D_α_int^†)
 3: α_int^† ← D_α_int^† {Copy data from global memory to host}


Algorithm VI. Implementation of kernel K_srtT <<<(N_v, N_t), (N_r, 1, 1)>>>.

Input: D_α_int^†, T_g
Output: D_α_int^†
 1: t̄ = (blockIdx.y)c_0Δt + c_0t_min; θ^s = (threadIdx.x)Δθ^s + θ^s_min; ϕ^s = (blockIdx.x)Δϕ^s + ϕ^s_min
 2: θ_max = arccos(((R^s)² + t̄² − R²)/(2t̄R^s)); θ = θ_max
 3: while θ > 0 do
 4:   z = t̄ cos θ; r = t̄ sin θ; ϕ = 0
 5:   while ϕ < 2π do
 6:     x = r cos ϕ; y = r sin ϕ
 7:     x′ = −x sin θ^s − (z − R^s)cos θ^s; y′ = y; z′ = x cos θ^s − (z − R^s)sin θ^s {Convert to the global coordinate system}
 8:     x_n = (x′ − x_min)/Δs; y_n = (y′ − y_min)/Δs; z_n = (z′ − z_min)/Δs
 9:     n_x = floor(x_n); n_y = floor(y_n); n_z = floor(z_n)
 10:    D_α_int^†[n_z][n_y][n_x] += Δs²(n_x + 1 − x_n)(n_y + 1 − y_n)(n_z + 1 − z_n) T_g[threadIdx.x][blockIdx.x][blockIdx.y] {Add the weight to one of the eight neighboring nodes by use of "atomicAdd"; repeat this operation for the other seven neighboring nodes}
 11:    ϕ = ϕ + Δs/r
 12:   end while
 13:   θ = θ − Δs/t̄
 14: end while
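In CUDA C, the scatter in line 10 can be expressed with atomicAdd, which serializes concurrent updates to the same node. The following device-function sketch shows one of the eight updates; the bounds check and all names are our own assumptions:

```cuda
/* Sketch of the scatter step in K_srtT: add one of the eight trilinear
   weights into the grid node below (xn, yn, zn); the remaining seven
   updates follow the same pattern with complementary weights. */
__device__ void scatter_weight(float *d_alpha, float g, float ds,
                               float xn, float yn, float zn,
                               int Nx, int Ny, int Nz)
{
    int nx = (int)floorf(xn), ny = (int)floorf(yn), nz = (int)floorf(zn);
    if (nx < 0 || ny < 0 || nz < 0 ||
        nx + 1 >= Nx || ny + 1 >= Ny || nz + 1 >= Nz)
        return;                        /* bounds check is an assumption */
    float w = ds * ds * (nx + 1 - xn) * (ny + 1 - yn) * (nz + 1 - zn);
    /* atomicAdd serializes concurrent updates to the same node. */
    atomicAdd(&d_alpha[(nz * Ny + ny) * Nx + nx], w * g);
}
```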

Implementation of H_sph and H_sph^†

Implementation of the forward projection operation for the spherical-voxel-based imaging model is distinct from that of the interpolation-based model. The major difference is that calculation of each element of the data vector for the spherical-voxel-based imaging model requires the accumulation of the contributions from all voxels because the model is expressed in the temporal frequency domain. Because of this, the amount of computation required to calculate each data sample in the spherical-voxel-based imaging model is almost identical, simplifying the parallelization strategy.

We proposed a parallelization strategy that was inspired by one applied in advanced MRI reconstruction33 and is summarized as follows. Discrete samples of p˜0(f) defined in Eq. 19 were precalculated and stored as a vector p˜0 in constant memory. Because the size of the input vector αsph is often too large to fit in the constant memory, we divided αsph into subvectors that matched the capacity of the constant memory. We employed a CPU loop to copy every subvector sequentially to the constant memory and call the GPU kernel function to accumulate a partial summation. The major advantage of this design is that the total number of global memory visits to calculate one data sample is reduced to the number of subvectors.
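A host-side sketch of this chunking scheme might read as follows; the chunk size, the hypothetical kernel K_fwdsph_partial, and all names are our own assumptions:

```cuda
#include <cuda_runtime.h>

#define CHUNK 4096   /* 16 KB of floats; must fit in 64 KB constant memory */

__constant__ float c_alpha[CHUNK];

/* Hypothetical kernel that accumulates the contribution of one chunk
   of voxel coefficients into the partial data vector. */
__global__ void K_fwdsph_partial(int offset, int count, float2 *d_u);

void project_chunked(const float *h_alpha, int N, float2 *d_u,
                     dim3 grid, dim3 block)
{
    for (int off = 0; off < N; off += CHUNK) {
        int count = (N - off < CHUNK) ? (N - off) : CHUNK;
        /* Stream the next subvector into constant memory... */
        cudaMemcpyToSymbol(c_alpha, h_alpha + off, count * sizeof(float));
        /* ...and accumulate its partial summation on the GPU. Default-
           stream ordering makes each copy wait for the previous kernel. */
        K_fwdsph_partial<<<grid, block>>>(off, count, d_u);
    }
    cudaDeviceSynchronize();
}
```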

Implementation of the projection operator for the spherical-voxel-based imaging model generally involves more arithmetic operations than does the interpolation-based imaging model. Moreover, the spherical-voxel-based imaging model has been employed to compensate for the finite-aperture-size effect of transducers,14, 22 which makes the computation even more burdensome. Because of this, we further developed an implementation that employs multiple GPUs. The pseudocode for the projection operation is provided in Algorithms VII, VIII, and IX, and a sketch of the dispatch appears after this paragraph. We created N_pth pthreads on CPUs by use of the "pthread.h" library. Here, we denote the threads on CPUs by "pthread" to distinguish them from threads on GPUs. We divided the input vector α_sph into N_pth subvectors (denoted by α_pth's) of equal size and declared an output vector ũ_sph of dimension N_pthN_rN_vN_f. By calling the pthread function "fwd_pthread," N_pth pthreads simultaneously calculated the projection. Each pthread projected an α_pth to a partial voltage data vector ũ_pth that filled in the larger vector ũ_sph. Once all pthreads finished filling their ũ_pth into ũ_sph, the projection data ũ_sph were obtained by a summation of the N_pth ũ_pth's.
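A minimal sketch of the pthread dispatch is given below. Each host thread selects its own GPU before launching kernels; the one-GPU-per-pthread mapping, the struct layout, and the thread-count limit are our own assumptions:

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

/* Argument block passed to each pthread; the members mirror the
   parm_fwdarg fields in Algorithm VII. */
typedef struct {
    int          npth;   /* pthread index, reused as the device index */
    const float *alpha;  /* this pthread's slice of the image vector  */
    float       *u;      /* this pthread's partial data vector        */
} fwd_arg;

static void *fwd_pthread(void *p)
{
    fwd_arg *a = (fwd_arg *)p;
    cudaSetDevice(a->npth);   /* one GPU per pthread (assumed mapping) */
    /* ... copy a->alpha chunk-wise to constant memory and launch
       K_fwdsph as in Algorithm VIII ... */
    return NULL;
}

void launch_all(fwd_arg *args, int Npth)
{
    pthread_t tid[8];                        /* Npth <= 8 assumed */
    for (int i = 0; i < Npth; ++i)
        pthread_create(&tid[i], NULL, fwd_pthread, &args[i]);
    for (int i = 0; i < Npth; ++i)
        pthread_join(tid[i], NULL);          /* join before summing partials */
}
```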


Algorithm VII. Implementation of ũ_sph = H_sph α_sph (on host).

Input: α_sph, p̃_0
Output: ũ_sph
 1: for n_pth = 0 to N_pth − 1 do
 2:   parm_fwdarg[n_pth].n_pth = n_pth
 3:   parm_fwdarg[n_pth].p̃_0 = &p̃_0[0]
 4:   parm_fwdarg[n_pth].α_pth = &α_sph[n_pth N_xN_yN_z/N_pth]
 5:   parm_fwdarg[n_pth].ũ_pth = &ũ_sph[n_pth N_rN_vN_f] {Pass addresses of arrays to each pthread}
 6:   pthread_create(&pthreads[n_pth], NULL, fwd_pthread, (void *)(parm_fwdarg + n_pth)) {Call function fwd_pthread}
 7: end for
 8: for n_pth = 1 to N_pth − 1 do {After all pthreads have finished}
 9:   for n = 0 to N_rN_vN_f − 1 do
 10:    ũ_sph[n] += ũ_sph[n + n_pth N_rN_vN_f]
 11:  end for
 12: end for


Algorithm VIII. Implementation of function fwd_pthread (on host).

Input: n_pth, p̃_0, α_pth, ũ_pth
Output: ũ_pth
 1: C_p̃_0 ← p̃_0 {Copy from host to constant memory}
 2: for n_x = 0 to N_x/N_pth − 1 do
 3:   x = (n_x + n_pth N_x/N_pth)Δs + x_min
 4:   for n_y = 0 to N_y − 1 do
 5:     y = n_yΔs + y_min
 6:     C_α_pth ← α_pth[n_x][n_y][:] {Copy from host to constant memory}
 7:     K_fwdsph <<<(N_v, N_r), (N_f, 1, 1)>>> (x, y, D_ũ_pth)
 8:   end for
 9: end for
 10: ũ_pth ← D_ũ_pth {Copy from global memory to host}


Algorithm IX. Implementation of kernel K_fwdsph <<<(N_v, N_r), (N_f, 1, 1)>>>.

Input: x, y, D_ũ_pth, C_α_pth, C_p̃_0
Output: D_ũ_pth
 1: θ^s = (blockIdx.y)Δθ^s + θ^s_min; ϕ^s = (blockIdx.x)Δϕ^s + ϕ^s_min; f = (threadIdx.x)Δf + f_min
 2: z^s = R^s cos θ^s; x^s = R^s sin θ^s cos ϕ^s; y^s = R^s sin θ^s sin ϕ^s {Calculate the transducer location}
 3: Σ_r = 0; Σ_i = 0 {Initialize the partial summation, including the real and imaginary parts}
 4: for n_z = 0 to N_z − 1 do
 5:   z = n_zΔs + z_min
 6:   d = ((x − x^s)² + (y − y^s)² + (z − z^s)²)^{1/2}
 7:   h̃_r = cos(2πfd/c_0)/(2πd); h̃_i = −sin(2πfd/c_0)/(2πd) {Calculate the SIR; this example assumes point-like transducers}
 8:   Σ_r += C_α_pth[n_z](h̃_r C_p̃_0[threadIdx.x].r − h̃_i C_p̃_0[threadIdx.x].i)
 9:   Σ_i += C_α_pth[n_z](h̃_r C_p̃_0[threadIdx.x].i + h̃_i C_p̃_0[threadIdx.x].r)
 10: end for
 11: D_ũ_pth[blockIdx.y][blockIdx.x][threadIdx.x].r += Σ_r
 12: D_ũ_pth[blockIdx.y][blockIdx.x][threadIdx.x].i += Σ_i

Implementation of the backprojection operator was similar, except that the dividing and looping were over the vector ũ_sph instead of α_sph. The pseudocode for the backprojection operation is provided in Algorithms X, XI, and XII.


Algorithm X. Implementation of α_sph^† = H_sph^† ũ (on host).

Input: ũ, p̃_0
Output: α_sph^†
 1: for n_pth = 0 to N_pth − 1 do
 2:   parm_bwdarg[n_pth].n_pth = n_pth
 3:   parm_bwdarg[n_pth].p̃_0 = &p̃_0[0]
 4:   parm_bwdarg[n_pth].ũ_pth = &ũ[n_pth N_rN_vN_f/N_pth]
 5:   parm_bwdarg[n_pth].α_pth = &α_sph^†[n_pth N_xN_yN_z] {Pass addresses of arrays to each pthread}
 6:   pthread_create(&pthreads[n_pth], NULL, bwd_pthread, (void *)(parm_bwdarg + n_pth)) {Call function bwd_pthread}
 7: end for
 8: for n_pth = 1 to N_pth − 1 do {After all pthreads have finished}
 9:   for n = 0 to N_xN_yN_z − 1 do
 10:    α_sph^†[n] += α_sph^†[n + n_pth N_xN_yN_z]
 11:  end for
 12: end for


Algorithm XI. Implementation of function bwd_pthread (on host).

Input: n_pth, p̃_0, ũ_pth, α_pth
Output: α_pth
 1: C_p̃_0 ← p̃_0 {Copy from host to constant memory}
 2: for n_r = 0 to N_r/N_pth − 1 do
 3:   θ^s = (n_r + n_pth N_r/N_pth)Δθ^s + θ^s_min; z^s = R^s cos θ^s; r^s = R^s sin θ^s
 4:   for n_v = 0 to N_v − 1 do
 5:     ϕ^s = n_vΔϕ^s + ϕ^s_min; x^s = r^s cos ϕ^s; y^s = r^s sin ϕ^s
 6:     C_ũ_pth ← ũ_pth[n_r][n_v][:] {Copy from host to constant memory}
 7:     K_bwdsph <<<(N_y, N_x), (N_z, 1, 1)>>> (x^s, y^s, z^s, D_α_pth)
 8:   end for
 9: end for
 10: α_pth ← D_α_pth {Copy from global memory to host}


Algorithm XII. Implementation of kernel K_bwdsph <<<(N_y, N_x), (N_z, 1, 1)>>>.

Input: x^s, y^s, z^s, D_α_pth, C_ũ_pth, C_p̃_0
Output: D_α_pth
 1: x = (blockIdx.y)Δs + x_min; y = (blockIdx.x)Δs + y_min; z = (threadIdx.x)Δs + z_min
 2: d = ((x − x^s)² + (y − y^s)² + (z − z^s)²)^{1/2}; Σ = 0 {Initialize the partial summation}
 3: for n_f = 0 to N_f − 1 do
 4:   f = n_fΔf + f_min
 5:   h̃_r = cos(2πfd/c_0)/(2πd); h̃_i = −sin(2πfd/c_0)/(2πd) {Calculate the SIR; this example assumes point-like transducers}
 6:   Σ += C_ũ_pth[n_f].r(h̃_r C_p̃_0[n_f].r − h̃_i C_p̃_0[n_f].i) + C_ũ_pth[n_f].i(h̃_i C_p̃_0[n_f].r + h̃_r C_p̃_0[n_f].i)
 7: end for
 8: D_α_pth[blockIdx.y][blockIdx.x][threadIdx.x] += Σ

DESCRIPTIONS OF COMPUTER SIMULATION AND EXPERIMENTAL STUDIES

The computational efficiency and accuracy of the proposed GPU-based implementations of the FBP algorithm and projection/backprojection operators for use with iterative image reconstruction algorithms were quantified in computer simulation and experimental OAT imaging studies.

Computer-simulation studies

Numerical phantom

The numerical phantom consisted of nine uniform spheres that were blurred by a 3D Gaussian kernel possessing a full width at half maximum (FWHM) of 0.77 mm. The phantom was contained within a cuboid of size 29.4 × 29.4 × 61.6 mm³. A 2D image corresponding to the plane y = 0 through the phantom is shown in Fig. 2a.

Figure 2. Slices corresponding to the plane y = 0 of (a) the phantom and the images reconstructed by use of (b) the CPU-based and (c) the GPU-based implementations of the FBP algorithm from the "128 × 90"-data.

Simulated projection data

The measurement surface was a sphere of radius R^s = 65 mm, corresponding to an existing OAT imaging system.22, 48 As described in Sec. 3, ideal point-like transducers were uniformly distributed over 128 rings and 90 tomographic views. The 128 rings covered the full π polar angle, i.e., θ^s_min = π/256, while the 90 views covered the full 2π azimuth angle. The speed of sound was set at c_0 = 1.54 mm/μs. We selected the Grüneisen coefficient as Γ = βc_0²/C_p = 2000 arbitrary units (a.u.). For each transducer, we analytically calculated 1022 temporal samples of the pressure function at a sampling rate of f_sam = 20 MHz by use of Eq. 1. Because we employed a smooth object function, the pressure data were calculated in two steps: First, we calculated temporal samples of the pressure function p_us(r^s, t) corresponding to the nine uniform spheres by1, 36

$$p_{\rm us}(\mathbf{r}^{s},t)\big|_{t=k\Delta t}=\begin{cases}\displaystyle\sum_{i=0}^{8}A_i\left(-\frac{\beta c_0^{3}}{2C_p|\mathbf{r}^{s}-\mathbf{r}_i|}\,t+\frac{\beta c_0^{2}}{2C_p}\right)\bigg|_{t=k\Delta t}, & \text{if } \big|c_0k\Delta t-|\mathbf{r}^{s}-\mathbf{r}_i|\big|\le R_i,\\[8pt] 0, & \text{otherwise},\end{cases}\qquad(28)$$

where r_i, R_i, and A_i denote the center location, the radius, and the absorbed energy density of the ith sphere, respectively. Subsequently, we convolved p_us(r^s, t) with a 1D Gaussian kernel with FWHM = 0.5 μs (Ref. 49) to produce the pressure data. From the simulated pressure data, we calculated the temporal-frequency spectrum by use of the FFT, from which we created an alternative data vector that contained 511 frequency components occupying (0, 5] MHz for each transducer. The simulated projection data in either the time domain or the temporal-frequency domain will hereafter be referred to as the "128 × 90"-data. By undersampling the "128 × 90"-data uniformly over rings and tomographic views, we created three subsets that contained varying numbers of transducers. These data sets will be referred to as the "64 × 90"-data, "64 × 45"-data, and "32 × 45"-data, where the two numbers specify the number of rings and the number of tomographic views, respectively.
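A compact sketch of this first step for a single sphere, under our own naming, follows directly from Eq. 28:

```c
#include <math.h>

/* Sketch of Eq. (28): pressure trace at transducer position rs produced
   by one uniform sphere (center rc, radius R, absorbed energy A).
   Variable names are ours; the full phantom sums nine such traces. */
void nwave_trace(float *p, int K, float dt,
                 const float rs[3], const float rc[3],
                 float R, float A, float beta, float c0, float Cp)
{
    float d = sqrtf((rs[0] - rc[0]) * (rs[0] - rc[0])
                  + (rs[1] - rc[1]) * (rs[1] - rc[1])
                  + (rs[2] - rc[2]) * (rs[2] - rc[2]));
    for (int k = 0; k < K; ++k) {
        float t = k * dt;
        /* Nonzero only while the wavefront c0*t sweeps the sphere. */
        if (fabsf(c0 * t - d) <= R)
            p[k] = A * beta * c0 * c0 / (2.0f * Cp) * (1.0f - c0 * t / d);
        else
            p[k] = 0.0f;
    }
}
```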

Reconstruction algorithms

The GPU accelerated FBP algorithm was employed to reconstruct the object function sampled on a 3D Cartesian grid with spacing Δs = 0.14 mm. The dimension of the reconstructed images α^ fbp was 210 × 210 × 440.

We employed an iterative image reconstruction algorithm that sought to minimize a penalized least-squares (PLS) objective.45, 50 Two versions of the reconstruction algorithm were developed that utilized the interpolation-based imaging model and the spherical-voxel-based imaging model, respectively. The two versions sought to solve the optimization problems by use of the linear conjugate gradient (CG) method51, 52

$$\hat{\boldsymbol{\alpha}}_{\rm int}=\arg\min_{\boldsymbol{\alpha}_{\rm int}}\;\|\mathbf{u}-\mathbf{H}_{\rm int}\boldsymbol{\alpha}_{\rm int}\|^{2}+\mu R(\boldsymbol{\alpha}_{\rm int})\qquad(29)$$

and

$$\hat{\boldsymbol{\alpha}}_{\rm sph}=\arg\min_{\boldsymbol{\alpha}_{\rm sph}}\;\|\tilde{\mathbf{u}}-\mathbf{H}_{\rm sph}\boldsymbol{\alpha}_{\rm sph}\|^{2}+\mu R(\boldsymbol{\alpha}_{\rm sph}),\qquad(30)$$

respectively, where R(α) is a regularizing penalty term whose impact is controlled by the regularization parameter μ. The penalty term was employed only when processing the experimental data, as described in Sec. 4B. The reconstruction algorithms required computation of one projection and one backprojection operation at each iteration. Hereafter, the two reconstruction algorithms will be referred to as the PLS-Int and PLS-Sph algorithms, respectively. We set Δs = 0.14 mm; therefore, the dimensions of both α̂_int and α̂_sph were 210 × 210 × 440.
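The CG iteration itself is standard; a sketch of the loop for the normal equations (H†H + μL†L)α = H†u, where normal_op is a hypothetical callback that applies one projection and one backprojection per call, might read:

```c
#include <stdlib.h>

/* Linear CG sketch for the PLS problems: solve A a = b with
   A = H'H + mu L'L applied on the fly by normal_op and b = H'u.
   The callback signature is an assumption for this sketch. */
void cg_solve(void (*normal_op)(const double *, double *),
              const double *b, double *a, int n, int n_iter)
{
    double *r = (double *)malloc(n * sizeof(double));
    double *p = (double *)malloc(n * sizeof(double));
    double *q = (double *)malloc(n * sizeof(double));

    for (int i = 0; i < n; ++i) { a[i] = 0.0; r[i] = b[i]; p[i] = b[i]; }
    double rho = 0.0;
    for (int i = 0; i < n; ++i) rho += r[i] * r[i];

    for (int it = 0; it < n_iter; ++it) {
        normal_op(p, q);                       /* q = (H'H + mu L'L) p */
        double pq = 0.0;
        for (int i = 0; i < n; ++i) pq += p[i] * q[i];
        double alpha = rho / pq;               /* exact line search     */
        for (int i = 0; i < n; ++i) { a[i] += alpha * p[i];
                                      r[i] -= alpha * q[i]; }
        double rho_new = 0.0;
        for (int i = 0; i < n; ++i) rho_new += r[i] * r[i];
        for (int i = 0; i < n; ++i) p[i] = r[i] + (rho_new / rho) * p[i];
        rho = rho_new;
    }
    free(r); free(p); free(q);
}
```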

Performance assessment

We compared the computational times of 3D image reconstruction for the GPU- and CPU-based implementations. The CPU-based implementations of the PLS-Int and PLS-Sph algorithms would take several days to complete a single iteration, even for the "32 × 45"-data. Therefore, we only recorded the computational time for the CPU-based implementations to complete a single iteration when the data vector contained a single transducer. We assumed that the computational times were linearly proportional to the number of transducers in the data sets because the CPU-based implementations are sequential.

The GPU-based implementations employed the single-precision floating-point format rather than the conventional double-precision format utilized by the CPU-based implementations. In order to quantify how the single-precision floating-point format would degrade the image accuracy, we calculated the root-mean-square error (RMSE) between the reconstructed image and the phantom, defined as

$${\rm RMSE}=\sqrt{\frac{1}{N}\,(\hat{\boldsymbol{\alpha}}-\boldsymbol{\alpha})^{T}(\hat{\boldsymbol{\alpha}}-\boldsymbol{\alpha})},\qquad(31)$$

where α and α̂ are the samples of the phantom and the coefficients of the reconstructed image, respectively.

Hardware specifications

All implementations were tested on a platform consisting of dual quad-core Intel Xeon CPUs with a clock speed of 2.40 GHz. The GPU-based implementations of the FBP and PLS-Int algorithms were tested on a single Tesla C2050 GPU, while the PLS-Sph algorithm was tested on 8 Tesla C1060 GPUs.

Experimental studies

The FBP, PLS-Int, and PLS-Sph algorithms were investigated by use of an existing data set corresponding to a live mouse.22, 48 The scanning geometry and dimensions were the same as those employed in the computer-simulation studies, except that only 64 rings were uniformly distributed over the polar angle ranging from 14° to 83°. The transducers were of size 2 × 2 mm². The raw data were acquired at 180 tomographic views and are referred to as the "full data." We undersampled the "full data" uniformly over the tomographic views, constructing a subset containing 45 tomographic views, which will be referred to as the "quarter data."

Unlike in the idealized computer-simulation studies, the transducer response has to be compensated for when processing the experimental data. When implementing the FBP algorithm, the EIR was compensated for by a direct Fourier deconvolution, expressed in temporal frequency domain as3

$$\tilde{p}(\mathbf{r}^{s},f)=\frac{\tilde{u}(\mathbf{r}^{s},f)}{\tilde{h}^{e}(f)}\,\tilde{W}(f),\qquad(32)$$

where W˜(f) is a window function for noise suppression. In this study, we adopted the Hann window function defined as

$$\tilde{W}(f)=\frac{1}{2}\left[1-\cos\!\Big(\pi\,\frac{f_c-f}{f_c}\Big)\right],\qquad(33)$$

where the cutoff frequency was chosen as f_c = 5 MHz. When applying iterative image reconstruction algorithms, the transducer effects were implicitly compensated for during iteration by employing imaging models that incorporate the transducer characteristics.14, 22 We incorporated the EIR into the interpolation-based imaging model, while incorporating both the EIR and the SIR into the spherical-voxel-based imaging model.
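A sketch of the deconvolution of Eqs. 32 and 33 is given below; the interleaved (re, im) spectrum layout and the small regularizing constant are our own assumptions:

```c
#include <math.h>

#define PI_F 3.14159265358979f

/* Divide the data spectrum by the EIR spectrum [Eq. (32)] and apply
   the Hann window [Eq. (33)] up to fc. Spectra are stored as
   interleaved (re, im) pairs; eps guards the division near EIR zeros. */
void deconvolve_eir(float *u, const float *he, int Nf, float df, float fc)
{
    const float eps = 1e-8f;   /* regularization is an assumption */
    for (int l = 0; l < Nf; ++l) {
        float f = l * df;
        float w = (f <= fc)
                ? 0.5f * (1.0f - cosf(PI_F * (fc - f) / fc))
                : 0.0f;
        float hr = he[2 * l], hi = he[2 * l + 1];
        float m2 = hr * hr + hi * hi + eps;
        float ur = u[2 * l], ui = u[2 * l + 1];
        /* (u / he) * W via complex division u * conj(he) / |he|^2 */
        u[2 * l]     = w * (ur * hr + ui * hi) / m2;
        u[2 * l + 1] = w * (ui * hr - ur * hi) / m2;
    }
}
```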

For both PLS-Int and PLS-Sph algorithms, we employed a quadratic smoothness penalty to mitigate measurement noise50

$$R(\boldsymbol{\alpha})=\sum_{n=0}^{N-1}\big([\boldsymbol{\alpha}]_{n}-[\boldsymbol{\alpha}]_{n_x}\big)^{2}+\big([\boldsymbol{\alpha}]_{n}-[\boldsymbol{\alpha}]_{n_y}\big)^{2}+\big([\boldsymbol{\alpha}]_{n}-[\boldsymbol{\alpha}]_{n_z}\big)^{2},\qquad(34)$$

where n_x, n_y, and n_z are the indices of the voxels immediately preceding the nth voxel along the three Cartesian axes, respectively.
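For use with a gradient-based method such as CG, both R(α) and its gradient are needed. A sketch under an assumed x-fastest memory layout, with boundary voxels simply skipping the missing neighbor, follows:

```c
/* Sketch of the quadratic smoothness penalty of Eq. (34) and its
   gradient. The x-fastest layout and the boundary handling are
   assumptions. Returns R(a) and fills grad with dR/da. */
double penalty_and_grad(const double *a, double *grad,
                        int Nx, int Ny, int Nz)
{
    int N = Nx * Ny * Nz;
    for (int i = 0; i < N; ++i) grad[i] = 0.0;

    double R = 0.0;
    for (int nz = 0; nz < Nz; ++nz)
      for (int ny = 0; ny < Ny; ++ny)
        for (int nx = 0; nx < Nx; ++nx) {
            int n = (nz * Ny + ny) * Nx + nx;
            int nb[3] = { nx > 0 ? n - 1       : -1,   /* x neighbor */
                          ny > 0 ? n - Nx      : -1,   /* y neighbor */
                          nz > 0 ? n - Nx * Ny : -1 }; /* z neighbor */
            for (int d = 0; d < 3; ++d) {
                if (nb[d] < 0) continue;               /* skip at boundary */
                double diff = a[n] - a[nb[d]];
                R += diff * diff;
                grad[n]     += 2.0 * diff;
                grad[nb[d]] -= 2.0 * diff;
            }
        }
    return R;
}
```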

RESULTS

Computational efficiency

As shown in Table 1, the GPU-based implementations took less than 0.1%, 0.8%, and 0.4% of the computational times required by the corresponding CPU-based implementations for the FBP, PLS-Int, and PLS-Sph algorithms, respectively. The computational times for the GPU-based implementations are nearly linearly proportional to the amount of data. Note that the "64 × 90"-data and the "quarter data" are of the same size; however, the computational times for the "quarter data" are more than 1.8 times those for the "64 × 90"-data. This is because the calculation of the SIR increases the computational complexity of the reconstruction algorithm.

Table 1.

Computational times of the 3D image reconstructions by use of the CPU- and GPU-based implementations.

Data set         FBP CPU [s]   FBP GPU [s]   PLS-Int CPU [min/iter]   PLS-Int GPU [min/iter]   PLS-Sph CPU [min/iter]   PLS-Sph GPU [min/iter]
"32 × 45"        6189          6             2448                     20                       7961                     22
"64 × 45"        12 975        12            4896                     35                       15 923                   43
"64 × 90"        26 190        23            9792                     68                       31 845                   86
"128 × 90"       53 441        46            —                        —                        —                        —
"Quarter data"   12 975        12            4896                     35                       19 776                   78
"Full data"      53 441        46            19 968                   137                      79 177                   313

Computational accuracy

Images reconstructed by use of the CPU- and GPU-based implementations of the FBP algorithm are almost identical. From the "128 × 90"-data, in which case transducers were densely distributed over the measurement surface, both implementations reconstructed accurate images, as shown in Figs. 2b and 2c. The profiles along the three arrows in Fig. 2 are plotted in Fig. 5a, suggesting a nearly exact reconstruction. As expected, when the amount of measurement data is reduced, the reconstructed images contain more artifacts, as shown in Fig. 3. However, the images reconstructed by use of the GPU- and CPU-based implementations remain indistinguishable. The plots of the RMSE versus the amount of measurement data in Fig. 6 overlap, also suggesting that the single-precision floating-point format employed by the GPU-based implementations has little impact on computational accuracy.

Figure 5. Profiles along the line (x, y) = (−6.58, 0) mm of the images reconstructed by use of (a) the CPU- and GPU-based implementations of the FBP algorithm from the "128 × 90"-data, and (b) the GPU-based implementations of the PLS-Int and PLS-Sph algorithms from the "64 × 90"-data.

Figure 3. Slices corresponding to the plane y = 0 of the images reconstructed by use of the FBP algorithm with (a) the CPU-based implementation from the "64 × 90"-data, (b) the CPU-based implementation from the "64 × 45"-data, (c) the CPU-based implementation from the "32 × 45"-data, (d) the GPU-based implementation from the "64 × 90"-data, (e) the GPU-based implementation from the "64 × 45"-data, and (f) the GPU-based implementation from the "32 × 45"-data.

Figure 6. Plots of the RMSE against the amount of data by use of the FBP, PLS-Int, and PLS-Sph algorithms.

The GPU-based implementations of the PLS-Int and PLS-Sph algorithms both reconstructed accurate images, as displayed in Fig. 4. As expected, the images reconstructed by use of both iterative algorithms contain fewer artifacts than those reconstructed by use of the FBP algorithm from the same amount of data. Unlike the images reconstructed by use of the FBP algorithm from the "64 × 90"-data [Figs. 3a and 3d], the images reconstructed by use of both iterative algorithms [Figs. 4a and 4d] appear to be identical to the numerical phantom. The profiles along the two arrows in Figs. 4a and 4d are plotted in Fig. 5b, further confirming the computational accuracy of the iterative image reconstruction algorithms. The plots of the RMSE versus the amount of measurement data in Fig. 6 suggest that the iterative image reconstruction algorithms in general outperform the FBP algorithm for the same amount of data.

Figure 4. Slices corresponding to the plane y = 0 of the images reconstructed by use of the GPU-based implementations of (a) the PLS-Int algorithm from the "64 × 90"-data, (b) the PLS-Int algorithm from the "64 × 45"-data, (c) the PLS-Int algorithm from the "32 × 45"-data, (d) the PLS-Sph algorithm from the "64 × 90"-data, (e) the PLS-Sph algorithm from the "64 × 45"-data, and (f) the PLS-Sph algorithm from the "32 × 45"-data.

Experimental results

The maximum intensity projection (MIP) renderings of the 3D mouse images reconstructed by use of the GPU-based implementations reveal the mouse body vasculature, as shown in Fig. 7. Images reconstructed by use of both the PLS-Int and PLS-Sph algorithms appear to have a cleaner background than the images reconstructed by use of the FBP algorithm from the same amount of data. All images reconstructed by the iterative algorithms were obtained after 20 iterations, starting with a uniform zero image as the initial guess. The PLS-Int algorithm took approximately half a day and 2 days to process the "quarter data" and the "full data," respectively. The PLS-Sph algorithm took approximately 1 day and 4 days to process the "quarter data" and the "full data," respectively. Alternatively, if the CPU-based implementations were utilized, the PLS-Int algorithm would take an estimated 68 days and 277 days to process the "quarter data" and the "full data," respectively. The PLS-Sph algorithm would take an estimated 275 days and 1100 days to process the "quarter data" and the "full data," respectively.

Figure 7. MIP renderings of the 3D images of the mouse body reconstructed by use of the GPU-based implementations of (a) the FBP algorithm from the "full data," (b) the PLS-Int algorithm from the "full data" with μ = 1.0 × 10⁴, (c) the PLS-Sph algorithm from the "full data" with μ = 1.0 × 10⁴, (d) the FBP algorithm from the "quarter data," (e) the PLS-Int algorithm from the "quarter data" with μ = 1.0 × 10³, and (f) the PLS-Sph algorithm from the "quarter data" with μ = 1.0 × 10³. The grayscale window is [0, 12.0].

DISCUSSION AND CONCLUSION

In this study, we developed and investigated GPU-based implementations of the FBP algorithm and two pairs of projection/backprojection operators for 3D OAT. Our implementation of the FBP algorithm improved the computational efficiency by a factor of more than 1000 compared with the CPU-based implementation. This work complements our earlier studies that demonstrated the feasibility of 3D iterative image reconstruction in practice.21, 22

Our current implementations of the iterative image reconstruction algorithms still require several days to process the densely sampled data set, which, however, can be further improved. First, the amount of measurement data required for accurate image reconstruction can be further reduced by developing advanced image reconstruction methods.13, 15, 22, 53 Second, the number of iterations required can be reduced by developing fast-converging optimization algorithms.22, 54, 55

The proposed GPU-based parallelization strategies are of general interest. The implementation of the FBP algorithm6 can be adapted to other analytic image reconstruction algorithms, including those described in Refs. 5, 7, 56, 57, 58. We demonstrated the feasibility of a PLS algorithm that utilized the proposed GPU-based implementations of the projection/backprojection operators. By use of these implementations, many advanced image reconstruction algorithms may also become feasible in practice.22 Though we described our parallelization strategies for projection/backprojection operators based on two particular discrete-to-discrete imaging models, these strategies can also be applied to other D-D imaging models.9, 11, 20, 59 Therefore, the proposed algorithms will facilitate the further investigation and application of advanced image reconstruction algorithms in 3D OAT.

ACKNOWLEDGMENT

This research was supported in part by the National Institutes of Health (NIH) Award Nos. EB010049 and CA167446.

APPENDIX: DISCRETIZATION OF SRT

Derivation of Eq. 12

The integrated data function g(rs, t) in Eq. 2, evaluated at the qth transducer and the kth time instance, can be expressed as

$$g(\mathbf{r}_q^{s},t)\big|_{t=k\Delta t}=\int_{|\mathbf{r}_q^{s}-\mathbf{r}|=kc_0\Delta t}d\mathbf{r}\,A(\mathbf{r}),\qquad({\rm A1})$$

where rqs denotes the location of the qth point-like transducer. We defined a local coordinate system, distinguished by a superscript “tr,” centered at the qth transducer with the ztr-axis pointing to the origin of the global coordinate system as shown in Fig. 1b. Assuming the object function A(r) is compactly supported in a sphere of radius R, the integral surface is symmetric about the ztr-axis. Thus, the orientations of the xtr- and ytr-axes can be arbitrary within the ztr = 0 plane. Representing the right-hand side of Eq. A1 in the local spherical coordinate system, one obtains

$$g(\mathbf{r}_q^{s},t)\big|_{t=k\Delta t}=(kc_0\Delta t)^{2}\int_{0}^{\theta_{\max}^{\rm tr}}d\theta^{\rm tr}\sin\theta^{\rm tr}\int_{0}^{2\pi}d\phi^{\rm tr}\,A(kc_0\Delta t,\theta^{\rm tr},\phi^{\rm tr}),\qquad({\rm A2})$$

where θ_max^tr is half of the apex angle of the cone that corresponds to the intersecting spherical cap, as shown in Fig. 1b. The polar angle θ^tr and the azimuth angle ϕ^tr were discretized with intervals Δθ^tr and Δϕ^tr that satisfied

$$kc_0\Delta t\,\Delta\theta^{\rm tr}=kc_0\Delta t\,\sin\theta^{\rm tr}\,\Delta\phi^{\rm tr}=\Delta_s.\qquad({\rm A3})$$

Therefore, Eq. A2 can be approximated by

$$g(\mathbf{r}_q^{s},t)\big|_{t=k\Delta t}\approx\Delta_s^{2}\sum_{i=0}^{N_i-1}\sum_{j=0}^{N_j-1}A\big(kc_0\Delta t,\theta_i^{\rm tr},\phi_j^{\rm tr}\big),\qquad({\rm A4})$$

where N_i = θ_max^tr/Δθ^tr, N_j = 2π/Δϕ^tr, θ_i^tr = iΔθ^tr, and ϕ_j^tr = jΔϕ^tr. We denoted by r_{k,i,j} the location in the global coordinate system corresponding to the location vector (kc_0Δt, θ_i^tr, ϕ_j^tr) in the local coordinate system in Eq. A4. On substitution from the finite-dimensional representation Eq. 6 into Eq. A4, with α and ψ_n(r) defined by Eqs. 9 and 10, respectively, we obtained

$$g(\mathbf{r}_q^{s},t)\big|_{t=k\Delta t}\approx\Delta_s^{2}\sum_{n=0}^{N-1}[\boldsymbol{\alpha}_{\rm int}]_{n}\sum_{i=0}^{N_i-1}\sum_{j=0}^{N_j-1}\psi_{n}^{\rm int}(\mathbf{r}_{k,i,j})\equiv[\mathbf{g}]_{qK+k}.\qquad({\rm A5})$$

References

  1. Oraevsky A. A. and Karabutov A. A., “Optoacoustic tomography,” in Biomedical Photonics Handbook, edited by Vo-Dinh T. (CRC, Boca Raton, FL, 2003), Chap. 34. [Google Scholar]
  2. Wang L. V., “Tutorial on photoacoustic microscopy and computed tomography,” IEEE J. Sel. Top. Quantum Electron. 14, 171–179 (2008). 10.1109/JSTQE.2007.913398 [DOI] [Google Scholar]
  3. Kruger R., Reinecke D., and Kruger G., “Thermoacoustic computed tomography: Technical considerations,” Med. Phys. 26, 1832–1837 (1999). 10.1118/1.598688 [DOI] [PubMed] [Google Scholar]
  4. Cox B. T., Arridge S. R., Köstli K. P., and Beard P. C., “Two-dimensional quantitative photoacoustic image reconstruction of absorption distributions in scattering media by use of a simple iterative method,” Appl. Opt. 45, 1866–1875 (2006). 10.1364/AO.45.001866 [DOI] [PubMed] [Google Scholar]
  5. Kunyansky L. A., “Explicit inversion formulae for the spherical mean Radon transform,” Inverse Probl. 23, 373–383 (2007). 10.1088/0266-5611/23/1/021 [DOI] [Google Scholar]
  6. Finch D., Patch S., and Rakesh, “Determining a function from its mean values over a family of spheres,” SIAM J. Math. Anal. 35, 1213–1240 (2004). 10.1137/S0036141002417814 [DOI] [Google Scholar]
  7. Xu M. and Wang L. V., “Universal back-projection algorithm for photoacoustic computed tomography,” Phys. Rev. E 71, 016706 (2005). 10.1103/PhysRevE.71.016706 [DOI] [PubMed] [Google Scholar]
8. Xu Y., Feng D., and Wang L. V., “Exact frequency-domain reconstruction for thermoacoustic tomography: I. Planar geometry,” IEEE Trans. Med. Imaging 21, 823–828 (2002). 10.1109/TMI.2002.801172
9. Paltauf G., Viator J. A., Prahl S. A., and Jacques S. L., “Iterative reconstruction algorithm for optoacoustic imaging,” J. Acoust. Soc. Am. 112, 1536–1544 (2002). 10.1121/1.1501898
10. Yuan Z. and Jiang H., “Three-dimensional finite-element-based photoacoustic tomography: Reconstruction algorithm and simulations,” Med. Phys. 34, 538–546 (2007). 10.1118/1.2409234
11. Ephrat P., Keenliside L., Seabrook A., Prato F. S., and Carson J. J. L., “Three-dimensional photoacoustic imaging by sparse-array detection and iterative image reconstruction,” J. Biomed. Opt. 13, 054052 (2008). 10.1117/1.2992131
12. Zhang J., Anastasio M., La Riviere P., and Wang L., “Effects of different imaging models on least-squares image reconstruction accuracy in photoacoustic tomography,” IEEE Trans. Med. Imaging 28, 1781–1790 (2009). 10.1109/TMI.2009.2024082
13. Provost J. and Lesage F., “The application of compressed sensing for photo-acoustic tomography,” IEEE Trans. Med. Imaging 28, 585–594 (2009). 10.1109/TMI.2008.2007825
14. Wang K., Ermilov S. A., Su R., Brecht H.-P., Oraevsky A. A., and Anastasio M. A., “An imaging model incorporating ultrasonic transducer properties for three-dimensional optoacoustic tomography,” IEEE Trans. Med. Imaging 30, 203–214 (2011). 10.1109/TMI.2010.2072514
15. Guo Z., Li C., Song L., and Wang L. V., “Compressed sensing in photoacoustic tomography in vivo,” J. Biomed. Opt. 15, 021311 (2010). 10.1117/1.3381187
16. Huang C., Oraevsky A. A., and Anastasio M. A., “Investigation of limited-view image reconstruction in optoacoustic tomography employing a priori structural information,” Proc. SPIE 7800, 780004 (2010). 10.1117/12.861005
17. Xu Z., Li C., and Wang L. V., “Photoacoustic tomography of water in phantoms and tissue,” J. Biomed. Opt. 15, 036019 (2010). 10.1117/1.3443793
18. Xu Z., Zhu Q., and Wang L. V., “In vivo photoacoustic tomography of mouse cerebral edema induced by cold injury,” J. Biomed. Opt. 16, 066020 (2011). 10.1117/1.3584847
19. Buehler A., Rosenthal A., Jetzfellner T., Dima A., Razansky D., and Ntziachristos V., “Model-based optoacoustic inversions with incomplete projection data,” Med. Phys. 38, 1694–1704 (2011). 10.1118/1.3556916
20. Bu S., Liu Z., Shiina T., Kondo K., Yamakawa M., Fukutani K., Someda Y., and Asao Y., “Model-based reconstruction integrated with fluence compensation for photoacoustic tomography,” IEEE Trans. Biomed. Eng. 59, 1354–1363 (2012). 10.1109/TBME.2012.2187649
21. Wang K., Su R., Oraevsky A. A., and Anastasio M. A., “Investigation of iterative image reconstruction in optoacoustic tomography,” Proc. SPIE 8223, 82231Y (2012). 10.1117/12.909610
22. Wang K., Su R., Oraevsky A. A., and Anastasio M. A., “Investigation of iterative image reconstruction in three-dimensional optoacoustic tomography,” Phys. Med. Biol. 57, 5399 (2012). 10.1088/0031-9155/57/17/5399
23. Huang C., Nie L., Schoonover R. W., Guo Z., Schirra C. O., Anastasio M. A., and Wang L. V., “Aberration correction for transcranial photoacoustic tomography of primates employing adjunct image data,” J. Biomed. Opt. 17, 066016 (2012). 10.1117/1.JBO.17.6.066016
24. Deán-Ben X. L., Buehler A., Ntziachristos V., and Razansky D., “Accurate model-based reconstruction algorithm for three-dimensional optoacoustic tomography,” IEEE Trans. Med. Imaging 31, 1922–1928 (2012). 10.1109/TMI.2012.2208471
25. Huang C., Nie L., Schoonover R. W., Wang L. V., and Anastasio M. A., “Photoacoustic computed tomography correcting for heterogeneity and attenuation,” J. Biomed. Opt. 17, 061211 (2012). 10.1117/1.JBO.17.6.061211
26. Wang B., Xiang L., Jiang M. S., Yang J., Zhang Q., Carney P. R., and Jiang H., “Photoacoustic tomography system for noninvasive real-time three-dimensional imaging of epilepsy,” Biomed. Opt. Express 3, 1427–1432 (2012). 10.1364/BOE.3.001427
27. Buehler A., Deán-Ben X. L., Claussen J., Ntziachristos V., and Razansky D., “Three-dimensional optoacoustic tomography at video rate,” Opt. Express 20, 22712–22719 (2012). 10.1364/OE.20.022712
28. Lindholm E., Nickolls J., Oberman S., and Montrym J., “NVIDIA Tesla: A unified graphics and computing architecture,” IEEE Micro 28, 39–55 (2008). 10.1109/MM.2008.31
29. NVIDIA CUDA Programming Guide, Version 2.0 (NVIDIA, 2008).
30. Zhao X., Hu J.-J., and Zhang P., “GPU-based 3D cone-beam CT image reconstruction for large data volume,” J. Biomed. Imaging 2009(8), 1–8 (2009). 10.1155/2009/149079
31. Okitsu Y., Ino F., and Hagihara K., “High-performance cone beam reconstruction using CUDA compatible GPUs,” Parallel Comput. 36, 129–141 (2010). 10.1016/j.parco.2010.01.004
32. Chou C.-Y., Chuo Y.-Y., Hung Y., and Wang W., “A fast forward projection using multithreads for multirays on GPUs in medical image reconstruction,” Med. Phys. 38, 4052–4065 (2011). 10.1118/1.3591994
33. Stone S., Haldar J., Tsao S., Hwu W.-M. W., Sutton B., and Liang Z.-P., “Accelerating advanced MRI reconstructions on GPUs,” J. Parallel Distrib. Comput. 68, 1307–1318 (2008). 10.1016/j.jpdc.2008.05.013
34. Treeby B. E. and Cox B. T., “k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields,” J. Biomed. Opt. 15, 021314 (2010). 10.1117/1.3360308
35. Barrett H. and Myers K., Foundations of Image Science, Wiley Series in Pure and Applied Optics (John Wiley & Sons, Hoboken, NJ, 2004).
36. Wang L. V. and Wu H.-I., Biomedical Optics: Principles and Imaging (Wiley, Hoboken, NJ, 2007).
37. Xu M. and Wang L. V., “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. 77, 041101 (2006). 10.1063/1.2195024
38. Wang K. and Anastasio M. A., “Photoacoustic and thermoacoustic tomography: Image formation principles,” in Handbook of Mathematical Methods in Imaging, edited by Scherzer O. (Springer, New York, NY, 2011), Chap. 18.
39. Rosenthal A., Ntziachristos V., and Razansky D., “Optoacoustic methods for frequency calibration of ultrasonic sensors,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control 58, 316–326 (2011). 10.1109/TUFFC.2011.1809
40. Kak A. C. and Slaney M., Principles of Computerized Tomographic Imaging (IEEE, New York, NY, 1988).
41. Anastasio M. A., Zhang J., Pan X., Zou Y., Ku G., and Wang L. V., “Half-time image reconstruction in thermoacoustic tomography,” IEEE Trans. Med. Imaging 24, 199–210 (2005). 10.1109/TMI.2004.839682
42. Anastasio M., Zhang J., Sidky E., Zou Y., Xia D., and Pan X., “Feasibility of half-data image reconstruction in 3-D reflectivity tomography with a spherical aperture,” IEEE Trans. Med. Imaging 24, 1100–1112 (2005). 10.1109/TMI.2005.852055
43. Claerbout J. F., Earth Sounding Analysis: Processing Versus Inversion (Blackwell Scientific, Cambridge, MA, 1992).
44. Khokhlova T. D., Pelivanov I. M., Kozhushko V. V., Zharinov A. N., Solomatin V. S., and Karabutov A. A., “Optoacoustic imaging of absorbing objects in a turbid medium: Ultimate sensitivity and application to breast cancer diagnostics,” Appl. Opt. 46, 262–272 (2007). 10.1364/AO.46.000262
45. Aarsvold J. N., Emission Tomography: The Fundamentals of PET and SPECT (Elsevier Academic, San Diego, CA, 2004).
46. Zeng G. and Gullberg G., “Unmatched projector/backprojector pairs in an iterative reconstruction algorithm,” IEEE Trans. Med. Imaging 19, 548–555 (2000). 10.1109/42.870265
47. Morton K. W. and Mayers D. F., Numerical Solution of Partial Differential Equations: An Introduction (Cambridge University Press, New York, NY, 2005).
48. Brecht H.-P., Su R., Fronheiser M., Ermilov S. A., Conjusteau A., and Oraevsky A. A., “Whole-body three-dimensional optoacoustic tomography system for small animals,” J. Biomed. Opt. 14, 064007 (2009). 10.1117/1.3259361
49. Anastasio M. A., Zhang J., Modgil D., and La Riviere P., “Application of inverse source concepts to photoacoustic tomography,” Inverse Probl. 23, S21–S35 (2007). 10.1088/0266-5611/23/6/S03
50. Fessler J. A., “Penalized weighted least-squares reconstruction for positron emission tomography,” IEEE Trans. Med. Imaging 13, 290–300 (1994). 10.1109/42.293921
51. Shewchuk J. R., “An introduction to the conjugate gradient method without the agonizing pain,” Technical Report No. CMU-CS-94-125 (Carnegie Mellon University, Pittsburgh, PA, 1994).
52. Fessler J. and Booth S., “Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction,” IEEE Trans. Image Process. 8, 688–699 (1999). 10.1109/83.760336
53. Meng J., Wang L. V., Ying L., Liang D., and Song L., “Compressed-sensing photoacoustic computed tomography in vivo with partially known support,” Opt. Express 20, 16510–16523 (2012). 10.1364/OE.20.016510
54. Beck A. and Teboulle M., “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18, 2419–2434 (2009). 10.1109/TIP.2009.2028250
55. Boyd S., Parikh N., Chu E., Peleato B., and Eckstein J., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn. 3, 1–122 (2011). 10.1561/2200000016
56. Xu M. and Wang L. V., “Time-domain reconstruction for thermoacoustic tomography in a spherical geometry,” IEEE Trans. Med. Imaging 21, 814–822 (2002). 10.1109/TMI.2002.801176
57. Finch D., Haltmeier M., and Rakesh, “Inversion of spherical means and the wave equation in even dimensions,” SIAM J. Appl. Math. 68, 392–412 (2007). 10.1137/070682137
58. Elbau P., Scherzer O., and Schulze R., “Reconstruction formulas for photoacoustic sectional imaging,” Inverse Probl. 28, 045004 (2012). 10.1088/0266-5611/28/4/045004
59. Rosenthal A., Razansky D., and Ntziachristos V., “Fast semi-analytical model-based acoustic inversion for quantitative optoacoustic tomography,” IEEE Trans. Med. Imaging 29, 1275–1285 (2010). 10.1109/TMI.2010.2044584