Published in final edited form as: IEEE Trans Med Imaging. 2019 Feb 19;38(9):2114–2126. doi: 10.1109/TMI.2019.2898271

A Krasnoselskii-Mann Algorithm with an Improved EM Preconditioner for PET Image Reconstruction

Yizun Lin 1, C Ross Schmidtlein 2, Qia Li 3, Si Li 4, Yuesheng Xu 5

Abstract

This paper presents a preconditioned Krasnoselskii-Mann (KM) algorithm with an improved EM preconditioner (IEM-PKMA) for higher-order total variation (HOTV) regularized positron emission tomography (PET) image reconstruction. The PET reconstruction problem can be formulated as a three-term convex optimization model consisting of the Kullback–Leibler (KL) fidelity term, a nonsmooth penalty term, and a nonsmooth nonnegativity constraint term. We develop an efficient KM algorithm for solving this optimization problem based on a fixed-point characterization of its solution, with a preconditioner and a momentum technique for accelerating convergence. By combining the EM preconditioner, a thresholding, and a good, inexpensive estimate of the solution, we propose an improved EM preconditioner that can not only accelerate convergence but also prevent the reconstructed image from getting “stuck at zero.” Numerical results in this paper show that the proposed IEM-PKMA outperforms existing state-of-the-art algorithms, including the optimization transfer descent algorithm and the preconditioned L-BFGS-B algorithm for the differentiable smoothed anisotropic total variation regularized model, and the preconditioned alternating projection algorithm and the alternating direction method of multipliers for the nondifferentiable HOTV regularized model. Encouraging initial experiments using clinical data are presented.

Keywords: Krasnoselskii-Mann algorithm, image reconstruction, maximum likelihood estimation, positron emission tomography, total variation

I. Introduction

POSITRON emission tomography (PET) is a well-established technique for molecular imaging that produces spatial (and temporal) estimates of a particular tracer’s bio-distribution (images). These images have been instrumental in diagnosis and staging, and are rapidly extending into therapy and response assessment. As a result, improved estimates of the tracer distribution may allow physicians and scientists to more accurately interpret the implications of these bio-distributions and improve patient outcomes. However, because of important health considerations, basic physical processes, and work-flow issues, the data are inherently count- and resolution-limited and must be reconstructed within a few minutes of acquisition. Consequently, a great deal of effort has gone into improving both the speed and accuracy of the estimation of PET tracer distributions.

At present, these efforts have converged on a penalized likelihood (PL) model, in which a data fidelity term (the KL-divergence for PET, i.e., a Poisson noise model) plus one or more penalty terms that regularize the estimated images are optimized to produce the most likely image given the data and penalty. Although the introduction of penalty terms can improve the quality of reconstructed images, these penalty terms may be nondifferentiable or even nonconvex, which makes it difficult to develop efficient algorithms for solving the PL model. To address this issue, many algorithms and penalties have been proposed, but without guidance, it is difficult to know whether a particular algorithm or penalty is suitable for routine imaging.

The total variation (TV) penalty (regularization) introduced in [1] has had a great deal of success in image denoising, removing unwanted noise while preserving important details such as edges, and it has been used successfully in medical imaging modalities such as CT and MR. Unfortunately, TV can introduce piecewise constant (staircase) artifacts in the smoothly varying regions of images, which makes it unsuitable for PET. However, the addition of a second-order term, which we refer to throughout this paper as a higher-order TV (HOTV) penalty [2], [3], allows the preservation of edges at known tracer boundaries (e.g., liver, bladder, heart, brain, and, in the case of dynamic imaging, the boundaries of blood vessels), while suppressing the undesired staircase artifacts.

Classical gradient-type algorithms are unable to solve TV or HOTV regularized models due to the nondifferentiability of their objective functions. To address this while maintaining the edge-preserving properties, [4]-[7] have used smooth approximations of the absolute value function to replace the $\ell_1$-based regularized model with a differentiable one. Efficient algorithms, including the preconditioned conjugate-gradient algorithm (PCG) [4], the optimization transfer descent algorithm (OTDA) [6], [8], and the preconditioned limited-memory Broyden-Fletcher-Goldfarb-Shanno with boundary constraints algorithm (L-BFGS-B-PC) [7], have been proposed for solving these smooth edge-preserving regularized models. Though the smooth approximation can avoid the problem of nondifferentiability, it requires a tuning parameter that is not part of the penalty weight. If this parameter is set too large, the smooth penalty loses its edge-preserving property; if it is set too small, as mentioned in [8], convergence of classical gradient-type algorithms may be slow. Therefore, it is necessary to develop new efficient algorithms that directly solve nondifferentiable models.

Recently, many algorithms have been proposed for nondifferentiable regularized models, including EM-based methods [9]-[11], projected quasi-Newton methods [12]-[14], forward-backward approach [15], primal-dual methods [16]-[18], augmented Lagrangian methods [19], [20], and fixed-point proximity methods [3], [21], [22]. Among these algorithms, the alternating direction method of multipliers (ADMM) [19], [20] and preconditioned alternating projection algorithm (PAPA) [3], [21] have shown good performance for regularized emission computed tomography (ECT) image reconstruction. In particular, [21] and [3] have shown that PAPA outperforms the nested EM-TV algorithm [11], the one-step-late method with TV regularization [9], and the preconditioned primal-dual algorithm [17], [23].

The fast proximity-gradient algorithm (FPGA) [24], derived from the multi-step fixed-point proximity framework [25], was developed for general three-term optimization problems. This algorithm has some desirable properties. For example, its development was guided by constructing an averaged nonexpansive operator for the fixed-point iteration, which guarantees monotone convergence; that is, as the fixed-point iteration proceeds, the distance between the iterate and the true solution is monotonically decreasing. Moreover, unlike ADMM, it requires no inner iteration, which results in more stable convergence. To improve the convergence speed, FPGA introduced the Nesterov momentum technique [26], [27]. However, FPGA does not provide a strategy for choosing efficient preconditioners for three-term optimization problems.

An alternative fixed-point algorithm, PAPA, introduces the EM preconditioner instead of a step-size parameter, which greatly improves its convergence. Although the use of the EM preconditioner in PAPA makes it an efficient algorithm for solving the HOTV regularized PET reconstruction model, both PAPA and the preconditioner introduce problems. In PAPA, the use of an extra gradient step (via reuse of the forward and backward projections) makes it difficult to prove the convergence of PAPA with a momentum technique, because such a method may not be reformulable as a fixed-point iteration of an averaged nonexpansive operator. Another problem is that the classical EM preconditioning matrix may not be strictly positive definite: if a component of the reconstructed image becomes zero or very close to zero at some iteration, then this component will be stuck at zero in all remaining iterations.

Based on the construction of the averaged nonexpansive operator in FPGA and on the EM preconditioner, we propose an efficient preconditioned KM algorithm (PKMA) with an improved EM (IEM) preconditioner that can easily be proven convergent with both a momentum technique and a preconditioner, and that at the same time avoids the reconstructed images being “stuck at zero.” In PKMA, we propose a simpler but more general form of momentum parameters compared to those used in the Nesterov momentum scheme. The generality of our proposed momentum parameters is demonstrated by two properties. First, they include the absence of momentum as a special case. Second, we prove that the Nesterov momentum parameters are asymptotically equivalent to a special case of our proposed momentum parameters. The KM based momentum scheme obtains a better approximation of the solution by adding to the current fixed-point update a multiple of the difference between the current fixed-point update and that of the prior iteration. We will show that, with proper selection of the factor of this difference (momentum) term, the KM based momentum scheme ensures faster convergence toward the solution. The KM approach is a generalization of the fixed-point approach, which leads to a simple convergence proof and provides insight toward choosing the algorithmic parameters.

This paper is organized in four sections and two appendices. In Section II, we first describe the HOTV regularized PET image reconstruction model, and then develop a preconditioned KM algorithm with an improved EM preconditioner for solving this model; a proof of the convergence of PKMA is included. Section III presents simulation results comparing our proposed IEM-PKMA with OTDA and L-BFGS-B-PC for the differentiable smoothed anisotropic TV (SATV) regularized model, and with PAPA and ADMM for the nondifferentiable HOTV regularized model. Initial experiments using clinical data are also presented. Section IV offers a conclusion. In the appendices, we provide technical details for the development of PKMA and the analysis of its convergence.

II. PET Image Reconstruction

We develop in this section a preconditioned KM algorithm with an improved EM preconditioner for solving the HOTV regularized PET reconstruction model. Here we limit the discussion to only the case of the first-order plus the second-order TV penalty.

A. HOTV Regularized PET Image Reconstruction Model

We begin with describing the HOTV regularized PET reconstruction model. We denote by $\mathbb{R}_+$ the set of all nonnegative real numbers. For two positive integers $m$ and $d$, $A \in \mathbb{R}_+^{m \times d}$ denotes the PET system matrix whose $(i, j)$th entry equals the probability that a photon pair emitted from the $j$th voxel of the radiotracer distribution $f \in \mathbb{R}_+^d$ within a patient (or a phantom) is detected by the $i$th detector bin pair. The vector $\gamma \in \mathbb{R}_+^m$ represents the mean value of the background noise from random and scatter coincidences. The relation of the projection data $g \in \mathbb{R}_+^m$ of a PET system to the radiotracer distribution $f$ can be described by the Poisson model

$$g = \mathrm{Poisson}(Af + \gamma), \tag{1}$$

where Poisson(x) denotes a Poisson-distributed random vector with mean x. System (1) may be solved by minimizing the following fidelity term

$$F(f) := \langle Af, \mathbf{1}_m \rangle - \langle \ln(Af + \gamma), g \rangle, \tag{2}$$

where $\mathbf{1}_m \in \mathbb{R}^m$ is the vector whose components are all 1, the logarithmic function at $x \in \mathbb{R}^n$ is defined by $\ln x := [\ln x_1, \ln x_2, \ldots, \ln x_n]^T$, and $\langle x, y \rangle := \sum_{i=1}^n x_i y_i$ is the inner product of $x, y \in \mathbb{R}^n$.
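For readers who wish to experiment with this model, the following is a minimal NumPy sketch of evaluating the fidelity term (2). It is illustrative only: it assumes a dense system matrix, and all function and variable names are ours rather than the authors'.

```python
import numpy as np

def kl_fidelity(f, A, g, gamma):
    """Evaluate F(f) = <Af, 1_m> - <ln(Af + gamma), g> from Eq. (2).

    A dense-matrix sketch; a production system matrix would be sparse or
    operator-based. Assumes gamma > 0 so the logarithm is well defined.
    """
    Af = A @ f
    return np.sum(Af) - np.dot(np.log(Af + gamma), g)
```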

Minimization of the fidelity term is well-known to be ill-posed [5], which results in severe over-fitting in the reconstructed image. To avoid this over-fitting problem, regularization terms were introduced as part of the reconstruction model. In this study, using both the first-order and the second-order TV penalties, we get the following HOTV regularized PET image reconstruction model

$$\arg\min_{f \in \mathbb{R}^d} \{F(f) + \lambda_1(\varphi_1 \circ B_1)(f) + \lambda_2(\varphi_2 \circ B_2)(f) + \imath(f)\}, \tag{3}$$

where $\varphi_1 \circ B_1$ and $\varphi_2 \circ B_2$ represent the first-order and second-order TV, respectively. The two functions $\varphi_1$, $\varphi_2$ are defined by the $\ell_1$-norm for the anisotropic TV or the $\ell_2$-norm for the isotropic TV, and thus they are convex. Here $B_1 \in \mathbb{R}^{m_1 \times d}$, $B_2 \in \mathbb{R}^{m_2 \times d}$ are the first-order and second-order difference matrices, respectively, and $\lambda_1, \lambda_2 \in \mathbb{R}_+$ are the corresponding regularization parameters. For the detailed definitions of $\varphi_1$, $\varphi_2$ and $B_1$, $B_2$, see Appendix A. The indicator function $\imath$ on $\mathbb{R}_+^d$ is defined by

$$\imath(x) := \begin{cases} 0, & \text{if } x \in \mathbb{R}_+^d, \\ +\infty, & \text{otherwise.} \end{cases}$$

To simplify notation, we define $m_0 := m_1 + m_2$, and for $z := \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{m_0}$ with $x \in \mathbb{R}^{m_1}$, $y \in \mathbb{R}^{m_2}$,

$$\varphi(z) := \lambda_1 \varphi_1(x) + \lambda_2 \varphi_2(y), \quad \text{and} \quad B := \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}. \tag{4}$$

Then model (3) can be written in a compact form

$$\arg\min_{f \in \mathbb{R}^d} \{F(f) + \varphi(Bf) + \imath(f)\}. \tag{5}$$

This is the model on which the proposed reconstruction algorithm is based.

B. Preconditioned Krasnoselskii-Mann Algorithm

We next characterize a solution of model (5) as a fixed-point of a mapping defined via the proximity operators of the functions $\varphi$ and $\imath$. To this end, we let $\mathbb{S}_+^n$ denote the set of $n \times n$ symmetric positive definite matrices. For $H \in \mathbb{S}_+^n$, the $H$-weighted inner product is defined by $\langle x, y \rangle_H := \langle x, Hy \rangle$ for $x, y \in \mathbb{R}^n$, and the corresponding $H$-weighted 2-norm is defined by $\|x\|_H := \langle x, x \rangle_H^{\frac{1}{2}}$. According to [28], for a convex function $\psi: \mathbb{R}^n \to \mathbb{R}$, the proximity operator of $\psi$ with respect to $H \in \mathbb{S}_+^n$ at $x \in \mathbb{R}^n$ is defined by

$$\mathrm{prox}_{\psi,H}(x) := \arg\min_{u \in \mathbb{R}^n} \left\{ \frac{1}{2}\|u - x\|_H^2 + \psi(u) \right\}.$$
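To make the definition concrete, here is a sketch of its simplest instance, the proximity operator of $\psi = \omega\|\cdot\|_1$ with $H = I$, which reduces to component-wise soft-thresholding. This example is ours, not the paper's.

```python
import numpy as np

def prox_l1(x, omega):
    """prox of psi = omega*||.||_1 with H = I: solves
    argmin_u { 0.5*||u - x||_2^2 + omega*||u||_1 } component-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)
```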

In particular, we write $\mathrm{prox}_\psi$ for $\mathrm{prox}_{\psi,I}$. We let $\Gamma_0(\mathbb{R}^n)$ denote the class of all proper lower semicontinuous convex functions defined on $\mathbb{R}^n$, and recall that the conjugate $\psi^*$ of $\psi$ is defined by $\psi^*(z) := \sup_{x \in \mathbb{R}^n}\{\langle z, x \rangle - \psi(x)\}$. Now we have the following fixed-point characterization of a solution of model (5).

Theorem 1: If $f \in \mathbb{R}^d$ is a solution of model (5), then for any $P \in \mathbb{S}_+^d$ and $Q \in \mathbb{S}_+^{m_0}$, there exists a vector $h \in \mathbb{R}^{m_0}$ such that

$$f = \mathrm{prox}_{\imath,P^{-1}}\big(f - P(\nabla F(f) + B^T h)\big), \tag{6}$$
$$h = \mathrm{prox}_{\varphi^*,Q^{-1}}(h + QBf). \tag{7}$$

Conversely, if there exist $P \in \mathbb{S}_+^d$, $Q \in \mathbb{S}_+^{m_0}$, and $h \in \mathbb{R}^{m_0}$ such that $f \in \mathbb{R}^d$ satisfies equations (6) and (7), then $f$ is a solution of model (5).

Proof: It can be verified that in (5), $F: \mathbb{R}^d \to \mathbb{R}$ is convex and differentiable with a Lipschitz continuous gradient, $\varphi \in \Gamma_0(\mathbb{R}^{m_0})$ and $\imath \in \Gamma_0(\mathbb{R}^d)$. Using [21, Th. 3.1], we conclude that this theorem holds true. ■

Note that the gradient of the fidelity term F is given by

$$\nabla F(f) = A^T\left(\mathbf{1}_m - \frac{g}{Af + \gamma}\right), \tag{8}$$

where the division is taken component-wise.
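A direct NumPy transcription of (8), under the same dense-matrix assumption and illustrative naming as the fidelity sketch above:

```python
import numpy as np

def kl_gradient(f, A, g, gamma):
    """Gradient of the KL fidelity term, Eq. (8):
    grad F(f) = A^T (1_m - g / (Af + gamma)), division component-wise."""
    Af = A @ f
    return A.T @ (np.ones(A.shape[0]) - g / (Af + gamma))
```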

We shall develop a convergent, efficient algorithm for solving model (5) from the fixed-point equations (6) and (7). To this end, we write equations (6) and (7) in a compact form. Define

$$v := \begin{bmatrix} f \\ h \end{bmatrix} \in \mathbb{R}^{d+m_0}, \qquad r(v) := F(f), \tag{9}$$

$$\mathcal{T}(v) := \begin{bmatrix} \mathrm{prox}_{\imath,P^{-1}}(f) \\ \mathrm{prox}_{\varphi^*,Q^{-1}}(h) \end{bmatrix}, \qquad E := \begin{bmatrix} I_d & -PB^T \\ QB & I_{m_0} \end{bmatrix},$$

where $I_n$ denotes the $n \times n$ identity matrix and $P \in \mathbb{S}_+^d$, $Q \in \mathbb{S}_+^{m_0}$ are two introduced preconditioning matrices, and define

$$R := \begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix}. \tag{10}$$

For the matrix $E \in \mathbb{R}^{(d+m_0)\times(d+m_0)}$ and the operator $R\nabla r: \mathbb{R}^{d+m_0} \to \mathbb{R}^{d+m_0}$, we define $E - R\nabla r$ by $(E - R\nabla r)(v) := Ev - R\nabla r(v)$ for $v \in \mathbb{R}^{d+m_0}$. Then (6) and (7) can be written as the fixed-point equation of $\mathcal{T}$ composed with $E - R\nabla r$,

$$v = (\mathcal{T} \circ (E - R\nabla r))(v).$$

This means that if $v := [v_1, v_2, \ldots, v_{d+m_0}]^T$ is a fixed-point of $\mathcal{T} \circ (E - R\nabla r)$, then $f = [v_1, v_2, \ldots, v_d]^T$, the subvector of the first $d$ components of $v$, is a solution of model (5). It was shown in [25] that $E$ is expansive, so the fixed-point iteration of $\mathcal{T} \circ (E - R\nabla r)$ may fail to yield a convergent sequence. To develop a convergent fixed-point algorithm, we choose the matrix $G \in \mathbb{R}^{(d+m_0)\times(d+m_0)}$ as

$$G := \begin{bmatrix} I_d & -PB^T \\ -QB & I_{m_0} \end{bmatrix} \tag{11}$$

and proceed with the fixed-point iteration

$$v^{k+1} = \mathcal{T}\big((E - G)v^{k+1} + (G - R\nabla r)v^k\big), \quad k \in \mathbb{N}_0, \tag{12}$$

where $\mathbb{N}_0 := \{0, 1, \ldots\}$. The iteration (12) is in fact explicit, since $E - G$ is strictly block lower triangular (in the same block partition as $E$). We define $\mathcal{T}_G: \mathbb{R}^{d+m_0} \to \mathbb{R}^{d+m_0}$ by

$$\mathcal{T}_G: u \mapsto \{v : (u, v) \text{ satisfies } v = \mathcal{T}((E - G)v + Gu)\}, \tag{13}$$

$W := R^{-1}G$, and let

$$\mathcal{T}_W := \mathcal{T}_G \circ (\mathcal{I} - W^{-1}\nabla r), \tag{14}$$

where $\mathcal{I}$ denotes the identity operator.

It is important to verify that $\mathcal{T}_G$ is well-defined. To show this, we let $u := \begin{bmatrix} \tilde u_1 \\ \tilde u_2 \end{bmatrix}$ be a given vector, where $\tilde u_1 \in \mathbb{R}^d$ and $\tilde u_2 \in \mathbb{R}^{m_0}$. The implicit fixed-point equation (13) can then be written as

$$\tilde v_1 = \mathrm{prox}_{\imath,P^{-1}}(\tilde u_1 - PB^T\tilde u_2), \tag{15}$$
$$\tilde v_2 = \mathrm{prox}_{\varphi^*,Q^{-1}}(2QB\tilde v_1 - QB\tilde u_1 + \tilde u_2). \tag{16}$$

Here $\tilde v_1 := [v_1, v_2, \ldots, v_d]^T$ and $\tilde v_2 := [v_{d+1}, v_{d+2}, \ldots, v_{d+m_0}]^T$. Since $\tilde u_1$ and $\tilde u_2$ in equation (15) are given, from the definition of the proximity operator we know that the solution $\tilde v_1$ of (15) exists and is unique, which in turn implies the existence and uniqueness of the solution $\tilde v_2$ of equation (16). This shows that for any given $u \in \mathbb{R}^{d+m_0}$ in the equation contained in (13), there exists a unique solution $v$.

Using the definition of $\mathcal{T}_W$, iteration (12) can be rewritten as the fixed-point iteration

$$v^{k+1} = \mathcal{T}_W(v^k), \quad k \in \mathbb{N}_0. \tag{17}$$

We now comment on the convergence of iteration (17). According to [21], $\nabla F$ is Lipschitz continuous if all components of $\gamma$ are positive. Let $L$ denote the Lipschitz constant of $\nabla F$ and $\lambda_W$ the smallest eigenvalue of $W$. We recall the definitions of nonexpansive, firmly nonexpansive and averaged nonexpansive operators in Appendix B, and show in Lemma 7 that if $\lambda_W > \frac{L}{2}$, then $\mathcal{T}_W$ is $\zeta$-averaged nonexpansive with respect to $W$, where $\zeta := \frac{2\lambda_W}{4\lambda_W - L} \in (\frac{1}{2}, 1)$. The KM theorem (Theorem 4 in Appendix B) implies that iteration (17) converges to a fixed-point of $\mathcal{T}_W$.

To accelerate the convergence of iteration (17), for $\alpha > 0$ we define $\mathcal{T}_\alpha := (1 - \alpha)\mathcal{I} + \alpha\mathcal{T}_W$. Since $\mathcal{T}_\alpha$ is $\alpha\zeta$-averaged nonexpansive with respect to $W$, by the KM theorem the fixed-point iteration

$$v^{k+1} = \mathcal{T}_\alpha(v^k), \quad k \in \mathbb{N}_0 \tag{18}$$

converges to a fixed-point of $\mathcal{T}_W$ if $\alpha \in (0, \frac{1}{\zeta})$. We observe numerically that for larger $\alpha \in (0, \frac{1}{\zeta}) \subset (0, 2)$, the fixed-point iteration (18) converges faster. This observation inspires us to choose $\alpha \in (1, 2)$ for (18). Iteration scheme (18) may be interpreted as an application of the momentum technique. To see this, we write

$$\mathcal{T}_\alpha(v^k) = \mathcal{T}_W(v^k) + (\alpha - 1)(\mathcal{T}_W(v^k) - v^k), \quad k \in \mathbb{N}_0.$$

Clearly, $\mathcal{T}_\alpha$ extends $\mathcal{T}_W$ by a momentum technique. However, we found that iteration (18) is not robust for large $\alpha \in (1, 2)$: if the components of the initial vector $v^0$ are too large, then to guarantee convergence the entries of the preconditioning matrix $P$ must be set small, which leads to slow convergence. To overcome this obstacle and obtain a robust fixed-point iteration with momentum acceleration, we use the KM iteration and allow the momentum parameter $\alpha$ to vary in each iteration. That is, for given $\varrho \in (-1, 1)$ and $\delta \ge 0$, we introduce the sequence of parameters

$$\alpha_k := 1 + \varrho\frac{k}{k + \delta}, \quad k \in \mathbb{N}_0, \tag{19}$$

and construct a sequence of operators

$$\mathcal{T}_{\alpha_k} := (1 - \alpha_k)\mathcal{I} + \alpha_k\mathcal{T}_W, \quad k \in \mathbb{N}_0. \tag{20}$$

Given an initialization $v^0 \in \mathbb{R}^{d+m_0}$, we then proceed with the KM based momentum scheme

$$v^{k+1} = \mathcal{T}_{\alpha_k}(v^k), \quad k \in \mathbb{N}_0. \tag{21}$$

We next show that the Nesterov momentum parameters are asymptotically equivalent to a special case of our proposed momentum parameters, defined by (19), with the sub-parameters $\varrho$ and $\delta$ appropriately set. Recall that the Nesterov momentum parameters are given by

$$t_0 := 1, \quad t_{k+1} := \frac{1 + \sqrt{1 + 4t_k^2}}{2}, \quad \alpha_k' := 1 + \frac{t_k - 1}{t_{k+1}}, \quad k \in \mathbb{N}_0. \tag{22}$$

When $k$ is sufficiently large, $\{\alpha_k'\}_{k\in\mathbb{N}_0}$ in (22) corresponds to $\{\alpha_k\}_{k\in\mathbb{N}_0}$ in (19) with the sub-parameter choice $\varrho = 1$ and $\delta = 3$, in the sense given by Proposition 2. Moreover, the absence of momentum ($\alpha_k = 1$ for all $k \in \mathbb{N}_0$) is a special case of our proposed momentum scheme with $\varrho = 0$, which is not the case for the Nesterov momentum since $\alpha_k' > 1$ for $k \ge 1$. The generality of our proposed momentum scheme provides a variety of parameter choices for different scenarios. We now state Proposition 2, whose proof is provided in the supplementary materials.

Proposition 2: Let $\{\alpha_k\}_{k\in\mathbb{N}_0}$ be given by (19) with $\varrho = 1$ and $\delta = 3$, and let $\{t_k\}_{k\in\mathbb{N}_0}$ and $\{\alpha_k'\}_{k\in\mathbb{N}_0}$ be given by (22). Then $\{\alpha_k\}_{k\in\mathbb{N}_0}$ and $\{\alpha_k'\}_{k\in\mathbb{N}_0}$ converge to the same value with the same convergence rate.
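A quick numeric sketch of Proposition 2: the two parameter sequences can be generated side by side, and both approach 2 as $k$ grows. The function names are ours, not the authors'.

```python
def proposed_alpha(k, varrho=1.0, delta=3.0):
    # alpha_k = 1 + varrho * k / (k + delta), Eq. (19)
    return 1.0 + varrho * k / (k + delta)

def nesterov_alpha(n):
    # alpha_k' = 1 + (t_k - 1) / t_{k+1}, with t_0 = 1, Eq. (22)
    t = 1.0
    for k in range(n + 1):
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        alpha = 1.0 + (t - 1.0) / t_next
        t = t_next
    return alpha

# Both sequences tend to the same limit (2), as Proposition 2 states.
print(proposed_alpha(1000), nesterov_alpha(1000))
```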

We now provide a specific choice of the preconditioning matrices $P$ and $Q$ for iteration (21). We let $P := \beta S$ and $Q := \mathrm{diag}(\rho_1\mathbf{1}_{m_1}, \rho_2\mathbf{1}_{m_2})$, where $\beta$, $\rho_1$ and $\rho_2$ are positive numbers and $S$ is a $d \times d$ diagonal matrix with positive diagonal entries. In this case, it follows from the definition of the proximity operator that for $x \in \mathbb{R}^d$,

$$\mathrm{prox}_{\imath,P^{-1}}(x) = \mathrm{prox}_\imath(x) = \mathcal{P}_{\mathbb{R}_+^d}(x) := \max(x, 0). \tag{23}$$

The maximum in the above equation is taken component-wise. By using the well-known Moreau decomposition [29]

$$\mathcal{I} = \mathrm{prox}_{\varphi^*,Q^{-1}} + Q \circ \mathrm{prox}_{\varphi,Q} \circ Q^{-1},$$

and the equation

$$\mathrm{prox}_{\varphi,Q}(z) = \begin{bmatrix} \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}(x) \\ \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}(y) \end{bmatrix}$$

for $z := \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{m_0}$ with $x \in \mathbb{R}^{m_1}$, $y \in \mathbb{R}^{m_2}$, we have

$$\mathrm{prox}_{\varphi^*,Q^{-1}}(z) = \begin{bmatrix} \rho_1\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}\big)\big(\frac{1}{\rho_1}x\big) \\ \rho_2\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}\big)\big(\frac{1}{\rho_2}y\big) \end{bmatrix}.$$

Now the preconditioned KM algorithm (PKMA) for solving model (3) is given as follows.

$$\begin{aligned}
\tilde f^{k+1} &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S(\nabla F(f^k) + B_1^Tb^k + B_2^Tc^k)\big)\\
\tilde b^{k+1} &= \rho_1\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}\big)\big(\tfrac{1}{\rho_1}b^k + B_1(2\tilde f^{k+1} - f^k)\big)\\
\tilde c^{k+1} &= \rho_2\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}\big)\big(\tfrac{1}{\rho_2}c^k + B_2(2\tilde f^{k+1} - f^k)\big)\\
\alpha_k &= 1 + \varrho\tfrac{k}{k + \delta}\\
f^{k+1} &= (1 - \alpha_k)f^k + \alpha_k\tilde f^{k+1}\\
b^{k+1} &= (1 - \alpha_k)b^k + \alpha_k\tilde b^{k+1}\\
c^{k+1} &= (1 - \alpha_k)c^k + \alpha_k\tilde c^{k+1}
\end{aligned}$$
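The iteration above translates almost line by line into code. Below is a schematic NumPy sketch of one possible implementation; grad_F, prox_phi1, prox_phi2 and the diagonal S are assumed to be supplied by the caller, and all names are illustrative rather than taken from the authors' Matlab code.

```python
import numpy as np

def pkma(f0, b0, c0, grad_F, B1, B2, prox_phi1, prox_phi2, S,
         lam1, lam2, beta, rho1, rho2, varrho, delta, n_iter):
    """Schematic PKMA iteration for model (3).

    prox_phi1/prox_phi2: callables (x, omega) -> prox_{omega*phi}(x).
    S: the diagonal preconditioner stored as a vector (e.g., the IEM
    preconditioner of Eq. (27)). delta > 0 is assumed so alpha_0 = 1.
    """
    f, b, c = f0.copy(), b0.copy(), c0.copy()
    for k in range(n_iter):
        # primal step: preconditioned gradient + projection onto R_+^d
        f_t = np.maximum(f - beta * S * (grad_F(f) + B1.T @ b + B2.T @ c), 0.0)
        # dual steps: prox of the conjugates via the Moreau decomposition
        z1 = b / rho1 + B1 @ (2.0 * f_t - f)
        b_t = rho1 * (z1 - prox_phi1(z1, lam1 / rho1))
        z2 = c / rho2 + B2 @ (2.0 * f_t - f)
        c_t = rho2 * (z2 - prox_phi2(z2, lam2 / rho2))
        # KM momentum step with alpha_k from Eq. (19)
        alpha = 1.0 + varrho * k / (k + delta)
        f = (1.0 - alpha) * f + alpha * f_t
        b = (1.0 - alpha) * b + alpha * b_t
        c = (1.0 - alpha) * c + alpha * c_t
    return f
```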

One can find the explicit forms of $\mathrm{prox}_{\omega\varphi_1}(x)$ for $x \in \mathbb{R}^{m_1}$ and $\mathrm{prox}_{\omega\varphi_2}(x)$ for $x \in \mathbb{R}^{m_2}$, where $\omega > 0$, in Appendix A.

In the following theorem, we consider the convergence of PKMA. We let $S_{\max}$ denote the largest diagonal entry of the diagonal matrix $S$, and set $p := \frac{1}{\beta S_{\max}}$ and $\xi := \frac{L}{2(1 - \max\{\varrho, 0\})}$.

Theorem 3: Let $f^0 \in \mathbb{R}^d$, $b^0 \in \mathbb{R}^{m_1}$ and $c^0 \in \mathbb{R}^{m_2}$ be given vectors, and let $\{f^k\}_{k\in\mathbb{N}_0}$ be the sequence generated by PKMA, where $\varrho \in (-1, 1)$, $\delta \ge 0$ and $\beta$, $\rho_1$, $\rho_2$ are positive. For a given diagonal matrix $S \in \mathbb{R}^{d \times d}$ with positive diagonal entries, if $\beta < \frac{1}{\xi S_{\max}}$, $\rho_1 < \frac{p - \xi}{2\|B_1\|_2^2 + (p - \xi)\xi}$ and $\rho_2 < \frac{p - \xi}{2\|B_2\|_2^2 + (p - \xi)\xi}$, then $\{f^k\}_{k\in\mathbb{N}_0}$ converges to a solution of model (3).

Proof: By Theorem 8, to prove the theorem, it suffices to verify that λW > ξ. This is done in Lemma 9. ■

Note that our convergence proof requires the preconditioner to be fixed. In practice, however, we can update it during early iterations and then fix it in later iterations to guarantee convergence.

C. Improved EM Preconditioner

In this subsection, we propose an improved EM (IEM) preconditioner for PKMA to accelerate convergence. We begin by recalling the classical EM preconditioner. To avoid zero components in $A^T\mathbf{1}_m$, we define $\Lambda \in \mathbb{R}^d$ as the vector such that

$$\Lambda_j := \begin{cases} (A^T\mathbf{1}_m)_j, & \text{if } (A^T\mathbf{1}_m)_j > 0, \\ 1, & \text{otherwise,} \end{cases} \tag{24}$$

for $j = 1, 2, \ldots, d$. As shown in [21], the EM preconditioner

$$S_{\mathrm{EM}} := \mathrm{diag}\left(\frac{f^k}{\Lambda}\right) \tag{25}$$

performs better than the identity preconditioner $I_d$ and the diagonal normalization (DN) preconditioner

$$S_{\mathrm{DN}} := \mathrm{diag}\left(\frac{\mathbf{1}_d}{\Lambda}\right), \tag{26}$$

where the divisions are taken component-wise.

However, when using the EM preconditioner, once some components of $f^k$ become zero, these components stay stuck at zero thereafter, which results in holes in the reconstructed images. Low-count data and ordered-subsets-type algorithms are particularly susceptible to this problem. From Theorem 3, we know that to guarantee convergence of PKMA, the entries of the diagonal preconditioning matrix $S$ should be positive. However, this may not be the case with the EM preconditioner, since some components of $f^k$ may be zero. To guarantee the positivity of the preconditioner as well as accelerate convergence, we propose the following IEM preconditioner, which includes a true mean count (TMC) based thresholding and a good estimate $\hat f$ of the true solution $f^*$ in the EM preconditioner:

$$S_{\mathrm{IEM}}(\hat f) := \mathrm{diag}\left(\frac{\max\{\eta\mathbf{1}_d, \hat f, f^k\}}{\Lambda}\right), \tag{27}$$

where the maximum is taken component-wise, the positive constant $\eta$ is set to $0.1 \cdot \mathrm{TMC}$, and

$$\mathrm{TMC} := \frac{\mathrm{ACT_c}}{\mathrm{NP_{FOV}} \cdot \mathrm{NPA}}. \tag{28}$$

Here $\mathrm{ACT_c}$, $\mathrm{NP_{FOV}}$, and $\mathrm{NPA}$ represent the total attenuation corrected true counts, the number of pixels within the field of view, and the number of projection angles, respectively.

In the definition of the IEM preconditioner given by (27), our choice of the threshold constant $0.1 \cdot \mathrm{TMC}$ was based on TMC being the mean of the components of $f^*$. In addition, as shown in Fig. 2, we empirically found that including a good estimate of $f^*$ in the preconditioner leads to faster convergence. This can be exploited by using a good, inexpensive estimate of $f^*$ for $\hat f$ in (27); for example, we can use an image reconstructed by filtered backprojection, $f_{\mathrm{FBP}}$, as $\hat f$. Alternatively, if we set $\hat f = 0$ in (27), then the IEM preconditioner reduces to the case considered in [30].
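A hedged sketch of assembling the diagonal of the IEM preconditioner (27): the sensitivity guard mirrors (24), and eta would be set to $0.1 \cdot \mathrm{TMC}$ following the text. Function and variable names are ours.

```python
import numpy as np

def iem_preconditioner(f_k, f_hat, A, eta):
    """Diagonal of S_IEM(f_hat), Eq. (27), returned as a vector.

    f_hat: an inexpensive estimate of the solution (e.g., an FBP image);
    eta: positive threshold, 0.1 * TMC in the paper, TMC from Eq. (28).
    """
    Lam = A.T @ np.ones(A.shape[0])      # A^T 1_m, as in Eq. (24)
    Lam[Lam <= 0] = 1.0                  # guard zero-sensitivity voxels
    top = np.maximum(np.maximum(np.full_like(f_k, eta), f_hat), f_k)
    return top / Lam
```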

Fig. 2. NOFV (left) and NRMSE (right) versus CPU time by PKMA with preconditioners $S_{\mathrm{EM}}$, $S_{\mathrm{IEM}}(0)$, $S_{\mathrm{IEM}}(\bar f_2)$, $S_{\mathrm{IEM}}(\bar f_{20})$ and $S_{\mathrm{IEM}}(\bar f_{200})$.

III. Numerical Results

In this section, we present several numerical results. First, we show the performance of different choices of $\hat f$ in the IEM preconditioner (27). Then, we compare the performance of different preconditioners and different momentum parameters for PKMA. To assess the performance of PKMA with the IEM preconditioner (IEM-PKMA) for a differentiable regularized PET image reconstruction model, we compare it with the existing OTDA and L-BFGS-B-PC for the SATV regularized reconstruction model. Following this, we present a comparison of PKMA with two existing algorithms, PAPA and ADMM, suitable for nonsmooth penalties. Finally, we provide some initial 3D clinical results for a relaxed ordered subsets version of PKMA (ROS-PKMA).

A. Simulation Setup

We implemented the algorithms in Matlab using a 2D PET simulation model as described in [31]. The number of counts used in these 2D simulations was set to be equivalent to that of a 3D PET brain patient acquisition (370 MBq FDG administered, imaged 1 hour post-injection) collected from the central axial slice of a GE D690/710 PET/CT. The resulting reference count distribution was used as the Poisson parameters for the noise realizations. An area integral projection method was used to build the projection matrix, based on a cylindrical detector ring consisting of 576 detectors, each 4 mm wide. We set the FOV to 300 mm and used 288 projection angles to reconstruct a 256 × 256 image.

To simulate the physical factors that affect the resolution of the reconstructed image, such as positron range, detector width, penetration, residual momentum of the positron, and imperfect decoding, the phantom was convolved with an idealized (space-invariant, Gaussian) point spread function (PSF), which was set as a constant over the whole FOV. The full width at half maximum (FWHM) of this PSF was set to 6.59 mm based on physical measurements from acceptance testing and [32]. The true count projection data was produced by forward-projecting the phantom convolved with the PSF. Uniform water attenuation (attenuation coefficient 0.096 cm−1) was simulated using the PET image support. The background noise was implemented as described in [33] and was based on a 25% scatter fraction and a 25% random fraction, given by $\mathrm{SF} := S_c/(T_c + S_c)$ and $\mathrm{RF} := R_c/(T_c + S_c + R_c)$, respectively, where $T_c$, $S_c$ and $R_c$ represent true, scatter, and random counts, respectively. Scatter was added by forward-projecting a highly smoothed version of the images, which was added to the attenuated image sinogram scaled by the scatter fraction. Random counts were simulated by adding a uniform distribution to the true and scatter count distributions, scaled by the random fraction. We call the sum of $T_c$, $S_c$, and $R_c$ the total counts and denote it by TC. In our simulations, we set $\mathrm{TC} = 6.8 \times 10^6$ for the high-count data and $\mathrm{TC} = 6.8 \times 10^5$ for the low-count data.

We next describe the figures of merit used for the comparisons. They include the normalized objective function value (NOFV), normalized root mean square error (NRMSE), normalized relative contrast (NRC) and central line profile (CLP). The NOFV is defined by

$$\mathrm{NOFV}(f^k) := \frac{\Phi(f^k) - \Phi_{\mathrm{ref}}}{\Phi_0 - \Phi_{\mathrm{ref}}},$$

where $\Phi$ denotes the objective function, $\Phi_0$ is the objective function value of the initial image, and $\Phi_{\mathrm{ref}}$ denotes the reference objective function value. For the simulation results, we set $\Phi_{\mathrm{ref}}$ to the objective function value of the image reconstructed by 1000 iterations of IEM-PKMA. The NRMSE is defined by

$$\mathrm{NRMSE}(f^k) := \frac{\|f^k - f_{\mathrm{true}}\|_2}{\|f_{\mathrm{true}}\|_2},$$

where $f_{\mathrm{true}} \in \mathbb{R}^d$ is the ground truth and $\|\cdot\|_2$ is the 2-norm defined by $\|x\|_2 := (\sum_{i=1}^d x_i^2)^{\frac12}$ for $x \in \mathbb{R}^d$. For the definition of NRC, we let $\mathrm{ROI_H}$ be a region within a specific hot sphere and $\mathrm{ROI_B}$ be a background region of the same size as $\mathrm{ROI_H}$ that is not close to any hot sphere. Define the relative contrast (RC) by $\mathrm{RC} := \frac{E_{\mathrm{ROI_H}} - E_{\mathrm{ROI_B}}}{E_{\mathrm{ROI_B}}}$, where $E_{\mathrm{ROI_H}}$ and $E_{\mathrm{ROI_B}}$ represent the average activities of $\mathrm{ROI_H}$ and $\mathrm{ROI_B}$, respectively. The normalized relative contrast is then defined by

$$\mathrm{NRC}(f^k) := \frac{\mathrm{RC}_{f^k}}{\mathrm{RC}_{\mathrm{true}}},$$

where $\mathrm{RC}_{f^k}$ and $\mathrm{RC}_{\mathrm{true}}$ are the relative contrasts of $f^k$ and the ground truth, respectively.
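The NRMSE and NRC metrics are straightforward to compute; a short sketch, with roi_h and roi_b as equal-size boolean masks (our naming):

```python
import numpy as np

def nrmse(f_k, f_true):
    # NRMSE(f^k) = ||f^k - f_true||_2 / ||f_true||_2
    return np.linalg.norm(f_k - f_true) / np.linalg.norm(f_true)

def nrc(f_k, f_true, roi_h, roi_b):
    """Normalized relative contrast, RC(f^k) / RC(f_true), where
    RC = (mean over ROI_H - mean over ROI_B) / mean over ROI_B."""
    def rc(img):
        return (img[roi_h].mean() - img[roi_b].mean()) / img[roi_b].mean()
    return rc(f_k) / rc(f_true)
```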

Two 256 × 256 numerical phantoms, shown in Fig. 1, were used for our simulations. The brain phantom was obtained from a high quality clinical PET brain image. The uniform phantom consists of a uniform background with six uniform hot spheres of distinct radii 4, 6, 8, 10, 12, and 14 pixels; the activity ratio of the hot spheres to the background is 4:1. In the simulation experiments, we compare NOFV and NRMSE by reconstructing the brain phantom, and NRC and CLP by reconstructing the uniform phantom. All simulations were performed on a 64-bit Windows 10 laptop with an Intel Core i7-7500U processor at 2.70 GHz, 8 GB of DDR4 memory and a 256 GB SATA SSD.

Fig. 1. (a) Brain phantom: high quality clinical PET brain image. (b) Uniform phantom: uniform background with six uniform hot spheres of distinct radii.

We show the settings of the regularization parameters and the algorithmic parameters of PKMA in Table I. For the reconstruction of the brain phantom, to suppress the staircase artifacts and avoid over-smoothed images, we empirically found that setting $\lambda_2 = \lambda_1$ was reasonable and simplified the search for optimal regularization parameters based on the minimum NRMSE. For the uniform phantom, the second-order TV regularization parameter $\lambda_2$ was set to 0, due to its piecewise constant nature. The settings of $\rho_1$ and $\rho_2$ in PKMA were based on the convergence conditions and the facts that $\|B_1\|_2^2 \le 8$ and $\|B_2\|_2^2 \le 64$ in the 2D case [34]. The parameters $\varrho$ and $\delta$ in the momentum step were set to satisfy $\varrho \in (-1, 1)$ and $\delta \ge 0$. In addition, we denote by $f_{\mathrm{UD}}$ the uniform disk $\mathrm{TMC} \cdot \mathbf{1}_{\mathrm{disk}}$ with the same size as the FOV, where TMC is defined by (28) and $\mathbf{1}_{\mathrm{disk}}$ is the image whose values are 1 within the disk and 0 outside.

TABLE I. Regularization Parameters and Algorithmic Parameters for 2D Simulation

Regularization parameters:
  Brain phantom, high-count: $\lambda_1 = \lambda_2 = 0.04$
  Brain phantom, low-count: $\lambda_1 = \lambda_2 = 0.34$
  Uniform phantom, high-count: $\lambda_1 = 0.4$, $\lambda_2 = 0$

Algorithmic parameters:
  PKMA: $\beta = 1$, $\rho_1 = \frac{1}{2 \times 8 \times S_{\max}}$, $\rho_2 = \frac{1}{2 \times 64 \times S_{\max}}$, $\varrho = 0.9$, $\delta = 0.1$
  PAPA: $\beta = 1$, $\rho_1 = \frac{1}{2 \times 8 \times S_{\max}}$, $\rho_2 = \frac{1}{2 \times 64 \times S_{\max}}$
  ADMM: $\sigma = \tau = 0.1$, $\mu = 1.2$

B. Simulation Results for IEM-PKMA

1). Comparison of Preconditioners and Momentum Techniques:

In this subsection, we use high-count data to reconstruct the brain phantom and compare PKMA with different preconditioners and momentum techniques. The initial image for PKMA was set to $f_{\mathrm{UD}}$. We show in Fig. 2 the plots of NOFV and NRMSE versus CPU time for PKMA with the EM preconditioner and with four different IEM preconditioners: $S_{\mathrm{IEM}}(0)$, $S_{\mathrm{IEM}}(\bar f_2)$, $S_{\mathrm{IEM}}(\bar f_{20})$ and $S_{\mathrm{IEM}}(\bar f_{200})$, where $\bar f_2$, $\bar f_{20}$ and $\bar f_{200}$ represent the images reconstructed by EM-PKMA with 2, 20 and 200 iterations, respectively. From these figures, we can see that as $\hat f$ in the IEM preconditioner (27) is made closer to the solution $f^*$, the convergence of PKMA with the preconditioner $S_{\mathrm{IEM}}(\hat f)$ improves. For all the experiments of Figs. 3–12, $\hat f$ in the IEM preconditioner was always set to $f_{\mathrm{FBP}}$.

Fig. 3. NOFV (left) and NRMSE (right) versus CPU time by PKMA with the IEM, EM and DN preconditioners, FPGA, and IEM-PKMA with the Nesterov momentum parameters.

Fig. 12. Reconstructed images of the uniform phantom by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data: top to bottom rows are reconstructed by 5, 10 and 100 iterations, respectively.

In Fig. 3, we compare the results for three different preconditioners: the IEM, EM and DN preconditioners, defined by (27), (25) and (26), respectively. FPGA and IEM-PKMA with the Nesterov momentum parameters (IEM-PKMA-NM) are also shown. There are two differences between FPGA [24] and IEM-PKMA for model (3): the choices of preconditioner and of momentum parameters. Specifically, FPGA uses $I_d$ as the preconditioner, while IEM-PKMA uses the IEM preconditioner; and FPGA selects momentum parameters from Nesterov's update, while we provide the more general form of momentum parameters given by (19) for IEM-PKMA. To compare the performance of our proposed KM based momentum scheme with Nesterov's, we replace the momentum parameters of IEM-PKMA by (22), yielding the method we refer to as IEM-PKMA-NM. The optimal step-size $\beta$ was tuned for each of DN-PKMA ($\beta = 0.3$) and FPGA ($\beta = 0.003$) based on the objective function value.

In Fig. 4, we present the normalized root mean square difference (NRMSD), $\frac{\|f^k_{\mathrm{IEM}} - f^k_{\mathrm{DN}}\|_2}{\|f_{\mathrm{true}}\|_2}$, between the images reconstructed by IEM-PKMA and DN-PKMA, as well as the reconstructed images after 5000 iterations. This figure shows that the two different positive definite preconditioners $S_{\mathrm{IEM}}$ and $S_{\mathrm{DN}}$ yield the same converged image.

Fig. 4. (a) NRMSD between the images reconstructed by IEM-PKMA and DN-PKMA versus iteration number. (b) Reconstructed brain images by IEM-PKMA (left) and DN-PKMA (right) with 5000 iterations.

2). Comparison of Algorithms for SATV Regularization:

In this subsection, we show how IEM-PKMA performs using a smoothed approximation of a first-order edge-preserving regularized model (smoothed anisotropic TV penalty) and compare it to two other state-of-the-art algorithms suitable for smooth penalties. The SATV regularized reconstruction model is given by

$$\arg\min_{f \in \mathbb{R}^d}\left\{F(f) + \lambda\sum_{j=1}^d\sum_{i \in N_j}\phi_\theta(f_j - f_i)\right\}, \tag{29}$$

where $F$ is defined by (2), $N_j$ consists of the indices of the left and up neighbor pixels of the $j$th pixel of the image $f$, and $\phi_\theta$ is the Lange function [35] defined by

$$\phi_\theta(t) := |t| - \theta\ln\left(1 + \frac{|t|}{\theta}\right), \quad \theta > 0.$$

If we let $\phi_\theta$ be the absolute value function, then model (29) becomes the anisotropic TV regularized model. By defining $R(f) := \sum_{j=1}^d\sum_{i \in N_j}\phi_\theta(f_j - f_i)$, IEM-PKMA for solving (29) is given by

$$\begin{aligned}
\tilde f^{k+1} &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S_{\mathrm{IEM}}(\nabla F(f^k) + \lambda\nabla R(f^k))\big)\\
f^{k+1} &= (1 - \alpha_k)f^k + \alpha_k\tilde f^{k+1},
\end{aligned}$$

where $\{\alpha_k\}_{k\in\mathbb{N}_0}$ is given by (19). Two state-of-the-art algorithms for the SATV regularized model (29), OTDA and L-BFGS-B-PC, were used for comparison. OTDA is based on the surrogate function method and uses conjugate directions to accelerate convergence; it has been shown to outperform PCG [8]. L-BFGS-B-PC is based on the quasi-Newton method and uses the diagonal of the inverse Hessian matrix for preconditioning. These two algorithms perform well for differentiable regularized models. However, both need an additional forward- and back-projection for the line search step, which makes each iteration more time-consuming.
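Since the SATV model is smooth, the only new ingredient for IEM-PKMA is $\nabla R$. The sketch below evaluates it on a 2-D image using $\phi_\theta'(t) = t/(\theta + |t|)$, the derivative of the Lange function above; the looping over the two neighbor directions and the boundary handling are our own illustrative choices.

```python
import numpy as np

def lange_grad_2d(img, theta):
    """Gradient of R(f) = sum_j sum_{i in N_j} phi_theta(f_j - f_i),
    N_j being the left and up neighbors; phi_theta'(t) = t/(theta+|t|)."""
    H, W = img.shape
    g = np.zeros_like(img)
    for dr, dc in ((0, 1), (1, 0)):             # left and up neighbor pairs
        diff = np.zeros_like(img)
        diff[dr:, dc:] = img[dr:, dc:] - img[:H - dr, :W - dc]
        phi_p = diff / (theta + np.abs(diff))   # derivative, zero where diff=0
        g += phi_p                              # d/df_j contribution
        g[:H - dr, :W - dc] -= phi_p[dr:, dc:]  # -d/df_i contribution
    return g
```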

Fig. 5 shows the performance of IEM-PKMA, OTDA and L-BFGS-B-PC for solving model (29) with $\theta = 0.001$ (to ensure edge preservation). High-count data was used and the regularization parameter $\lambda$ was set to 0.06. For both the initialization and the preconditioners in IEM-PKMA and L-BFGS-B-PC, $f_{\mathrm{FBP}}$ was used.

Fig. 5. NOFV (left) and NRMSE (right) versus CPU time by IEM-PKMA, OTDA and L-BFGS-B-PC with $f_{\mathrm{FBP}}$ initialization.

3). Comparison of Algorithms for HOTV Regularization:

In this subsection, we compare the performance of PKMA, PAPA and ADMM for the HOTV regularized model. For this purpose, we recall the iteration schemes of PAPA and ADMM. It follows from [3] that PAPA for solving model (3) can be written as

$$\begin{aligned}
h^k &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S_{\mathrm{EM}}(\nabla F(f^k) + B_1^Tb^k + B_2^Tc^k)\big)\\
b^{k+1} &= \rho_1\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}\big)\big(\tfrac{1}{\rho_1}b^k + B_1h^k\big)\\
c^{k+1} &= \rho_2\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}\big)\big(\tfrac{1}{\rho_2}c^k + B_2h^k\big)\\
f^{k+1} &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S_{\mathrm{EM}}(\nabla F(f^k) + B_1^Tb^{k+1} + B_2^Tc^{k+1})\big).
\end{aligned}$$

According to [20], ADMM for solving model (3) consists of the following three steps:

$$\begin{cases}
u^{k+1} = \arg\min_{u \in \mathbb{R}^d}\; \varphi(Bu) + \frac{\mu}{2}\|f^k - u + q^k\|_2^2,\\
f^{k+1} = \arg\min_{f \in \mathbb{R}_+^d}\; F(f) + \frac{\mu}{2}\|f - u^{k+1} + q^k\|_2^2,\\
q^{k+1} = q^k + f^{k+1} - u^{k+1}.
\end{cases} \tag{30}$$

Unlike in [20], the term $\varphi$ in our model is not differentiable and the proximity operator of $\varphi \circ B$ has no explicit form, requiring the first sub-problem of (30) to be solved via the first-order primal-dual algorithm (FOPDA) [36]. Here we perform five FOPDA sub-iterations in each complete ADMM iteration to guarantee convergence. For the second sub-problem, we use the surrogate function strategy described in [20].

This yields the following ADMM iteration scheme:

$$\begin{aligned}
&b^{k,0} = b^k, \quad c^{k,0} = c^k, \quad u^{k,0} = u^k, \quad \tilde u^{k,0} = \tilde u^k\\
&\text{For } l = 1:5\\
&\quad b^{k,l} = \sigma\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\sigma}\varphi_1}\big)\big(\tfrac{1}{\sigma}b^{k,l-1} + B_1\tilde u^{k,l-1}\big)\\
&\quad c^{k,l} = \sigma\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\sigma}\varphi_2}\big)\big(\tfrac{1}{\sigma}c^{k,l-1} + B_2\tilde u^{k,l-1}\big)\\
&\quad u^{k,l} = \tfrac{1}{1+\tau\mu}\big(u^{k,l-1} - \tau(B_1^Tb^{k,l} + B_2^Tc^{k,l}) + \tau\mu(f^k + q^k)\big)\\
&\quad \tilde u^{k,l} = 2u^{k,l} - u^{k,l-1}\\
&\text{End}\\
&b^{k+1} = b^{k,5}, \quad c^{k+1} = c^{k,5}, \quad u^{k+1} = u^{k,5}, \quad \tilde u^{k+1} = \tilde u^{k,5}\\
&w = A^T\mathbf{1}_m - \mu(u^{k+1} - q^k)\\
&v = \mathrm{diag}\{f^k\}\,A^T\left(\frac{g}{Af^k + \gamma}\right)\\
&f^{k+1} = \frac{-w + \sqrt{w^2 + 4\mu v}}{2\mu}\\
&q^{k+1} = q^k + f^{k+1} - u^{k+1}
\end{aligned}$$
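The closed-form $f$-update at the end of the scheme follows from setting the derivative of the separable surrogate to zero, which gives the component-wise quadratic $\mu f^2 + wf - v = 0$ and its positive root. A sketch, with names matching the scheme above and our usual dense-matrix assumption:

```python
import numpy as np

def admm_f_update(f_k, u_next, q_k, A, g, gamma, mu):
    """f^{k+1} = (-w + sqrt(w^2 + 4*mu*v)) / (2*mu), component-wise, with
    w = A^T 1_m - mu*(u^{k+1} - q^k) and
    v = diag{f^k} A^T (g / (A f^k + gamma))."""
    w = A.T @ np.ones(A.shape[0]) - mu * (u_next - q_k)
    v = f_k * (A.T @ (g / (A @ f_k + gamma)))
    return (-w + np.sqrt(w * w + 4.0 * mu * v)) / (2.0 * mu)
```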

The choices of parameters in PAPA and ADMM for the 2D simulation are shown in Table I. For PAPA, the parameters were chosen according to [3]. For ADMM, the parameters $\sigma$ and $\tau$ were set by the convergence condition $\sigma\tau < \frac{1}{\|B\|_2^2}$ of FOPDA [36], and the optimal $\mu$ was chosen empirically based on the objective function value.

We first show the performance of these algorithms for the reconstruction of the brain phantom using high-count data. Two different initial images, the uniform disk $f_{\mathrm{UD}}$ and $\hat f$, were used for comparison. Here we set $\Phi_0 = \Phi(f_{\mathrm{UD}})$ in the definition of NOFV for both the uniform disk and $\hat f$ initializations. In Fig. 6, we show the NOFV and NRMSE versus CPU time for IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk and $\hat f$ initialization. It shows that IEM-PKMA converges more rapidly than both PAPA and ADMM. The reconstructed brain images after 5, 10 and 100 iterations in Fig. 7 show that IEM-PKMA is able to obtain a reasonably good image very rapidly. We note that the benefit of the IEM preconditioner is more pronounced when a uniform image is used for initialization, and less so when the initialization is the $\hat f$ used in the IEM preconditioner, though it still shows improvement.

Fig. 6. NOFV (left) and NRMSE (right) versus CPU time by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk (top row) and $\hat f$ (bottom row) initialization using high-count data.

Fig. 7. Reconstructed brain images by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data: top to bottom rows are reconstructed by 5, 10, and 100 iterations, respectively.

To demonstrate the performance of IEM-PKMA for low-count data, we show in Fig. 8 comparisons of NOFV and NRMSE versus CPU time for these algorithms with uniform disk and $\hat f$ initialization. The reconstructed brain images using low-count data are shown in Fig. 9.

Fig. 8. NOFV (left) and NRMSE (right) versus CPU time by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk (top row) and $\hat f$ (bottom row) initialization using low-count data.

Fig. 9. Reconstructed brain images by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using low-count data: top to bottom rows are reconstructed by 5, 10 and 100 iterations, respectively.

We next examine the performance of these algorithms for reconstructing the uniform phantom with uniform disk initialization and high-count data. Fig. 10 shows NRC versus CPU time for the largest and the smallest hot spheres of the uniform phantom. Comparisons of central line profiles of the images reconstructed by 5 and 10 iterations are shown in Fig. 11. The reconstructed images of the uniform phantom are shown in Fig. 12. These figures show that for the uniform phantom, IEM-PKMA outperforms the other algorithms.

Fig. 10. NRC of the largest hot sphere (left) and the smallest hot sphere (right) versus CPU time by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data.

Fig. 11. CLP of the images reconstructed by 5 iterations (left) and 10 iterations (right) of IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data.

C. Initial Clinical Results

In this subsection, we present some promising initial 3D clinical results, based on a relaxed ordered subsets version, ROS-PKMA, following [31] and [37]. The details of this algorithm and the parameters used are provided in the supplementary materials. We implemented the ROS-PKMA, ROS-PAPA [31] and ROS-EM algorithms on a GE D690 PET/CT using a modified version of the GE PET Toolbox release 2.0. A brain scan of a 52-year-old male with brain metastases was acquired 1 hour post-injection (370 MBq nominal) for 10 minutes. The images were reconstructed using time-of-flight (TOF) information with a 300 mm FOV, a 256 × 256 matrix, and an accurate model of the detector PSF (“sharpIR”).

Initial results shown in Fig. 13 appear to indicate that ROS-PKMA with 12 subsets can obtain even better images than both ROS-PAPA and ROS-EM (also with 12 subsets) using only half the iterations.

Fig. 13. Reconstructed clinical brain images by ROS-PKMA, ROS-PAPA and ROS-EM with 12 subsets: top to bottom rows are reconstructed by 1, 2, 4 and 8 iterations, respectively.

IV. Conclusion

This study presents an efficient, easily implemented and mathematically sound preconditioned Krasnoselskii-Mann algorithm for HOTV regularized PET image reconstruction. We prove that PKMA enjoys theoretical convergence in the case that the preconditioner is fixed after a finite number of iterations. In addition, we show that our proposed generating function for the momentum parameters is more general than the one proposed by Nesterov, in that it includes both the momentum-free case and an asymptotically equivalent form of the Nesterov momentum parameters as special cases. An improved EM preconditioner that avoids the reconstructed images being “stuck at zero” was proposed for accelerating convergence. Numerical experiments demonstrate that the IEM preconditioner improves convergence speed more than the classical EM preconditioner, and that IEM-PKMA outperforms OTDA and L-BFGS-B-PC for the SATV regularized model, and PAPA and ADMM for the HOTV regularized model. Moreover, for clinical data, promising initial results indicate that ROS-PKMA may be able to obtain sufficiently converged images more rapidly than both ROS-PAPA and ROS-EM, though more research is necessary to properly evaluate these results.


Acknowledgement

The authors are grateful to Dr. Guobao Wang and Dr. Jinyi Qi for providing their codes for OTDA, to Dr. Charles W. Stearns, Dr. Sangtae Ahn, Dr. Kris Thielemans and Yu-Jung Tsai for helpful discussions on implementation of L-BFGS-B-PC, and to an anonymous referee for bringing reference [30] to our attention. They are grateful to GE for providing C. R. Schmidtlein, through a research agreement with MSKCC, the PET toolbox for the clinical experiments.

This work was supported in part by the Special Project on High-performance Computing through the National Key R&D Program under Grant 2016YFB0200602, in part by the Natural Science Foundation of China under Grant 11771464, Grant 11601537, Grant 11471013, and Grant 11501584, in part by the Fundamental Research Funds for the Central Universities of China, and in part by the Imaging and Radiation Sciences subaward of the MSK Cancer Center Support Grant/Core Grant under Grant P30 CA008748.

Appendix A

We provide the definitions of the 2D first-order and second-order isotropic TV (ITV) and the explicit forms of the corresponding functions’ proximity operators. For the 3D case, please refer to [3]. The first-order and second-order ITV can be written as $\varphi_1 \circ B_1$ and $\varphi_2 \circ B_2$, respectively. For the definition of $B_1$ and $B_2$, we let $N := \sqrt{d}$, let $I_N$ denote the $N \times N$ identity matrix, and let $D$ denote the $N \times N$ backward difference matrix such that $D_{j,j} = 1$ and $D_{j,j-1} = -1$ for $j = 2, 3, \ldots, N$, with all other entries of $D$ zero. Through the matrix Kronecker product $\otimes$, $B_1 \in \mathbb{R}^{2d \times d}$ and $B_2 \in \mathbb{R}^{4d \times d}$ are defined, respectively, by

$$B_1 := \begin{bmatrix} I_N \otimes D \\ D \otimes I_N \end{bmatrix}, \qquad B_2 := \begin{bmatrix} I_N \otimes (D^TD) \\ D^T \otimes D \\ (D^TD) \otimes I_N \\ D \otimes D^T \end{bmatrix}.$$
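The Kronecker-product construction maps directly onto sparse-matrix tooling. A sketch with SciPy, assuming the convention above that the first row of $D$ is zero; the function name is ours.

```python
import numpy as np
from scipy.sparse import identity, kron, diags, vstack

def difference_matrices(N):
    """First- and second-order difference matrices B1 (2d x d) and
    B2 (4d x d) for an N x N image, d = N^2, via Kronecker products."""
    D = diags([1.0, -1.0], [0, -1], shape=(N, N)).tolil()
    D[0, 0] = 0.0                    # differences only for rows j = 2..N
    D = D.tocsr()
    I = identity(N, format="csr")
    B1 = vstack([kron(I, D), kron(D, I)], format="csr")
    B2 = vstack([kron(I, D.T @ D), kron(D.T, D),
                 kron(D.T @ D, I), kron(D, D.T)], format="csr")
    return B1, B2
```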

For $x \in \mathbb{R}^{2d}$, define $\varphi_1(x) := \sum_{i=1}^d \|z_1^{(i)}\|_2$, where

$$z_1^{(i)} := [x_i, x_{d+i}]^T, \quad i = 1, 2, \ldots, d. \tag{31}$$

For $x \in \mathbb{R}^{4d}$, define $\varphi_2(x) := \sum_{i=1}^d \|z_2^{(i)}\|_2$, where

$$z_2^{(i)} := [x_i, x_{d+i}, x_{2d+i}, x_{3d+i}]^T, \quad i = 1, 2, \ldots, d. \tag{32}$$

Next, we provide explicit forms of the proximity operators of $\varphi_1$ and $\varphi_2$. Let $\omega$ be a positive number. As shown in [3], for $x \in \mathbb{R}^{2d}$, denoting $u := \mathrm{prox}_{\omega\varphi_1}(x)$, we have

$$u_{jd+i} = \left(\frac{\max\{\|z_1^{(i)}\|_2 - \omega, 0\}}{\|z_1^{(i)}\|_2}\right)x_{jd+i}$$

for $j = 0, 1$, $i = 1, 2, \ldots, d$, where $z_1^{(i)}$ is defined by (31). For $x \in \mathbb{R}^{4d}$, we denote $v := \mathrm{prox}_{\omega\varphi_2}(x)$. Then

$$v_{jd+i} = \left(\frac{\max\{\|z_2^{(i)}\|_2 - \omega, 0\}}{\|z_2^{(i)}\|_2}\right)x_{jd+i}$$

for $j = 0, 1, 2, 3$, $i = 1, 2, \ldots, d$, where $z_2^{(i)}$ is defined by (32).
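Both formulas are the same group soft-thresholding, applied to pairs for $\varphi_1$ and quadruples for $\varphi_2$. A sketch for the first-order case, with the 0/0 case resolved to zero (a common convention we assume here):

```python
import numpy as np

def prox_phi1(x, omega):
    """prox_{omega*phi1}(x) for x in R^{2d}: shrink each group
    z1^(i) = [x_i, x_{d+i}]^T toward zero by omega in the 2-norm."""
    d = x.size // 2
    z = x.reshape(2, d)        # row 0: x_1..x_d, row 1: x_{d+1}..x_{2d}
    norms = np.linalg.norm(z, axis=0)
    scale = np.zeros(d)
    nz = norms > 0
    scale[nz] = np.maximum(norms[nz] - omega, 0.0) / norms[nz]
    return (z * scale).reshape(-1)
```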

Appendix B

In this appendix, we prove the convergence of PKMA by employing the KM theorem.

To recall the KM theorem, we first recall the definition of a nonexpansive operator. Let $H \in \mathbb{S}_+^n$. We say that $\mathcal{T}: \mathbb{R}^n \to \mathbb{R}^n$ is nonexpansive with respect to $H$ if $\|\mathcal{T}x - \mathcal{T}y\|_H \le \|x - y\|_H$ for any $x, y \in \mathbb{R}^n$.

Theorem 4 (Krasnoselskii-Mann [38]-[40]): Let $\mathcal{T}: \mathbb{R}^n \to \mathbb{R}^n$ be a nonexpansive operator such that the set of its fixed points is non-empty. For $\{\omega_k\}_{k\in\mathbb{N}_0} \subset (0, 1)$ and $x^0 \in \mathbb{R}^n$, define

$$x^{k+1} = (1 - \omega_k)x^k + \omega_k\mathcal{T}x^k, \quad k \in \mathbb{N}_0. \tag{33}$$

If $\sum_{k=0}^\infty \omega_k(1 - \omega_k) = +\infty$, then $\{x^k\}_{k\in\mathbb{N}_0}$ converges to a fixed-point of $\mathcal{T}$.

We shall employ Theorem 4 to prove the convergence of PKMA. For this purpose, we rewrite iteration (21) in the form of (33). To this end, we first prove that $\mathcal{T}_W$ is averaged nonexpansive with respect to $W = R^{-1}G$. We recall the definitions of firmly nonexpansive and averaged nonexpansive operators, and two related lemmas.

An operator $\mathcal{T}: \mathbb{R}^n \to \mathbb{R}^n$ is called firmly nonexpansive with respect to $H \in \mathbb{S}_+^n$ if $\|\mathcal{T}x - \mathcal{T}y\|_H^2 \le \langle \mathcal{T}x - \mathcal{T}y, x - y \rangle_H$ for any $x, y \in \mathbb{R}^n$. If there exists a nonexpansive operator $\mathcal{N}: \mathbb{R}^n \to \mathbb{R}^n$ with respect to $H$ such that $\mathcal{T} = (1 - \alpha)\mathcal{I} + \alpha\mathcal{N}$, we say that $\mathcal{T}$ is $\alpha$-averaged nonexpansive with respect to $H$. Firm nonexpansiveness of an operator corresponds to its $\frac12$-averaged nonexpansiveness by [41, Remark 4.24].

Lemma 5 (Baillon-Haddad [41]): Let $\psi: \mathbb{R}^n \to \mathbb{R}$ be differentiable and convex, and let $L$ be a positive real number. Then $\nabla\psi$ is $L$-Lipschitz continuous if and only if for all $x, y \in \mathbb{R}^n$, $\|\nabla\psi(x) - \nabla\psi(y)\|_2^2 \le L\langle x - y, \nabla\psi(x) - \nabla\psi(y)\rangle$.

Lemma 6 (Combettes-Yamada [42]): Let $H \in \mathbb{S}_+^n$, $0 < \alpha_1 < 1$ and $0 < \alpha_2 < 1$. If $\mathcal{T}_1: \mathbb{R}^n \to \mathbb{R}^n$ is $\alpha_1$-averaged nonexpansive with respect to $H$, and $\mathcal{T}_2: \mathbb{R}^n \to \mathbb{R}^n$ is $\alpha_2$-averaged nonexpansive with respect to $H$, then $\mathcal{T}_1 \circ \mathcal{T}_2$ is $\frac{\alpha_1 + \alpha_2 - 2\alpha_1\alpha_2}{1 - \alpha_1\alpha_2}$-averaged nonexpansive with respect to $H$.

Notice that $W = R^{-1}G = \begin{bmatrix} P^{-1} & -B^T \\ -B & Q^{-1} \end{bmatrix}$ is symmetric since $P$, $Q$ are both symmetric. Therefore, $W \in \mathbb{S}_+^{d+m_0}$ if and only if its smallest eigenvalue $\lambda_W > 0$. Next we prove that $\mathcal{T}_W$ is averaged nonexpansive with respect to $W$.

Lemma 7: Let $\mathcal{T}_G$, $\mathcal{T}_W$, $r$, $R$ and $G$ be defined by (13), (14), (9), (10) and (11), respectively, and let $\zeta := \frac{2\lambda_W}{4\lambda_W - L}$. If $\lambda_W > \frac{L}{2}$, then $\mathcal{T}_W$ is $\zeta$-averaged nonexpansive with respect to $W$.

Proof: $\lambda_W > \frac{L}{2} > 0$ gives that $W \in \mathbb{S}_+^{d+m_0}$ and

$$\|W^{-\frac12}\|_2^2 = \lambda_{\max}(W^{-1}) = \lambda_W^{-1}, \tag{34}$$

where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix. [24, Lemma 3.2] shows that $\mathcal{T}_G$ is firmly nonexpansive with respect to $W$; hence it is $\frac12$-averaged nonexpansive with respect to $W$. According to Lemma 6 and the definition of $\mathcal{T}_W$, to prove that $\mathcal{T}_W$ is averaged nonexpansive with respect to $W$, it suffices to show that $\mathcal{I} - W^{-1}\nabla r$ is averaged nonexpansive with respect to $W$.

Define $\alpha := \frac{L}{2\lambda_W}$ and $\mathcal{N} := \mathcal{I} - \frac{1}{\alpha}W^{-1}\nabla r$. Then $0 < \alpha < 1$ and $\mathcal{I} - W^{-1}\nabla r = (1 - \alpha)\mathcal{I} + \alpha\mathcal{N}$. It is easy to verify that $r$ is convex and differentiable with an $L$-Lipschitz continuous gradient. For $w, v \in \mathbb{R}^{d+m_0}$, let $z := \nabla r(w) - \nabla r(v)$. By (34), Lemma 5 and $\alpha = \frac{L}{2\lambda_W}$, we have that

$$\frac{1}{\alpha^2}\|W^{-\frac12}z\|_2^2 \le \frac{1}{\alpha^2}\|W^{-\frac12}\|_2^2\|z\|_2^2 \le \frac{2}{\alpha}\langle w - v, z\rangle.$$

Hence

$$\|\mathcal{N}w - \mathcal{N}v\|_W^2 = \left\|(w - v) - \frac{1}{\alpha}W^{-1}z\right\|_W^2 = \|w - v\|_W^2 + \frac{1}{\alpha^2}\|W^{-\frac12}z\|_2^2 - \frac{2}{\alpha}\langle w - v, z\rangle \le \|w - v\|_W^2,$$

which implies that $\mathcal{N}$ is nonexpansive with respect to $W$. Thus $\mathcal{I} - W^{-1}\nabla r$ is $\alpha$-averaged nonexpansive with respect to $W$. Therefore, $\mathcal{T}_W = \mathcal{T}_G \circ (\mathcal{I} - W^{-1}\nabla r)$ is $\zeta$-averaged nonexpansive with respect to $W$ by Lemma 6. ■

Now we prove the convergence of iteration (21).

Theorem 8: Let $\mathcal{T}_W$, $\alpha_k$, $\mathcal{T}_{\alpha_k}$ be defined by (14), (19) and (20), respectively, where $\varrho \in (-1, 1)$ and $\delta \ge 0$. For a given $v^0 \in \mathbb{R}^{d+m_0}$, let $\{v^k\}_{k\in\mathbb{N}_0}$ be the sequence generated by iteration (21), and let $f^k := [v_1^k, v_2^k, \ldots, v_d^k]^T$, $k \in \mathbb{N}_0$. If $\lambda_W > \frac{L}{2(1 - \max\{\varrho, 0\})}$, then $\{f^k\}_{k\in\mathbb{N}_0}$ converges to a solution of model (5).

Proof: We shall employ Theorem 4 to prove this theorem. To this end, we define $\zeta := \frac{2\lambda_W}{4\lambda_W - L}$, and show below that there exists a nonexpansive operator $\mathcal{N}$ with respect to $W$ such that

$$\mathcal{T}_{\alpha_k} = (1 - \alpha_k\zeta)\mathcal{I} + \alpha_k\zeta\mathcal{N}, \quad k \in \mathbb{N}_0, \tag{35}$$

and

$$\alpha_k\zeta \in (0, 1), \qquad \sum_{k=0}^\infty \alpha_k\zeta(1 - \alpha_k\zeta) = +\infty. \tag{36}$$

It follows from Lemma 7 that $\mathcal{T}_W$ is $\zeta$-averaged nonexpansive with respect to $W$. By the definition of averaged nonexpansiveness, there exists a nonexpansive operator $\mathcal{N}$ with respect to $W$ such that $\mathcal{T}_W = (1 - \zeta)\mathcal{I} + \zeta\mathcal{N}$. Substituting this equation into the definition (20) of $\mathcal{T}_{\alpha_k}$, we obtain (35).

It remains to verify that $\alpha_k\zeta$ satisfies (36) for $k \in \mathbb{N}_0$. If $-1 < \varrho < 0$, then $\lambda_W > \frac{L}{2}$ and $1 + \varrho < \alpha_k \le 1$ for $k \in \mathbb{N}_0$. Hence $\frac12 < \zeta < 1$ and $0 < \frac{1+\varrho}{2} < \alpha_k\zeta \le \zeta < 1$ for $k \in \mathbb{N}_0$. In this case, $\sum_{k=0}^\infty \alpha_k\zeta(1 - \alpha_k\zeta) > \sum_{k=0}^\infty \frac{1+\varrho}{2}(1 - \zeta) = +\infty$. If $0 \le \varrho < 1$, then $\lambda_W > \frac{L}{2(1-\varrho)}$ and $1 \le \alpha_k \le 1 + \varrho$ for $k \in \mathbb{N}_0$. Hence $\frac12 < \zeta < \frac{1}{1+\varrho}$ and $\frac12 < \alpha_k\zeta < 1$ for $k \in \mathbb{N}_0$. Given $\zeta$, $0 \le \varrho < 1$ implies that there exists $\varrho'$ such that $\varrho < \varrho' < 1$ and $\frac12 < \zeta < \frac{1}{1+\varrho'}$. Thus $\alpha_k\zeta < \frac{1+\varrho}{1+\varrho'}$, and then $\sum_{k=0}^\infty \alpha_k\zeta(1 - \alpha_k\zeta) > \sum_{k=0}^\infty \frac12\left(1 - \frac{1+\varrho}{1+\varrho'}\right) = +\infty$.

Therefore, by Theorem 4, $\{v^k\}_{k\in\mathbb{N}_0}$ converges to a fixed-point of $\mathcal{T}_W$. By Theorem 1, we conclude that $\{f^k\}_{k\in\mathbb{N}_0}$ converges to a solution of model (5). ■

For the specific choice of the preconditioning matrices $P$ and $Q$, we have the following lemma for proving the convergence of PKMA.

Lemma 9: Let $\beta$, $\rho_1$, $\rho_2$ and $\xi$ be positive numbers, let $S \in \mathbb{R}^{d \times d}$ be a diagonal matrix with positive diagonal entries, and let $P := \beta S$, $Q := \mathrm{diag}(\rho_1\mathbf{1}_{m_1}, \rho_2\mathbf{1}_{m_2})$, $p := \frac{1}{\beta S_{\max}}$, $B := \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$, and $W := \begin{bmatrix} P^{-1} & -B^T \\ -B & Q^{-1} \end{bmatrix}$. If $\beta < \frac{1}{\xi S_{\max}}$, $\rho_1 < \frac{p - \xi}{2\|B_1\|_2^2 + (p - \xi)\xi}$ and $\rho_2 < \frac{p - \xi}{2\|B_2\|_2^2 + (p - \xi)\xi}$, then $\lambda_W > \xi$.

Proof: Clearly, $W$ is symmetric. To prove that $\lambda_W > \xi$, it suffices to show that

$$W - \xi I_{d+m_0} = \begin{bmatrix} P^{-1} - \xi I_d & -B^T \\ -B & Q^{-1} - \xi I_{m_0} \end{bmatrix}$$

is positive definite. Let $t_1 := \big(\frac{1}{\rho_1} - \xi\big)^{-1}$ and $t_2 := \big(\frac{1}{\rho_2} - \xi\big)^{-1}$. If $\beta$, $\rho_1$ and $\rho_2$ satisfy the conditions of this lemma, then $p - \xi > 0$, $0 < t_1 < \frac{p - \xi}{2\|B_1\|_2^2}$, $0 < t_2 < \frac{p - \xi}{2\|B_2\|_2^2}$, and moreover, $P^{-1} - \xi I_d$ and $Q^{-1} - \xi I_{m_0}$ are both positive definite. Define

$$\tilde B := (Q^{-1} - \xi I_{m_0})^{-\frac12}\,B\,(P^{-1} - \xi I_d)^{-\frac12}.$$

Then

$$\tilde B = \begin{bmatrix} \sqrt{t_1}\,B_1(P^{-1} - \xi I_d)^{-\frac12} \\ \sqrt{t_2}\,B_2(P^{-1} - \xi I_d)^{-\frac12} \end{bmatrix}$$

by the definitions of $Q$ and $B$. It follows from [25, Lemma 6.2] that $W - \xi I_{d+m_0}$ is positive definite if and only if $\|\tilde B\|_2 < 1$, which we verify below. For any $E_1 \in \mathbb{R}^{m_1 \times d}$ and $E_2 \in \mathbb{R}^{m_2 \times d}$, by the definition of the matrix 2-norm, we have

$$\left\|\begin{bmatrix} E_1 \\ E_2 \end{bmatrix}\right\|_2^2 = \max_{\|x\|_2 = 1}\left\|\begin{bmatrix} E_1x \\ E_2x \end{bmatrix}\right\|_2^2 \le \max_{\|x\|_2 = 1}\|E_1x\|_2^2 + \max_{\|y\|_2 = 1}\|E_2y\|_2^2 = \|E_1\|_2^2 + \|E_2\|_2^2.$$

Thus

$$\|\tilde B\|_2^2 \le t_1\|B_1(P^{-1} - \xi I_d)^{-\frac12}\|_2^2 + t_2\|B_2(P^{-1} - \xi I_d)^{-\frac12}\|_2^2 \le t_1\|B_1\|_2^2(p - \xi)^{-1} + t_2\|B_2\|_2^2(p - \xi)^{-1} < \frac12 + \frac12 = 1,$$

which completes the proof. ■

Footnotes

This article has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author.

Contributor Information

Yizun Lin, School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China.

C. Ross Schmidtlein, Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065 USA.

Qia Li, School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China.

Si Li, School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.

Yuesheng Xu, Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529 USA, and also with the Guangdong Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, China.

References

  • [1] Rudin LI, Osher S, and Fatemi E, “Nonlinear total variation based noise removal algorithms,” Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992.
  • [2] Bredies K, Kunisch K, and Pock T, “Total generalized variation,” SIAM J. Imag. Sci., vol. 3, no. 3, pp. 492–526, 2010.
  • [3] Li S et al., “Effective noise-suppressed and artifact-reduced reconstruction of SPECT data using a preconditioned alternating projection algorithm,” Med. Phys., vol. 42, no. 8, pp. 4872–4887, 2015.
  • [4] Fessler JA and Booth SD, “Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction,” IEEE Trans. Image Process., vol. 8, no. 5, pp. 688–699, May 1999.
  • [5] Yu DF and Fessler JA, “Edge-preserving tomographic reconstruction with nonlocal regularization,” IEEE Trans. Med. Imag., vol. 21, no. 2, pp. 159–173, February 2002.
  • [6] Wang G and Qi J, “Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization,” IEEE Trans. Med. Imag., vol. 31, no. 12, pp. 2194–2204, December 2012.
  • [7] Tsai Y-J et al., “Fast quasi-Newton algorithms for penalized reconstruction in emission tomography and further improvements via preconditioning,” IEEE Trans. Med. Imag., vol. 37, no. 4, pp. 1000–1010, April 2018.
  • [8] Wang G and Qi J, “Edge-preserving PET image reconstruction using trust optimization transfer,” IEEE Trans. Med. Imag., vol. 34, no. 4, pp. 930–939, April 2015.
  • [9] Panin VY, Zeng GL, and Gullberg GT, “Total variation regulated EM algorithm,” IEEE Trans. Nucl. Sci., vol. 46, no. 6, pp. 2202–2210, December 1999.
  • [10] Jonsson E, Huang S-C, and Chan T, “Total variation regularization in positron emission tomography,” Univ. California, Los Angeles, Los Angeles, CA, USA, CAM-Rep. 98-48, 1998, pp. 1–25.
  • [11] Sawatzky A, Brune C, Wubbeling F, Kosters T, Schafers K, and Burger M, “Accurate EM-TV algorithm in PET with low SNR,” in Proc. IEEE Nucl. Sci. Symp. Conf. Rec., October 2008, pp. 5133–5137.
  • [12] Bardsley JM, “An efficient computational method for total variation-penalized Poisson likelihood estimation,” Inverse Problems Imag., vol. 2, no. 2, pp. 167–185, 2008.
  • [13] Bardsley JM and Luttman A, “Total variation-penalized Poisson likelihood estimation for ill-posed problems,” Adv. Comput. Math., vol. 31, nos. 1–3, p. 35, 2009.
  • [14] Bardsley JM and Goldes J, “Regularization parameter selection and an efficient algorithm for total variation-regularized positron emission tomography,” Numer. Algorithms, vol. 57, no. 2, pp. 255–271, 2011.
  • [15] Chaux C, Pesquet J-C, and Pustelnik N, “Nested iterative algorithms for convex constrained image recovery problems,” SIAM J. Imag. Sci., vol. 2, no. 2, pp. 730–762, 2009.
  • [16] Bonettini S and Ruggiero V, “An alternating extragradient method for total variation-based image restoration from Poisson data,” Inverse Problems, vol. 27, no. 9, 2011, Art. no. 095001.
  • [17] Sidky EY, Jørgensen JH, and Pan X, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle–Pock algorithm,” Phys. Med. Biol., vol. 57, no. 10, p. 3065, 2012.
  • [18] Komodakis N and Pesquet J-C, “Playing with duality: An overview of recent primal–dual approaches for solving large-scale optimization problems,” IEEE Signal Process. Mag., vol. 32, no. 6, pp. 31–54, November 2015.
  • [19] Boyd S, Parikh N, Chu E, Peleato B, and Eckstein J, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, January 2011.
  • [20] Chun SY, Dewaraja YK, and Fessler JA, “Alternating direction method of multiplier for tomography with nonlocal regularizers,” IEEE Trans. Med. Imag., vol. 33, no. 10, pp. 1960–1968, October 2014.
  • [21] Krol A, Li S, Shen L, and Xu Y, “Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction,” Inverse Problems, vol. 28, no. 11, 2012, Art. no. 115005.
  • [22] Wu Z, Li S, Zeng X, Xu Y, and Krol A, “Reducing staircasing artifacts in SPECT reconstruction by an infimal convolution regularization,” J. Comput. Math., vol. 34, no. 6, pp. 626–647, 2016.
  • [23] Pock T and Chambolle A, “Diagonal preconditioning for first order primal-dual algorithms in convex optimization,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), November 2011, pp. 1762–1769.
  • [24] Li Q and Zhang N, “Fast proximity-gradient algorithms for structured convex optimization problems,” Appl. Comput. Harmon. Anal., vol. 41, no. 2, pp. 491–517, 2016.
  • [25] Li Q, Shen L, Xu Y, and Zhang N, “Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing,” Adv. Comput. Math., vol. 41, no. 2, pp. 387–422, 2015.
  • [26] Nesterov YE, “A method of solving a convex programming problem with convergence rate O(1/k²),” Sov. Math. Dokl., vol. 27, no. 2, pp. 372–376, 1983.
  • [27] Beck A and Teboulle M, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.
  • [28] Moreau JJ, “Proximité et dualité dans un espace Hilbertien,” Bull. Soc. Math. France, vol. 93, no. 2, pp. 273–299, 1965.
  • [29] Moreau JJ, “Fonctions convexes duales et points proximaux dans un espace Hilbertien,” C. R. Acad. Sci. Paris A, Math., vol. 255, pp. 2897–2899, December 1962.
  • [30] Mumcuoglu EU, Leahy R, Cherry SR, and Zhou Z, “Fast gradient-based methods for Bayesian reconstruction of transmission and emission PET images,” IEEE Trans. Med. Imag., vol. 13, no. 4, pp. 687–701, December 1994.
  • [31] Schmidtlein CR et al., “Relaxed ordered subset preconditioned alternating projection algorithm for PET reconstruction with automated penalty weight selection,” Med. Phys., vol. 44, no. 8, pp. 4083–4097, 2017.
  • [32] Moses WW, “Fundamental limits of spatial resolution in PET,” Nucl. Instrum. Methods Phys. Res. A, Accel. Spectrom. Detect. Assoc. Equip., vol. 648, pp. S236–S240, August 2011.
  • [33] Berthon B et al., “PETSTEP: Generation of synthetic PET lesions for fast evaluation of segmentation methods,” Phys. Med., vol. 31, no. 8, pp. 969–980, 2015.
  • [34] Micchelli CA, Shen L, and Xu Y, “Proximity algorithms for image models: Denoising,” Inverse Problems, vol. 27, no. 4, 2011, Art. no. 045009.
  • [35] Lange K, “Convergence of EM image reconstruction algorithms with Gibbs smoothing,” IEEE Trans. Med. Imag., vol. 9, no. 4, pp. 439–446, December 1990.
  • [36] Chambolle A and Pock T, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imag. Vis., vol. 40, no. 1, pp. 120–145, 2011.
  • [37] Kim D, Ramani S, and Fessler JA, “Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction,” IEEE Trans. Med. Imag., vol. 34, no. 1, pp. 167–178, January 2015.
  • [38] Krasnosel'skii MA, “Two remarks on the method of successive approximations,” Uspekhi Matematicheskikh Nauk, vol. 10, no. 1, pp. 123–127, 1955.
  • [39] Mann WR, “Mean value methods in iteration,” Proc. Amer. Math. Soc., vol. 4, no. 3, pp. 506–510, 1953.
  • [40] Bauschke HH and Combettes PL, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York, NY, USA: Springer, 2011.
  • [41] Baillon J-B and Haddad G, “Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones,” Isr. J. Math., vol. 26, no. 2, pp. 137–150, 1977.
  • [42] Combettes PL and Yamada I, “Compositions and convex combinations of averaged nonexpansive operators,” J. Math. Anal. Appl., vol. 425, no. 1, pp. 55–70, 2015.
