Published in final edited form as: IEEE Trans Med Imaging. 2019 Feb 19;38(9):2114–2126. doi: 10.1109/TMI.2019.2898271

A Krasnoselskii-Mann Algorithm with an Improved EM Preconditioner for PET Image Reconstruction

Yizun Lin 1, C Ross Schmidtlein 2, Qia Li 3, Si Li 4, Yuesheng Xu 5

Abstract

This paper presents a preconditioned Krasnoselskii-Mann (KM) algorithm with an improved EM preconditioner (IEM-PKMA) for higher-order total variation (HOTV) regularized positron emission tomography (PET) image reconstruction. The PET reconstruction problem can be formulated as a three-term convex optimization model consisting of the Kullback–Leibler (KL) fidelity term, a nonsmooth penalty term, and a nonsmooth nonnegativity constraint term. We develop an efficient KM algorithm for solving this optimization problem based on a fixed-point characterization of its solution, with a preconditioner and a momentum technique for accelerating convergence. By combining the EM preconditioner, a thresholding, and a good, inexpensive estimate of the solution, we propose an improved EM preconditioner that can not only accelerate convergence but also prevent the reconstructed image from getting “stuck at zero.” Numerical results in this paper show that the proposed IEM-PKMA outperforms existing state-of-the-art algorithms, including the optimization transfer descent algorithm and the preconditioned L-BFGS-B algorithm for the differentiable smoothed anisotropic total variation regularized model, and the preconditioned alternating projection algorithm and the alternating direction method of multipliers for the nondifferentiable HOTV regularized model. Encouraging initial experiments using clinical data are presented.

Keywords: Krasnoselskii-Mann algorithm, image reconstruction, maximum likelihood estimation, positron emission tomography, total variation

I. Introduction

POSITRON emission tomography (PET) is a well-established technique for molecular imaging that produces spatial (and temporal) estimates of a particular tracer’s bio-distribution (images). These images have been instrumental in diagnosis and staging, and are rapidly extending into therapy and response assessment. As a result, improved estimates of the tracer distribution may allow physicians and scientists to more accurately interpret the implications of these bio-distributions and improve patient outcomes. However, because of important health considerations, basic physical processes, and work-flow issues, the data are inherently count- and resolution-limited and must be reconstructed within a few minutes of acquisition. Consequently, a great deal of effort has gone into improving both the speed and accuracy of the estimation of PET tracer distributions.

At present, these efforts have converged on a penalized likelihood (PL) model, in which a data fidelity term (the KL-divergence for PET, i.e., a Poisson noise model) plus one or more penalty terms that regularize the estimated images are optimized to produce the most likely image given the data and penalty. Although the introduction of penalty terms can improve the quality of reconstructed images, these penalty terms may be nondifferentiable or even nonconvex, which makes it difficult to develop efficient algorithms for solving the PL model. To address this issue, many algorithms and penalties have been proposed, but without guidance, it is difficult to know whether a particular algorithm or penalty is suitable for routine imaging.

The total variation (TV) penalty (regularization) introduced in [1] has had a great deal of success in image denoising, removing unwanted noise while preserving important details such as edges, and it has been used successfully in medical imaging modalities such as CT and MR. Unfortunately, TV can introduce piecewise constant (staircase) artifacts in the smoothly varying regions of images, which makes it unsuitable for PET. However, the addition of a second-order term, which we refer to throughout this paper as a higher-order TV (HOTV) penalty [2], [3], allows the preservation of edges at known tracer boundaries (e.g., liver, bladder, heart, brain, and, in the case of dynamic imaging, the boundaries of blood vessels), while suppressing the undesired staircase artifacts.

Classical gradient-type algorithms are unable to solve TV or HOTV regularized models due to the nondifferentiability of their objective functions. To address this while maintaining the edge-preserving properties, [4]-[7] have used smooth approximations of the absolute value function to replace the $\ell_1$-based regularized model with a differentiable one. Efficient algorithms, including the preconditioned conjugate-gradient algorithm (PCG) [4], the optimization transfer descent algorithm (OTDA) [6], [8], and the preconditioned limited-memory Broyden-Fletcher-Goldfarb-Shanno with boundary constraints algorithm (L-BFGS-B-PC) [7], have been proposed for solving these smooth edge-preserving regularized models. Though the smooth approximation can avoid the problem of nondifferentiability, it requires a tuning parameter that is not part of the penalty weight. If this parameter is set too large, the smooth penalty loses its edge-preserving property; if it is set too small, as mentioned in [8], convergence of classical gradient-type algorithms may be slow. Therefore, it is necessary to develop new efficient algorithms that directly solve nondifferentiable models.

Recently, many algorithms have been proposed for nondifferentiable regularized models, including EM-based methods [9]-[11], projected quasi-Newton methods [12]-[14], forward-backward approach [15], primal-dual methods [16]-[18], augmented Lagrangian methods [19], [20], and fixed-point proximity methods [3], [21], [22]. Among these algorithms, the alternating direction method of multipliers (ADMM) [19], [20] and preconditioned alternating projection algorithm (PAPA) [3], [21] have shown good performance for regularized emission computed tomography (ECT) image reconstruction. In particular, [21] and [3] have shown that PAPA outperforms the nested EM-TV algorithm [11], the one-step-late method with TV regularization [9], and the preconditioned primal-dual algorithm [17], [23].

The fast proximity-gradient algorithm (FPGA) [24], derived from the multi-step fixed-point proximity framework [25], was developed for general three-term optimization problems. This algorithm has some desirable properties. For example, its development was guided by constructing an averaged nonexpansive operator for the fixed-point iteration, which guarantees monotone convergence; that is, as the fixed-point iteration proceeds, the distance between the iterate and the true solution is monotonically decreasing. Moreover, unlike ADMM, it requires no inner iteration, which results in more stable convergence. To improve the convergence speed, FPGA introduced the Nesterov momentum technique [26], [27]. However, FPGA does not provide a strategy for choosing efficient preconditioners for three-term optimization problems.

An alternative fixed-point algorithm, PAPA, introduces the EM preconditioner instead of a step-size parameter, which greatly improves its convergence. Although the use of the EM preconditioner in PAPA makes it an efficient algorithm for solving the HOTV regularized PET reconstruction model, both PAPA and the preconditioner introduce problems. In PAPA, the use of an extra gradient step (via reuse of the forward and backward projections) makes it difficult to prove the convergence of PAPA with a momentum technique, because such a method may not be reformulable as a fixed-point iteration of an averaged nonexpansive operator. Another problem is that the classical EM preconditioning matrix may not be strictly positive definite: if a component of the reconstructed image becomes zero or very close to zero at some iteration, then this component will be stuck at zero in all remaining iterations.

Based on the construction of the averaged nonexpansive operator in FPGA and on the EM preconditioner, we propose an efficient preconditioned KM algorithm (PKMA) with an improved EM (IEM) preconditioner that can easily be proven convergent with both a momentum technique and a preconditioner, and that at the same time avoids the reconstructed images being “stuck at zero.” In PKMA, we propose a simpler but more general form of momentum parameters compared to those used in the Nesterov momentum scheme. The generality of our proposed momentum parameters is demonstrated by two properties. First, they include the absence of momentum as a special case. Second, we prove that the Nesterov momentum parameters are asymptotically equivalent to a special case of our proposed momentum parameters. The KM based momentum scheme obtains a better approximation of the solution by adding to the current fixed-point update a multiple of the difference between the current fixed-point update and that of the prior iteration. We will show that, with proper selection of the factor of this difference (momentum) term, the KM based momentum scheme ensures faster convergence toward the solution. The KM approach is a generalization of the fixed-point approach, which leads to a simple convergence proof and provides insight toward choosing the algorithmic parameters.

This paper is organized in four sections and two appendices. In Section II, we first describe the HOTV regularized PET image reconstruction model, and then develop a preconditioned KM algorithm with an improved EM preconditioner for solving this model; a proof of the convergence of PKMA is included. Section III presents simulation results comparing our proposed IEM-PKMA with OTDA and L-BFGS-B-PC for the differentiable smoothed anisotropic TV (SATV) regularized model, and with PAPA and ADMM for the nondifferentiable HOTV regularized model. Initial experiments using clinical data are also presented. Section IV offers a conclusion. In the appendices, we provide technical details for the development of PKMA and the analysis of its convergence.

II. PET Image Reconstruction

We develop in this section a preconditioned KM algorithm with an improved EM preconditioner for solving the HOTV regularized PET reconstruction model. Here we limit the discussion to only the case of the first-order plus the second-order TV penalty.

A. HOTV Regularized PET Image Reconstruction Model

We begin with describing the HOTV regularized PET reconstruction model. We denote by $\mathbb{R}_+$ the set of all nonnegative real numbers. For two positive integers $m$ and $d$, $A \in \mathbb{R}_+^{m \times d}$ denotes the PET system matrix whose $(i, j)$th entry equals the probability that a photon pair emitted from the $j$th voxel of the radiotracer distribution $f \in \mathbb{R}_+^d$ within a patient (or a phantom) is detected by the $i$th detector bin pair. The vector $\gamma \in \mathbb{R}_+^m$ represents the mean value of the background noise from random and scatter coincidences. The relation of the projection data $g \in \mathbb{R}_+^m$ of a PET system to the radiotracer distribution $f$ can be described by the Poisson model

$$g = \mathrm{Poisson}(Af + \gamma), \tag{1}$$

where Poisson(x) denotes a Poisson-distributed random vector with mean x. System (1) may be solved by minimizing the following fidelity term

$$F(f) := \langle Af, \mathbf{1}_m \rangle - \langle \ln(Af + \gamma), g \rangle, \tag{2}$$

where $\mathbf{1}_m \in \mathbb{R}^m$ is the vector whose components are all 1, the logarithmic function at $x \in \mathbb{R}^n$ is defined by $\ln x := [\ln x_1, \ln x_2, \ldots, \ln x_n]^T$, and $\langle x, y \rangle := \sum_{i=1}^n x_i y_i$ is the inner product of $x, y \in \mathbb{R}^n$.
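For readers who wish to experiment with this model, the following is a minimal NumPy sketch of evaluating the fidelity term (2). It is illustrative only: it assumes a dense system matrix, and all function and variable names are ours rather than the authors'.

```python
import numpy as np

def kl_fidelity(f, A, g, gamma):
    """Evaluate F(f) = <Af, 1_m> - <ln(Af + gamma), g> from Eq. (2).

    A dense-matrix sketch; a production system matrix would be sparse or
    operator-based. Assumes gamma > 0 so the logarithm is well defined.
    """
    Af = A @ f
    return np.sum(Af) - np.dot(np.log(Af + gamma), g)
```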

Minimization of the fidelity term is well-known to be ill-posed [5], which results in severe over-fitting in the reconstructed image. To avoid this over-fitting problem, regularization terms were introduced as part of the reconstruction model. In this study, using both the first-order and the second-order TV penalties, we get the following HOTV regularized PET image reconstruction model

$$\arg\min_{f \in \mathbb{R}^d} \{F(f) + \lambda_1(\varphi_1 \circ B_1)(f) + \lambda_2(\varphi_2 \circ B_2)(f) + \imath(f)\}, \tag{3}$$

where $\varphi_1 \circ B_1$ and $\varphi_2 \circ B_2$ represent the first-order and second-order TV, respectively. The two functions $\varphi_1$, $\varphi_2$ are defined by the $\ell_1$-norm for the anisotropic TV or the $\ell_2$-norm for the isotropic TV, and thus they are convex. Here $B_1 \in \mathbb{R}^{m_1 \times d}$, $B_2 \in \mathbb{R}^{m_2 \times d}$ are the first-order and second-order difference matrices, respectively, and $\lambda_1, \lambda_2 \in \mathbb{R}_+$ are the corresponding regularization parameters. For the detailed definitions of $\varphi_1$, $\varphi_2$ and $B_1$, $B_2$, see Appendix A. The indicator function $\imath$ on $\mathbb{R}_+^d$ is defined by

$$\imath(x) := \begin{cases} 0, & \text{if } x \in \mathbb{R}_+^d, \\ +\infty, & \text{otherwise.} \end{cases}$$

To simplify notation, we define $m_0 := m_1 + m_2$, and for $z := \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{m_0}$ with $x \in \mathbb{R}^{m_1}$, $y \in \mathbb{R}^{m_2}$,

$$\varphi(z) := \lambda_1 \varphi_1(x) + \lambda_2 \varphi_2(y), \quad \text{and} \quad B := \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}. \tag{4}$$

Then model (3) can be written in a compact form

$$\arg\min_{f \in \mathbb{R}^d} \{F(f) + \varphi(Bf) + \imath(f)\}. \tag{5}$$

This is the model on which the proposed reconstruction algorithm is based.

B. Preconditioned Krasnoselskii-Mann Algorithm

We next characterize a solution of model (5) as a fixed-point of a mapping defined via the proximity operators of the functions $\varphi$ and $\imath$. To this end, we let $\mathbb{S}_+^n$ denote the set of $n \times n$ symmetric positive definite matrices. For $H \in \mathbb{S}_+^n$, the $H$-weighted inner product is defined by $\langle x, y \rangle_H := \langle x, Hy \rangle$ for $x, y \in \mathbb{R}^n$, and the corresponding $H$-weighted 2-norm is defined by $\|x\|_H := \langle x, x \rangle_H^{\frac{1}{2}}$. According to [28], for a convex function $\psi: \mathbb{R}^n \to \mathbb{R}$, the proximity operator of $\psi$ with respect to $H \in \mathbb{S}_+^n$ at $x \in \mathbb{R}^n$ is defined by

$$\mathrm{prox}_{\psi,H}(x) := \arg\min_{u \in \mathbb{R}^n} \left\{ \frac{1}{2}\|u - x\|_H^2 + \psi(u) \right\}.$$
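To make the definition concrete, here is a sketch of its simplest instance, the proximity operator of $\psi = \omega\|\cdot\|_1$ with $H = I$, which reduces to component-wise soft-thresholding. This example is ours, not the paper's.

```python
import numpy as np

def prox_l1(x, omega):
    """prox of psi = omega*||.||_1 with H = I: solves
    argmin_u { 0.5*||u - x||_2^2 + omega*||u||_1 } component-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - omega, 0.0)
```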

In particular, we write $\mathrm{prox}_\psi$ for $\mathrm{prox}_{\psi,I}$. We let $\Gamma_0(\mathbb{R}^n)$ denote the class of all proper lower semicontinuous convex functions defined on $\mathbb{R}^n$, and recall that the conjugate $\psi^*$ of $\psi$ is defined by $\psi^*(z) := \sup_{x \in \mathbb{R}^n}\{\langle z, x \rangle - \psi(x)\}$. Now we have the following fixed-point characterization of a solution of model (5).

Theorem 1: If $f \in \mathbb{R}^d$ is a solution of model (5), then for any $P \in \mathbb{S}_+^d$ and $Q \in \mathbb{S}_+^{m_0}$, there exists a vector $h \in \mathbb{R}^{m_0}$ such that

$$f = \mathrm{prox}_{\imath,P^{-1}}\big(f - P(\nabla F(f) + B^T h)\big), \tag{6}$$
$$h = \mathrm{prox}_{\varphi^*,Q^{-1}}(h + QBf). \tag{7}$$

Conversely, if there exist $P \in \mathbb{S}_+^d$, $Q \in \mathbb{S}_+^{m_0}$, and $h \in \mathbb{R}^{m_0}$ such that $f \in \mathbb{R}^d$ satisfies equations (6) and (7), then $f$ is a solution of model (5).

Proof: It can be verified that in (5), $F: \mathbb{R}^d \to \mathbb{R}$ is convex and differentiable with a Lipschitz continuous gradient, $\varphi \in \Gamma_0(\mathbb{R}^{m_0})$ and $\imath \in \Gamma_0(\mathbb{R}^d)$. Using [21, Th. 3.1], we conclude that this theorem holds true. ■

Note that the gradient of the fidelity term F is given by

$$\nabla F(f) = A^T\left(\mathbf{1}_m - \frac{g}{Af + \gamma}\right), \tag{8}$$

where the division is taken component-wise.
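A direct NumPy transcription of (8), under the same dense-matrix assumption and illustrative naming as the fidelity sketch above:

```python
import numpy as np

def kl_gradient(f, A, g, gamma):
    """Gradient of the KL fidelity term, Eq. (8):
    grad F(f) = A^T (1_m - g / (Af + gamma)), division component-wise."""
    Af = A @ f
    return A.T @ (np.ones(A.shape[0]) - g / (Af + gamma))
```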

We shall develop a convergent, efficient algorithm for solving model (5) from the fixed-point equations (6) and (7). To this end, we write equations (6) and (7) in a compact form. Define

$$v := \begin{bmatrix} f \\ h \end{bmatrix} \in \mathbb{R}^{d+m_0}, \qquad r(v) := F(f), \tag{9}$$

$$\mathcal{T}(v) := \begin{bmatrix} \mathrm{prox}_{\imath,P^{-1}}(f) \\ \mathrm{prox}_{\varphi^*,Q^{-1}}(h) \end{bmatrix}, \qquad E := \begin{bmatrix} I_d & -PB^T \\ QB & I_{m_0} \end{bmatrix},$$

where $I_n$ denotes the $n \times n$ identity matrix and $P \in \mathbb{S}_+^d$, $Q \in \mathbb{S}_+^{m_0}$ are two introduced preconditioning matrices, and define

$$R := \begin{bmatrix} P & 0 \\ 0 & Q \end{bmatrix}. \tag{10}$$

For the matrix $E \in \mathbb{R}^{(d+m_0)\times(d+m_0)}$ and the operator $R\nabla r: \mathbb{R}^{d+m_0} \to \mathbb{R}^{d+m_0}$, we define $E - R\nabla r$ by $(E - R\nabla r)(v) := Ev - R\nabla r(v)$ for $v \in \mathbb{R}^{d+m_0}$. Then (6) and (7) can be written as the fixed-point equation of $\mathcal{T}$ composed with $E - R\nabla r$,

$$v = (\mathcal{T} \circ (E - R\nabla r))(v).$$

This means that if $v := [v_1, v_2, \ldots, v_{d+m_0}]^T$ is a fixed-point of $\mathcal{T} \circ (E - R\nabla r)$, then $f = [v_1, v_2, \ldots, v_d]^T$, the subvector of the first $d$ components of $v$, is a solution of model (5). It was shown in [25] that $E$ is expansive, so the fixed-point iteration of $\mathcal{T} \circ (E - R\nabla r)$ may fail to yield a convergent sequence. To develop a convergent fixed-point algorithm, we choose the matrix $G \in \mathbb{R}^{(d+m_0)\times(d+m_0)}$ as

$$G := \begin{bmatrix} I_d & -PB^T \\ -QB & I_{m_0} \end{bmatrix} \tag{11}$$

and proceed with the fixed-point iteration

$$v^{k+1} = \mathcal{T}\big((E - G)v^{k+1} + (G - R\nabla r)v^k\big), \quad k \in \mathbb{N}_0, \tag{12}$$

where $\mathbb{N}_0 := \{0, 1, \ldots\}$. The iteration (12) is in fact explicit, since $E - G$ is strictly block lower triangular (in the same block partition as $E$). We define $\mathcal{T}_G: \mathbb{R}^{d+m_0} \to \mathbb{R}^{d+m_0}$ by

$$\mathcal{T}_G: u \mapsto \{v : (u, v) \text{ satisfies } v = \mathcal{T}((E - G)v + Gu)\}, \tag{13}$$

$W := R^{-1}G$, and let

$$\mathcal{T}_W := \mathcal{T}_G \circ (\mathcal{I} - W^{-1}\nabla r), \tag{14}$$

where $\mathcal{I}$ denotes the identity operator.

It is important to verify that $\mathcal{T}_G$ is well-defined. To show this, we let $u := \begin{bmatrix} \tilde u_1 \\ \tilde u_2 \end{bmatrix}$ be a given vector, where $\tilde u_1 \in \mathbb{R}^d$ and $\tilde u_2 \in \mathbb{R}^{m_0}$. The implicit fixed-point equation (13) can then be written as

$$\tilde v_1 = \mathrm{prox}_{\imath,P^{-1}}(\tilde u_1 - PB^T\tilde u_2), \tag{15}$$
$$\tilde v_2 = \mathrm{prox}_{\varphi^*,Q^{-1}}(2QB\tilde v_1 - QB\tilde u_1 + \tilde u_2). \tag{16}$$

Here $\tilde v_1 := [v_1, v_2, \ldots, v_d]^T$ and $\tilde v_2 := [v_{d+1}, v_{d+2}, \ldots, v_{d+m_0}]^T$. Since $\tilde u_1$ and $\tilde u_2$ in equation (15) are given, from the definition of the proximity operator we know that the solution $\tilde v_1$ of (15) exists and is unique, which in turn implies the existence and uniqueness of the solution $\tilde v_2$ of equation (16). This shows that for any given $u \in \mathbb{R}^{d+m_0}$ in the equation contained in (13), there exists a unique solution $v$.

Using the definition of $\mathcal{T}_W$, iteration (12) can be rewritten as the fixed-point iteration

$$v^{k+1} = \mathcal{T}_W(v^k), \quad k \in \mathbb{N}_0. \tag{17}$$

We now comment on the convergence of iteration (17). According to [21], $\nabla F$ is Lipschitz continuous if all components of $\gamma$ are positive. Let $L$ denote the Lipschitz constant of $\nabla F$ and $\lambda_W$ the smallest eigenvalue of $W$. We recall the definitions of nonexpansive, firmly nonexpansive and averaged nonexpansive operators in Appendix B, and show in Lemma 7 that if $\lambda_W > \frac{L}{2}$, then $\mathcal{T}_W$ is $\zeta$-averaged nonexpansive with respect to $W$, where $\zeta := \frac{2\lambda_W}{4\lambda_W - L} \in (\frac{1}{2}, 1)$. The KM theorem (Theorem 4 in Appendix B) implies that iteration (17) converges to a fixed-point of $\mathcal{T}_W$.

To accelerate the convergence of iteration (17), for $\alpha > 0$ we define $\mathcal{T}_\alpha := (1 - \alpha)\mathcal{I} + \alpha\mathcal{T}_W$. Since $\mathcal{T}_\alpha$ is $\alpha\zeta$-averaged nonexpansive with respect to $W$, by the KM theorem the fixed-point iteration

$$v^{k+1} = \mathcal{T}_\alpha(v^k), \quad k \in \mathbb{N}_0 \tag{18}$$

converges to a fixed-point of $\mathcal{T}_W$ if $\alpha \in (0, \frac{1}{\zeta})$. We observe numerically that for larger $\alpha \in (0, \frac{1}{\zeta}) \subset (0, 2)$, the fixed-point iteration (18) converges faster. This observation inspires us to choose $\alpha \in (1, 2)$ for (18). Iteration scheme (18) may be interpreted as an application of the momentum technique. To see this, we write

$$\mathcal{T}_\alpha(v^k) = \mathcal{T}_W(v^k) + (\alpha - 1)(\mathcal{T}_W(v^k) - v^k), \quad k \in \mathbb{N}_0.$$

Clearly, $\mathcal{T}_\alpha$ extends $\mathcal{T}_W$ by a momentum technique. However, we found that iteration (18) is not robust for large $\alpha \in (1, 2)$: if the components of the initial vector $v^0$ are too large, then to guarantee convergence the entries of the preconditioning matrix $P$ must be set small, which leads to slow convergence. To overcome this obstacle and obtain a robust fixed-point iteration with momentum acceleration, we use the KM iteration and allow the momentum parameter $\alpha$ to vary in each iteration. That is, for given $\varrho \in (-1, 1)$ and $\delta \ge 0$, we introduce the sequence of parameters

$$\alpha_k := 1 + \varrho\frac{k}{k + \delta}, \quad k \in \mathbb{N}_0, \tag{19}$$

and construct a sequence of operators

$$\mathcal{T}_{\alpha_k} := (1 - \alpha_k)\mathcal{I} + \alpha_k\mathcal{T}_W, \quad k \in \mathbb{N}_0. \tag{20}$$

Given an initialization $v^0 \in \mathbb{R}^{d+m_0}$, we then proceed with the KM based momentum scheme

$$v^{k+1} = \mathcal{T}_{\alpha_k}(v^k), \quad k \in \mathbb{N}_0. \tag{21}$$

We next show that the Nesterov momentum parameters are asymptotically equivalent to a special case of our proposed momentum parameters, defined by (19), with the sub-parameters $\varrho$ and $\delta$ appropriately set. Recall that the Nesterov momentum parameters are given by

$$t_0 := 1, \quad t_{k+1} := \frac{1 + \sqrt{1 + 4t_k^2}}{2}, \quad \alpha_k' := 1 + \frac{t_k - 1}{t_{k+1}}, \quad k \in \mathbb{N}_0. \tag{22}$$

When $k$ is sufficiently large, $\{\alpha_k'\}_{k\in\mathbb{N}_0}$ in (22) corresponds to $\{\alpha_k\}_{k\in\mathbb{N}_0}$ in (19) with the sub-parameter choice $\varrho = 1$ and $\delta = 3$, in the sense given by Proposition 2. Moreover, the absence of momentum ($\alpha_k = 1$ for all $k \in \mathbb{N}_0$) is a special case of our proposed momentum scheme with $\varrho = 0$, which is not the case for the Nesterov momentum since $\alpha_k' > 1$ for $k \ge 1$. The generality of our proposed momentum scheme provides a variety of parameter choices for different scenarios. We now state Proposition 2, whose proof is provided in the supplementary materials.

Proposition 2: Let $\{\alpha_k\}_{k\in\mathbb{N}_0}$ be given by (19) with $\varrho = 1$ and $\delta = 3$, and let $\{t_k\}_{k\in\mathbb{N}_0}$ and $\{\alpha_k'\}_{k\in\mathbb{N}_0}$ be given by (22). Then $\{\alpha_k\}_{k\in\mathbb{N}_0}$ and $\{\alpha_k'\}_{k\in\mathbb{N}_0}$ converge to the same value with the same convergence rate.
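A quick numeric sketch of Proposition 2: the two parameter sequences can be generated side by side, and both approach 2 as $k$ grows. The function names are ours, not the authors'.

```python
def proposed_alpha(k, varrho=1.0, delta=3.0):
    # alpha_k = 1 + varrho * k / (k + delta), Eq. (19)
    return 1.0 + varrho * k / (k + delta)

def nesterov_alpha(n):
    # alpha_k' = 1 + (t_k - 1) / t_{k+1}, with t_0 = 1, Eq. (22)
    t = 1.0
    for k in range(n + 1):
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        alpha = 1.0 + (t - 1.0) / t_next
        t = t_next
    return alpha

# Both sequences tend to the same limit (2), as Proposition 2 states.
print(proposed_alpha(1000), nesterov_alpha(1000))
```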

We now provide a specific choice of the preconditioning matrices $P$ and $Q$ for iteration (21). We let $P := \beta S$ and $Q := \mathrm{diag}(\rho_1\mathbf{1}_{m_1}, \rho_2\mathbf{1}_{m_2})$, where $\beta$, $\rho_1$ and $\rho_2$ are positive numbers and $S$ is a $d \times d$ diagonal matrix with positive diagonal entries. In this case, it follows from the definition of the proximity operator that for $x \in \mathbb{R}^d$,

$$\mathrm{prox}_{\imath,P^{-1}}(x) = \mathrm{prox}_\imath(x) = \mathcal{P}_{\mathbb{R}_+^d}(x) := \max(x, 0). \tag{23}$$

The maximum in the above equation is taken component-wise. By using the well-known Moreau decomposition [29]

$$\mathcal{I} = \mathrm{prox}_{\varphi^*,Q^{-1}} + Q \circ \mathrm{prox}_{\varphi,Q} \circ Q^{-1},$$

and the equation

$$\mathrm{prox}_{\varphi,Q}(z) = \begin{bmatrix} \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}(x) \\ \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}(y) \end{bmatrix}$$

for $z := \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{m_0}$ with $x \in \mathbb{R}^{m_1}$, $y \in \mathbb{R}^{m_2}$, we have

$$\mathrm{prox}_{\varphi^*,Q^{-1}}(z) = \begin{bmatrix} \rho_1\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}\big)\big(\frac{1}{\rho_1}x\big) \\ \rho_2\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}\big)\big(\frac{1}{\rho_2}y\big) \end{bmatrix}.$$

Now the preconditioned KM algorithm (PKMA) for solving model (3) is given as follows.

$$\begin{aligned}
\tilde f^{k+1} &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S(\nabla F(f^k) + B_1^Tb^k + B_2^Tc^k)\big)\\
\tilde b^{k+1} &= \rho_1\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}\big)\big(\tfrac{1}{\rho_1}b^k + B_1(2\tilde f^{k+1} - f^k)\big)\\
\tilde c^{k+1} &= \rho_2\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}\big)\big(\tfrac{1}{\rho_2}c^k + B_2(2\tilde f^{k+1} - f^k)\big)\\
\alpha_k &= 1 + \varrho\tfrac{k}{k + \delta}\\
f^{k+1} &= (1 - \alpha_k)f^k + \alpha_k\tilde f^{k+1}\\
b^{k+1} &= (1 - \alpha_k)b^k + \alpha_k\tilde b^{k+1}\\
c^{k+1} &= (1 - \alpha_k)c^k + \alpha_k\tilde c^{k+1}
\end{aligned}$$
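The iteration above translates almost line by line into code. Below is a schematic NumPy sketch of one possible implementation; grad_F, prox_phi1, prox_phi2 and the diagonal S are assumed to be supplied by the caller, and all names are illustrative rather than taken from the authors' Matlab code.

```python
import numpy as np

def pkma(f0, b0, c0, grad_F, B1, B2, prox_phi1, prox_phi2, S,
         lam1, lam2, beta, rho1, rho2, varrho, delta, n_iter):
    """Schematic PKMA iteration for model (3).

    prox_phi1/prox_phi2: callables (x, omega) -> prox_{omega*phi}(x).
    S: the diagonal preconditioner stored as a vector (e.g., the IEM
    preconditioner of Eq. (27)). delta > 0 is assumed so alpha_0 = 1.
    """
    f, b, c = f0.copy(), b0.copy(), c0.copy()
    for k in range(n_iter):
        # primal step: preconditioned gradient + projection onto R_+^d
        f_t = np.maximum(f - beta * S * (grad_F(f) + B1.T @ b + B2.T @ c), 0.0)
        # dual steps: prox of the conjugates via the Moreau decomposition
        z1 = b / rho1 + B1 @ (2.0 * f_t - f)
        b_t = rho1 * (z1 - prox_phi1(z1, lam1 / rho1))
        z2 = c / rho2 + B2 @ (2.0 * f_t - f)
        c_t = rho2 * (z2 - prox_phi2(z2, lam2 / rho2))
        # KM momentum step with alpha_k from Eq. (19)
        alpha = 1.0 + varrho * k / (k + delta)
        f = (1.0 - alpha) * f + alpha * f_t
        b = (1.0 - alpha) * b + alpha * b_t
        c = (1.0 - alpha) * c + alpha * c_t
    return f
```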

One can find the explicit forms of $\mathrm{prox}_{\omega\varphi_1}(x)$ for $x \in \mathbb{R}^{m_1}$ and $\mathrm{prox}_{\omega\varphi_2}(x)$ for $x \in \mathbb{R}^{m_2}$, where $\omega > 0$, in Appendix A.

In the following theorem, we consider the convergence of PKMA. We let $S_{\max}$ denote the largest diagonal entry of the diagonal matrix $S$, and set $p := \frac{1}{\beta S_{\max}}$ and $\xi := \frac{L}{2(1 - \max\{\varrho, 0\})}$.

Theorem 3: Let $f^0 \in \mathbb{R}^d$, $b^0 \in \mathbb{R}^{m_1}$ and $c^0 \in \mathbb{R}^{m_2}$ be given vectors, and let $\{f^k\}_{k\in\mathbb{N}_0}$ be the sequence generated by PKMA, where $\varrho \in (-1, 1)$, $\delta \ge 0$ and $\beta$, $\rho_1$, $\rho_2$ are positive. For a given diagonal matrix $S \in \mathbb{R}^{d \times d}$ with positive diagonal entries, if $\beta < \frac{1}{\xi S_{\max}}$, $\rho_1 < \frac{p - \xi}{2\|B_1\|_2^2 + (p - \xi)\xi}$ and $\rho_2 < \frac{p - \xi}{2\|B_2\|_2^2 + (p - \xi)\xi}$, then $\{f^k\}_{k\in\mathbb{N}_0}$ converges to a solution of model (3).

Proof: By Theorem 8, to prove the theorem, it suffices to verify that λW > ξ. This is done in Lemma 9. ■

Note that our convergence proof requires the preconditioner to be fixed. In practice, however, we can update it during early iterations and then fix it in later iterations to guarantee convergence.

C. Improved EM Preconditioner

In this subsection, we propose an improved EM (IEM) preconditioner for PKMA to accelerate convergence. We begin by recalling the classical EM preconditioner. To avoid zero components in $A^T\mathbf{1}_m$, we define $\Lambda \in \mathbb{R}^d$ as the vector such that

$$\Lambda_j := \begin{cases} (A^T\mathbf{1}_m)_j, & \text{if } (A^T\mathbf{1}_m)_j > 0, \\ 1, & \text{otherwise,} \end{cases} \tag{24}$$

for $j = 1, 2, \ldots, d$. As shown in [21], the EM preconditioner

$$S_{\mathrm{EM}} := \mathrm{diag}\left(\frac{f^k}{\Lambda}\right) \tag{25}$$

performs better than the identity preconditioner $I_d$ and the diagonal normalization (DN) preconditioner

$$S_{\mathrm{DN}} := \mathrm{diag}\left(\frac{\mathbf{1}_d}{\Lambda}\right), \tag{26}$$

where the divisions are taken component-wise.

However, when using the EM preconditioner, once some components of $f^k$ become zero, these components stay stuck at zero thereafter, which results in holes in the reconstructed images. Low-count data and ordered-subsets-type algorithms are particularly susceptible to this problem. From Theorem 3, we know that to guarantee convergence of PKMA, the entries of the diagonal preconditioning matrix $S$ should be positive. However, this may not be the case with the EM preconditioner, since some components of $f^k$ may be zero. To guarantee the positivity of the preconditioner as well as accelerate convergence, we propose the following IEM preconditioner, which includes a true mean count (TMC) based thresholding and a good estimate $\hat f$ of the true solution $f^*$ in the EM preconditioner:

$$S_{\mathrm{IEM}}(\hat f) := \mathrm{diag}\left(\frac{\max\{\eta\mathbf{1}_d, \hat f, f^k\}}{\Lambda}\right), \tag{27}$$

where the maximum is taken component-wise, the positive constant $\eta$ is set to $0.1 \cdot \mathrm{TMC}$, and

$$\mathrm{TMC} := \frac{\mathrm{ACT_c}}{\mathrm{NP_{FOV}} \cdot \mathrm{NPA}}. \tag{28}$$

Here $\mathrm{ACT_c}$, $\mathrm{NP_{FOV}}$, and $\mathrm{NPA}$ represent the total attenuation corrected true counts, the number of pixels within the field of view, and the number of projection angles, respectively.

In the definition of the IEM preconditioner given by (27), our choice of the threshold constant $0.1 \cdot \mathrm{TMC}$ was based on TMC being the mean of the components of $f^*$. In addition, as shown in Fig. 2, we empirically found that including a good estimate of $f^*$ in the preconditioner leads to faster convergence. This can be exploited by using a good, inexpensive estimate of $f^*$ for $\hat f$ in (27); for example, we can use an image reconstructed by filtered backprojection, $f_{\mathrm{FBP}}$, as $\hat f$. Alternatively, if we set $\hat f = 0$ in (27), then the IEM preconditioner reduces to the case considered in [30].
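A hedged sketch of assembling the diagonal of the IEM preconditioner (27): the sensitivity guard mirrors (24), and eta would be set to $0.1 \cdot \mathrm{TMC}$ following the text. Function and variable names are ours.

```python
import numpy as np

def iem_preconditioner(f_k, f_hat, A, eta):
    """Diagonal of S_IEM(f_hat), Eq. (27), returned as a vector.

    f_hat: an inexpensive estimate of the solution (e.g., an FBP image);
    eta: positive threshold, 0.1 * TMC in the paper, TMC from Eq. (28).
    """
    Lam = A.T @ np.ones(A.shape[0])      # A^T 1_m, as in Eq. (24)
    Lam[Lam <= 0] = 1.0                  # guard zero-sensitivity voxels
    top = np.maximum(np.maximum(np.full_like(f_k, eta), f_hat), f_k)
    return top / Lam
```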

Fig. 2. NOFV (left) and NRMSE (right) versus CPU time by PKMA with preconditioners $S_{\mathrm{EM}}$, $S_{\mathrm{IEM}}(0)$, $S_{\mathrm{IEM}}(\bar f_2)$, $S_{\mathrm{IEM}}(\bar f_{20})$ and $S_{\mathrm{IEM}}(\bar f_{200})$.

III. Numerical Results

In this section, we present several numerical results. First, we show the performance of different choices of $\hat f$ in the IEM preconditioner (27). Then, we compare the performance of different preconditioners and different momentum parameters for PKMA. To assess the performance of PKMA with the IEM preconditioner (IEM-PKMA) for a differentiable regularized PET image reconstruction model, we compare it with the existing OTDA and L-BFGS-B-PC for the SATV regularized reconstruction model. Following this, we present a comparison of PKMA with two existing algorithms, PAPA and ADMM, suitable for nonsmooth penalties. Finally, we provide some initial 3D clinical results for a relaxed ordered subsets version of PKMA (ROS-PKMA).

A. Simulation Setup

We implemented the algorithms in Matlab using a 2D PET simulation model as described in [31]. The number of counts used in these 2D simulations was set to be equivalent to that of a 3D PET brain patient acquisition (370 MBq FDG administered, imaged 1 hour post-injection) collected from the central axial slice of a GE D690/710 PET/CT. The resulting reference count distribution was used as the Poisson parameters for the noise realizations. An area integral projection method was used to build the projection matrix, based on a cylindrical detector ring consisting of 576 detectors, each 4 mm wide. We set the FOV to 300 mm and used 288 projection angles to reconstruct a 256 × 256 image.

To simulate the physical factors that affect the resolution of the reconstructed image, such as positron range, detector width, penetration, residual momentum of the positron, and imperfect decoding, the phantom was convolved with an idealized (space-invariant, Gaussian) point spread function (PSF), which was set as a constant over the whole FOV. The full width at half maximum (FWHM) of this PSF was set to 6.59 mm based on physical measurements from acceptance testing and [32]. The true count projection data was produced by forward-projecting the phantom convolved with the PSF. Uniform water attenuation (attenuation coefficient 0.096 cm−1) was simulated using the PET image support. The background noise was implemented as described in [33] and was based on a 25% scatter fraction and a 25% random fraction, given by $\mathrm{SF} := S_c/(T_c + S_c)$ and $\mathrm{RF} := R_c/(T_c + S_c + R_c)$, respectively, where $T_c$, $S_c$ and $R_c$ represent true, scatter, and random counts, respectively. Scatter was added by forward-projecting a highly smoothed version of the images, which was added to the attenuated image sinogram scaled by the scatter fraction. Random counts were simulated by adding a uniform distribution to the true and scatter count distributions, scaled by the random fraction. We call the sum of $T_c$, $S_c$, and $R_c$ the total counts and denote it by TC. In our simulations, we set $\mathrm{TC} = 6.8 \times 10^6$ for the high-count data and $\mathrm{TC} = 6.8 \times 10^5$ for the low-count data.

We next describe the figures of merit used for the comparisons. They include the normalized objective function value (NOFV), normalized root mean square error (NRMSE), normalized relative contrast (NRC) and central line profile (CLP). The NOFV is defined by

$$\mathrm{NOFV}(f^k) := \frac{\Phi(f^k) - \Phi_{\mathrm{ref}}}{\Phi_0 - \Phi_{\mathrm{ref}}},$$

where $\Phi$ denotes the objective function, $\Phi_0$ is the objective function value of the initial image, and $\Phi_{\mathrm{ref}}$ denotes the reference objective function value. For the simulation results, we set $\Phi_{\mathrm{ref}}$ to the objective function value of the image reconstructed by 1000 iterations of IEM-PKMA. The NRMSE is defined by

$$\mathrm{NRMSE}(f^k) := \frac{\|f^k - f_{\mathrm{true}}\|_2}{\|f_{\mathrm{true}}\|_2},$$

where $f_{\mathrm{true}} \in \mathbb{R}^d$ is the ground truth and $\|\cdot\|_2$ is the 2-norm defined by $\|x\|_2 := (\sum_{i=1}^d x_i^2)^{\frac12}$ for $x \in \mathbb{R}^d$. For the definition of NRC, we let $\mathrm{ROI_H}$ be a region within a specific hot sphere and $\mathrm{ROI_B}$ be a background region of the same size as $\mathrm{ROI_H}$ that is not close to any hot sphere. Define the relative contrast (RC) by $\mathrm{RC} := \frac{E_{\mathrm{ROI_H}} - E_{\mathrm{ROI_B}}}{E_{\mathrm{ROI_B}}}$, where $E_{\mathrm{ROI_H}}$ and $E_{\mathrm{ROI_B}}$ represent the average activities of $\mathrm{ROI_H}$ and $\mathrm{ROI_B}$, respectively. The normalized relative contrast is then defined by

$$\mathrm{NRC}(f^k) := \frac{\mathrm{RC}_{f^k}}{\mathrm{RC}_{\mathrm{true}}},$$

where $\mathrm{RC}_{f^k}$ and $\mathrm{RC}_{\mathrm{true}}$ are the relative contrasts of $f^k$ and the ground truth, respectively.
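The NRMSE and NRC metrics are straightforward to compute; a short sketch, with roi_h and roi_b as equal-size boolean masks (our naming):

```python
import numpy as np

def nrmse(f_k, f_true):
    # NRMSE(f^k) = ||f^k - f_true||_2 / ||f_true||_2
    return np.linalg.norm(f_k - f_true) / np.linalg.norm(f_true)

def nrc(f_k, f_true, roi_h, roi_b):
    """Normalized relative contrast, RC(f^k) / RC(f_true), where
    RC = (mean over ROI_H - mean over ROI_B) / mean over ROI_B."""
    def rc(img):
        return (img[roi_h].mean() - img[roi_b].mean()) / img[roi_b].mean()
    return rc(f_k) / rc(f_true)
```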

Two 256 × 256 numerical phantoms, shown in Fig. 1, were used for our simulations. The brain phantom was obtained from a high quality clinical PET brain image. The uniform phantom consists of a uniform background with six uniform hot spheres of distinct radii 4, 6, 8, 10, 12, and 14 pixels; the activity ratio of the hot spheres to the background is 4:1. In the simulation experiments, we compare NOFV and NRMSE by reconstructing the brain phantom, and NRC and CLP by reconstructing the uniform phantom. All simulations were performed on a 64-bit Windows 10 laptop with an Intel Core i7-7500U processor at 2.70 GHz, 8 GB of DDR4 memory and a 256 GB SATA SSD.

Fig. 1. (a) Brain phantom: high quality clinical PET brain image. (b) Uniform phantom: uniform background with six uniform hot spheres of distinct radii.

We show the settings of the regularization parameters and the algorithmic parameters of PKMA in Table I. For the reconstruction of the brain phantom, to suppress the staircase artifacts and avoid over-smoothed images, we empirically found that setting $\lambda_2 = \lambda_1$ was reasonable and simplified the search for optimal regularization parameters based on the minimum NRMSE. For the uniform phantom, the second-order TV regularization parameter $\lambda_2$ was set to 0, due to its piecewise constant nature. The settings of $\rho_1$ and $\rho_2$ in PKMA were based on the convergence conditions and the facts that $\|B_1\|_2^2 \le 8$ and $\|B_2\|_2^2 \le 64$ in the 2D case [34]. The parameters $\varrho$ and $\delta$ in the momentum step were set to satisfy $\varrho \in (-1, 1)$ and $\delta \ge 0$. In addition, we denote by $f_{\mathrm{UD}}$ the uniform disk $\mathrm{TMC} \cdot \mathbf{1}_{\mathrm{disk}}$ with the same size as the FOV, where TMC is defined by (28) and $\mathbf{1}_{\mathrm{disk}}$ is the image whose values are 1 within the disk and 0 outside.

TABLE I. Regularization Parameters and Algorithmic Parameters for 2D Simulation

Regularization parameters:
  Brain phantom, high-count: $\lambda_1 = \lambda_2 = 0.04$
  Brain phantom, low-count: $\lambda_1 = \lambda_2 = 0.34$
  Uniform phantom, high-count: $\lambda_1 = 0.4$, $\lambda_2 = 0$

Algorithmic parameters:
  PKMA: $\beta = 1$, $\rho_1 = \frac{1}{2 \times 8 \times S_{\max}}$, $\rho_2 = \frac{1}{2 \times 64 \times S_{\max}}$, $\varrho = 0.9$, $\delta = 0.1$
  PAPA: $\beta = 1$, $\rho_1 = \frac{1}{2 \times 8 \times S_{\max}}$, $\rho_2 = \frac{1}{2 \times 64 \times S_{\max}}$
  ADMM: $\sigma = \tau = 0.1$, $\mu = 1.2$

B. Simulation Results for IEM-PKMA

1). Comparison of Preconditioners and Momentum Techniques:

In this subsection, we use high-count data to reconstruct the brain phantom and compare PKMA with different preconditioners and momentum techniques. The initial image for PKMA was set to $f_{\mathrm{UD}}$. We show in Fig. 2 the plots of NOFV and NRMSE versus CPU time for PKMA with the EM preconditioner and with four different IEM preconditioners: $S_{\mathrm{IEM}}(0)$, $S_{\mathrm{IEM}}(\bar f_2)$, $S_{\mathrm{IEM}}(\bar f_{20})$ and $S_{\mathrm{IEM}}(\bar f_{200})$, where $\bar f_2$, $\bar f_{20}$ and $\bar f_{200}$ represent the images reconstructed by EM-PKMA with 2, 20 and 200 iterations, respectively. From these figures, we can see that as $\hat f$ in the IEM preconditioner (27) is made closer to the solution $f^*$, the convergence of PKMA with the preconditioner $S_{\mathrm{IEM}}(\hat f)$ improves. For all the experiments of Figs. 3–12, $\hat f$ in the IEM preconditioner was always set to $f_{\mathrm{FBP}}$.

Fig. 3. NOFV (left) and NRMSE (right) versus CPU time by PKMA with the IEM, EM and DN preconditioners, FPGA, and IEM-PKMA with the Nesterov momentum parameters.

Fig. 12. Reconstructed images of the uniform phantom by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data: top to bottom rows are reconstructed by 5, 10 and 100 iterations, respectively.

In Fig. 3, we compare the results for three different preconditioners: the IEM, EM and DN preconditioners, defined by (27), (25) and (26), respectively. FPGA and IEM-PKMA with the Nesterov momentum parameters (IEM-PKMA-NM) are also shown. There are two differences between FPGA [24] and IEM-PKMA for model (3): the choices of preconditioner and of momentum parameters. Specifically, FPGA uses $I_d$ as the preconditioner, while IEM-PKMA uses the IEM preconditioner; and FPGA selects momentum parameters from Nesterov's update, while we provide the more general form of momentum parameters given by (19) for IEM-PKMA. To compare the performance of our proposed KM based momentum scheme with Nesterov's, we replace the momentum parameters of IEM-PKMA by (22), yielding the method we refer to as IEM-PKMA-NM. The optimal step-size $\beta$ was tuned for each of DN-PKMA ($\beta = 0.3$) and FPGA ($\beta = 0.003$) based on the objective function value.

In Fig. 4, we present the normalized root mean square difference (NRMSD), $\frac{\|f^k_{\mathrm{IEM}} - f^k_{\mathrm{DN}}\|_2}{\|f_{\mathrm{true}}\|_2}$, between the images reconstructed by IEM-PKMA and DN-PKMA, as well as the reconstructed images after 5000 iterations. This figure shows that the two different positive definite preconditioners $S_{\mathrm{IEM}}$ and $S_{\mathrm{DN}}$ yield the same converged image.

Fig. 4. (a) NRMSD between the images reconstructed by IEM-PKMA and DN-PKMA versus iteration number. (b) Reconstructed brain images by IEM-PKMA (left) and DN-PKMA (right) with 5000 iterations.

2). Comparison of Algorithms for SATV Regularization:

In this subsection, we show how IEM-PKMA performs using a smoothed approximation of a first-order edge-preserving regularized model (smoothed anisotropic TV penalty) and compare it to two other state-of-the-art algorithms suitable for smooth penalties. The SATV regularized reconstruction model is given by

$$\arg\min_{f \in \mathbb{R}^d}\left\{F(f) + \lambda\sum_{j=1}^d\sum_{i \in N_j}\phi_\theta(f_j - f_i)\right\}, \tag{29}$$

where $F$ is defined by (2), $N_j$ consists of the indices of the left and up neighbor pixels of the $j$th pixel of the image $f$, and $\phi_\theta$ is the Lange function [35] defined by

$$\phi_\theta(t) := |t| - \theta\ln\left(1 + \frac{|t|}{\theta}\right), \quad \theta > 0.$$

If we let $\phi_\theta$ be the absolute value function, then model (29) becomes the anisotropic TV regularized model. By defining $R(f) := \sum_{j=1}^d\sum_{i \in N_j}\phi_\theta(f_j - f_i)$, IEM-PKMA for solving (29) is given by

$$\begin{aligned}
\tilde f^{k+1} &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S_{\mathrm{IEM}}(\nabla F(f^k) + \lambda\nabla R(f^k))\big)\\
f^{k+1} &= (1 - \alpha_k)f^k + \alpha_k\tilde f^{k+1},
\end{aligned}$$

where $\{\alpha_k\}_{k\in\mathbb{N}_0}$ is given by (19). Two state-of-the-art algorithms for the SATV regularized model (29), OTDA and L-BFGS-B-PC, were used for comparison. OTDA is based on the surrogate function method and uses conjugate directions to accelerate convergence; it has been shown to outperform PCG [8]. L-BFGS-B-PC is based on the quasi-Newton method and uses the diagonal of the inverse Hessian matrix for preconditioning. These two algorithms perform well for differentiable regularized models. However, both need an additional forward- and back-projection for the line search step, which makes each iteration more time-consuming.
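Since the SATV model is smooth, the only new ingredient for IEM-PKMA is $\nabla R$. The sketch below evaluates it on a 2-D image using $\phi_\theta'(t) = t/(\theta + |t|)$, the derivative of the Lange function above; the looping over the two neighbor directions and the boundary handling are our own illustrative choices.

```python
import numpy as np

def lange_grad_2d(img, theta):
    """Gradient of R(f) = sum_j sum_{i in N_j} phi_theta(f_j - f_i),
    N_j being the left and up neighbors; phi_theta'(t) = t/(theta+|t|)."""
    H, W = img.shape
    g = np.zeros_like(img)
    for dr, dc in ((0, 1), (1, 0)):             # left and up neighbor pairs
        diff = np.zeros_like(img)
        diff[dr:, dc:] = img[dr:, dc:] - img[:H - dr, :W - dc]
        phi_p = diff / (theta + np.abs(diff))   # derivative, zero where diff=0
        g += phi_p                              # d/df_j contribution
        g[:H - dr, :W - dc] -= phi_p[dr:, dc:]  # -d/df_i contribution
    return g
```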

Fig. 5 shows the performance of IEM-PKMA, OTDA and L-BFGS-B-PC for solving model (29) with $\theta = 0.001$ (to ensure edge preservation). High-count data was used and the regularization parameter $\lambda$ was set to 0.06. For both the initialization and the preconditioners in IEM-PKMA and L-BFGS-B-PC, $f_{\mathrm{FBP}}$ was used.

Fig. 5. NOFV (left) and NRMSE (right) versus CPU time by IEM-PKMA, OTDA and L-BFGS-B-PC with $f_{\mathrm{FBP}}$ initialization.

3). Comparison of Algorithms for HOTV Regularization:

In this subsection, we compare the performance of PKMA, PAPA and ADMM for the HOTV regularized model. For this purpose, we recall the iteration schemes of PAPA and ADMM. It follows from [3] that PAPA for solving model (3) can be written as

$$\begin{aligned}
h^k &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S_{\mathrm{EM}}(\nabla F(f^k) + B_1^Tb^k + B_2^Tc^k)\big)\\
b^{k+1} &= \rho_1\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\rho_1}\varphi_1}\big)\big(\tfrac{1}{\rho_1}b^k + B_1h^k\big)\\
c^{k+1} &= \rho_2\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\rho_2}\varphi_2}\big)\big(\tfrac{1}{\rho_2}c^k + B_2h^k\big)\\
f^{k+1} &= \mathcal{P}_{\mathbb{R}_+^d}\big(f^k - \beta S_{\mathrm{EM}}(\nabla F(f^k) + B_1^Tb^{k+1} + B_2^Tc^{k+1})\big).
\end{aligned}$$

According to [20], ADMM for solving model (3) consists of the following three steps:

$$\begin{cases}
u^{k+1} = \arg\min_{u \in \mathbb{R}^d}\; \varphi(Bu) + \frac{\mu}{2}\|f^k - u + q^k\|_2^2,\\
f^{k+1} = \arg\min_{f \in \mathbb{R}_+^d}\; F(f) + \frac{\mu}{2}\|f - u^{k+1} + q^k\|_2^2,\\
q^{k+1} = q^k + f^{k+1} - u^{k+1}.
\end{cases} \tag{30}$$

Unlike in [20], the term $\varphi$ in our model is not differentiable and the proximity operator of $\varphi \circ B$ has no explicit form, requiring the first sub-problem of (30) to be solved via the first-order primal-dual algorithm (FOPDA) [36]. Here we perform five FOPDA sub-iterations in each complete ADMM iteration to guarantee convergence. For the second sub-problem, we use the surrogate function strategy described in [20].

This yields the following ADMM iteration scheme:

$$\begin{aligned}
&b^{k,0} = b^k, \quad c^{k,0} = c^k, \quad u^{k,0} = u^k, \quad \tilde u^{k,0} = \tilde u^k\\
&\text{For } l = 1:5\\
&\quad b^{k,l} = \sigma\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_1}{\sigma}\varphi_1}\big)\big(\tfrac{1}{\sigma}b^{k,l-1} + B_1\tilde u^{k,l-1}\big)\\
&\quad c^{k,l} = \sigma\big(\mathcal{I} - \mathrm{prox}_{\frac{\lambda_2}{\sigma}\varphi_2}\big)\big(\tfrac{1}{\sigma}c^{k,l-1} + B_2\tilde u^{k,l-1}\big)\\
&\quad u^{k,l} = \tfrac{1}{1+\tau\mu}\big(u^{k,l-1} - \tau(B_1^Tb^{k,l} + B_2^Tc^{k,l}) + \tau\mu(f^k + q^k)\big)\\
&\quad \tilde u^{k,l} = 2u^{k,l} - u^{k,l-1}\\
&\text{End}\\
&b^{k+1} = b^{k,5}, \quad c^{k+1} = c^{k,5}, \quad u^{k+1} = u^{k,5}, \quad \tilde u^{k+1} = \tilde u^{k,5}\\
&w = A^T\mathbf{1}_m - \mu(u^{k+1} - q^k)\\
&v = \mathrm{diag}\{f^k\}\,A^T\left(\frac{g}{Af^k + \gamma}\right)\\
&f^{k+1} = \frac{-w + \sqrt{w^2 + 4\mu v}}{2\mu}\\
&q^{k+1} = q^k + f^{k+1} - u^{k+1}
\end{aligned}$$
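The closed-form $f$-update at the end of the scheme follows from setting the derivative of the separable surrogate to zero, which gives the component-wise quadratic $\mu f^2 + wf - v = 0$ and its positive root. A sketch, with names matching the scheme above and our usual dense-matrix assumption:

```python
import numpy as np

def admm_f_update(f_k, u_next, q_k, A, g, gamma, mu):
    """f^{k+1} = (-w + sqrt(w^2 + 4*mu*v)) / (2*mu), component-wise, with
    w = A^T 1_m - mu*(u^{k+1} - q^k) and
    v = diag{f^k} A^T (g / (A f^k + gamma))."""
    w = A.T @ np.ones(A.shape[0]) - mu * (u_next - q_k)
    v = f_k * (A.T @ (g / (A @ f_k + gamma)))
    return (-w + np.sqrt(w * w + 4.0 * mu * v)) / (2.0 * mu)
```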

The choices of parameters in PAPA and ADMM for the 2D simulation are shown in Table I. For PAPA, the parameters were chosen according to [3]. For ADMM, the parameters $\sigma$ and $\tau$ were set by the convergence condition $\sigma\tau < \frac{1}{\|B\|_2^2}$ of FOPDA [36], and the optimal $\mu$ was chosen empirically based on the objective function value.

We first show the performance of these algorithms for the reconstruction of the brain phantom using high-count data. Two different initial images, the uniform disk $f_{\mathrm{UD}}$ and $\hat f$, were used for comparison. Here we set $\Phi_0 = \Phi(f_{\mathrm{UD}})$ in the definition of NOFV for both the uniform disk and $\hat f$ initializations. In Fig. 6, we show the NOFV and NRMSE versus CPU time for IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk and $\hat f$ initialization. It shows that IEM-PKMA converges more rapidly than both PAPA and ADMM. The reconstructed brain images after 5, 10 and 100 iterations in Fig. 7 show that IEM-PKMA is able to obtain a reasonably good image very rapidly. We note that the benefit of the IEM preconditioner is more pronounced when a uniform image is used for initialization, and less so when the initialization is the $\hat f$ used in the IEM preconditioner, though it still shows improvement.

Fig. 6. NOFV (left) and NRMSE (right) versus CPU time by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk (top row) and $\hat f$ (bottom row) initialization using high-count data.

Fig. 7. Reconstructed brain images by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data: top to bottom rows are reconstructed by 5, 10, and 100 iterations, respectively.

To demonstrate the performance of IEM-PKMA for low-count data, we show in Fig. 8 comparisons of NOFV and NRMSE versus CPU time for these algorithms with uniform disk and $\hat f$ initialization. The reconstructed brain images using low-count data are shown in Fig. 9.

Fig. 8. NOFV (left) and NRMSE (right) versus CPU time by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk (top row) and $\hat f$ (bottom row) initialization using low-count data.

Fig. 9. Reconstructed brain images by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using low-count data: top to bottom rows are reconstructed by 5, 10 and 100 iterations, respectively.

We next examine the performance of these algorithms for reconstructing the uniform phantom with uniform disk initialization and high-count data. Fig. 10 shows NRC versus CPU time for the largest and the smallest hot spheres of the uniform phantom. Comparisons of central line profiles of the images reconstructed by 5 and 10 iterations are shown in Fig. 11. The reconstructed images of the uniform phantom are shown in Fig. 12. These figures show that for the uniform phantom, IEM-PKMA outperforms the other algorithms.

Fig. 10. NRC of the largest hot sphere (left) and the smallest hot sphere (right) versus CPU time by IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data.

Fig. 11. CLP of the images reconstructed by 5 iterations (left) and 10 iterations (right) of IEM-PKMA, EM-PKMA, PAPA and ADMM with uniform disk initialization using high-count data.

C. Initial Clinical Results

In this subsection, we present some promising initial 3D clinical results, based on a relaxed ordered subsets version, ROS-PKMA, following [31] and [37]. The details of this algorithm and the parameters used are provided in the supplementary materials. We implemented the ROS-PKMA, ROS-PAPA [31] and ROS-EM algorithms on a GE D690 PET/CT using a modified version of the GE PET Toolbox release 2.0. A brain scan of a 52-year-old male with brain metastases was acquired 1 hour post-injection (370 MBq nominal) for 10 minutes. The images were reconstructed using time-of-flight (TOF) information with a 300 mm FOV, a 256 × 256 matrix, and an accurate model of the detector PSF (“sharpIR”).

Initial results shown in Fig. 13 appear to indicate that ROS-PKMA with 12 subsets can obtain even better images than both ROS-PAPA and ROS-EM (also with 12 subsets) using only half the iterations.

Fig. 13. Reconstructed clinical brain images by ROS-PKMA, ROS-PAPA and ROS-EM with 12 subsets: top to bottom rows are reconstructed by 1, 2, 4 and 8 iterations, respectively.

IV. Conclusion

This study presents an efficient, easily implemented and mathematically sound preconditioned Krasnoselskii-Mann algorithm for HOTV regularized PET image reconstruction. We prove that PKMA enjoys theoretical convergence in the case that the preconditioner is fixed after a finite number of iterations. In addition, we show that our proposed generating function for the momentum parameters is more general than the one proposed by Nesterov, in that it includes both the momentum-free case and an asymptotically equivalent form of the Nesterov momentum parameters as special cases. An improved EM preconditioner that avoids the reconstructed images being “stuck at zero” was proposed for accelerating convergence. Numerical experiments demonstrate that the IEM preconditioner improves convergence speed more than the classical EM preconditioner, and that IEM-PKMA outperforms OTDA and L-BFGS-B-PC for the SATV regularized model, and PAPA and ADMM for the HOTV regularized model. Moreover, for clinical data, promising initial results indicate that ROS-PKMA may be able to obtain sufficiently converged images more rapidly than both ROS-PAPA and ROS-EM, though more research is necessary to properly evaluate these results.


Acknowledgement

The authors are grateful to Dr. Guobao Wang and Dr. Jinyi Qi for providing their codes for OTDA, to Dr. Charles W. Stearns, Dr. Sangtae Ahn, Dr. Kris Thielemans and Yu-Jung Tsai for helpful discussions on implementation of L-BFGS-B-PC, and to an anonymous referee for bringing reference [30] to our attention. They are grateful to GE for providing C. R. Schmidtlein, through a research agreement with MSKCC, the PET toolbox for the clinical experiments.

This work was supported in part by the Special Project on High-performance Computing through the National Key R&D Program under Grant 2016YFB0200602, in part by the Natural Science Foundation of China under Grant 11771464, Grant 11601537, Grant 11471013, and Grant 11501584, in part by the Fundamental Research Funds for the Central Universities of China, and in part by the Imaging and Radiation Sciences subaward of the MSK Cancer Center Support Grant/Core Grant under Grant P30 CA008748.

Appendix A

We provide the definitions of the 2D first-order and second-order isotropic TV (ITV) and the explicit forms of the corresponding functions’ proximity operators. For the 3D case, please refer to [3]. The first-order and second-order ITV can be written as $\varphi_1 \circ B_1$ and $\varphi_2 \circ B_2$, respectively. For the definition of $B_1$ and $B_2$, we let $N := \sqrt{d}$, let $I_N$ denote the $N \times N$ identity matrix, and let $D$ denote the $N \times N$ backward difference matrix such that $D_{j,j} = 1$ and $D_{j,j-1} = -1$ for $j = 2, 3, \ldots, N$, with all other entries of $D$ zero. Through the matrix Kronecker product $\otimes$, $B_1 \in \mathbb{R}^{2d \times d}$ and $B_2 \in \mathbb{R}^{4d \times d}$ are defined, respectively, by

$$B_1 := \begin{bmatrix} I_N \otimes D \\ D \otimes I_N \end{bmatrix}, \qquad B_2 := \begin{bmatrix} I_N \otimes (D^TD) \\ D^T \otimes D \\ (D^TD) \otimes I_N \\ D \otimes D^T \end{bmatrix}.$$
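The Kronecker-product construction maps directly onto sparse-matrix tooling. A sketch with SciPy, assuming the convention above that the first row of $D$ is zero; the function name is ours.

```python
import numpy as np
from scipy.sparse import identity, kron, diags, vstack

def difference_matrices(N):
    """First- and second-order difference matrices B1 (2d x d) and
    B2 (4d x d) for an N x N image, d = N^2, via Kronecker products."""
    D = diags([1.0, -1.0], [0, -1], shape=(N, N)).tolil()
    D[0, 0] = 0.0                    # differences only for rows j = 2..N
    D = D.tocsr()
    I = identity(N, format="csr")
    B1 = vstack([kron(I, D), kron(D, I)], format="csr")
    B2 = vstack([kron(I, D.T @ D), kron(D.T, D),
                 kron(D.T @ D, I), kron(D, D.T)], format="csr")
    return B1, B2
```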

For $x \in \mathbb{R}^{2d}$, define $\varphi_1(x) := \sum_{i=1}^d \|z_1^{(i)}\|_2$, where

$$z_1^{(i)} := [x_i, x_{d+i}]^T, \quad i = 1, 2, \ldots, d. \tag{31}$$

For $x \in \mathbb{R}^{4d}$, define $\varphi_2(x) := \sum_{i=1}^d \|z_2^{(i)}\|_2$, where

$$z_2^{(i)} := [x_i, x_{d+i}, x_{2d+i}, x_{3d+i}]^T, \quad i = 1, 2, \ldots, d. \tag{32}$$

Next, we provide explicit forms of the proximity operators of $\varphi_1$ and $\varphi_2$. Let $\omega$ be a positive number. As shown in [3], for $x \in \mathbb{R}^{2d}$, denoting $u := \mathrm{prox}_{\omega\varphi_1}(x)$, we have

$$u_{jd+i} = \left(\frac{\max\{\|z_1^{(i)}\|_2 - \omega, 0\}}{\|z_1^{(i)}\|_2}\right)x_{jd+i}$$

for $j = 0, 1$, $i = 1, 2, \ldots, d$, where $z_1^{(i)}$ is defined by (31). For $x \in \mathbb{R}^{4d}$, we denote $v := \mathrm{prox}_{\omega\varphi_2}(x)$. Then

$$v_{jd+i} = \left(\frac{\max\{\|z_2^{(i)}\|_2 - \omega, 0\}}{\|z_2^{(i)}\|_2}\right)x_{jd+i}$$

for $j = 0, 1, 2, 3$, $i = 1, 2, \ldots, d$, where $z_2^{(i)}$ is defined by (32).
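Both formulas are the same group soft-thresholding, applied to pairs for $\varphi_1$ and quadruples for $\varphi_2$. A sketch for the first-order case, with the 0/0 case resolved to zero (a common convention we assume here):

```python
import numpy as np

def prox_phi1(x, omega):
    """prox_{omega*phi1}(x) for x in R^{2d}: shrink each group
    z1^(i) = [x_i, x_{d+i}]^T toward zero by omega in the 2-norm."""
    d = x.size // 2
    z = x.reshape(2, d)        # row 0: x_1..x_d, row 1: x_{d+1}..x_{2d}
    norms = np.linalg.norm(z, axis=0)
    scale = np.zeros(d)
    nz = norms > 0
    scale[nz] = np.maximum(norms[nz] - omega, 0.0) / norms[nz]
    return (z * scale).reshape(-1)
```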

Appendix B

In this appendix, we prove the convergence of PKMA by employing the KM theorem.

To recall the KM theorem, we first recall the definition of a nonexpansive operator. Let $H \in \mathbb{S}_+^n$. We say that $\mathcal{T}: \mathbb{R}^n \to \mathbb{R}^n$ is nonexpansive with respect to $H$ if $\|\mathcal{T}x - \mathcal{T}y\|_H \le \|x - y\|_H$ for any $x, y \in \mathbb{R}^n$.

Theorem 4 (Krasnoselskii-Mann [38]-[40]): Let $\mathcal{T}: \mathbb{R}^n \to \mathbb{R}^n$ be a nonexpansive operator such that the set of its fixed points is non-empty. For $\{\omega_k\}_{k\in\mathbb{N}_0} \subset (0, 1)$ and $x^0 \in \mathbb{R}^n$, define

$$x^{k+1} = (1 - \omega_k)x^k + \omega_k\mathcal{T}x^k, \quad k \in \mathbb{N}_0. \tag{33}$$

If $\sum_{k=0}^\infty \omega_k(1 - \omega_k) = +\infty$, then $\{x^k\}_{k\in\mathbb{N}_0}$ converges to a fixed-point of $\mathcal{T}$.

We shall employ Theorem 4 to prove the convergence of PKMA. For this purpose, we rewrite iteration (21) in the form of (33). To this end, we first prove that $\mathcal{T}_W$ is averaged nonexpansive with respect to $W = R^{-1}G$. We recall the definitions of firmly nonexpansive and averaged nonexpansive operators, and two related lemmas.

An operator $\mathcal{T}: \mathbb{R}^n \to \mathbb{R}^n$ is called firmly nonexpansive with respect to $H \in \mathbb{S}_+^n$ if $\|\mathcal{T}x - \mathcal{T}y\|_H^2 \le \langle \mathcal{T}x - \mathcal{T}y, x - y \rangle_H$ for any $x, y \in \mathbb{R}^n$. If there exists a nonexpansive operator $\mathcal{N}: \mathbb{R}^n \to \mathbb{R}^n$ with respect to $H$ such that $\mathcal{T} = (1 - \alpha)\mathcal{I} + \alpha\mathcal{N}$, we say that $\mathcal{T}$ is $\alpha$-averaged nonexpansive with respect to $H$. Firm nonexpansiveness of an operator corresponds to its $\frac12$-averaged nonexpansiveness by [41, Remark 4.24].

Lemma 5 (Baillon-Haddad [41]): Let $\psi: \mathbb{R}^n \to \mathbb{R}$ be differentiable and convex, and let $L$ be a positive real number. Then $\nabla\psi$ is $L$-Lipschitz continuous if and only if for all $x, y \in \mathbb{R}^n$, $\|\nabla\psi(x) - \nabla\psi(y)\|_2^2 \le L\langle x - y, \nabla\psi(x) - \nabla\psi(y)\rangle$.

Lemma 6 (Combettes-Yamada [42]): Let $H \in \mathbb{S}_+^n$, $0 < \alpha_1 < 1$ and $0 < \alpha_2 < 1$. If $\mathcal{T}_1: \mathbb{R}^n \to \mathbb{R}^n$ is $\alpha_1$-averaged nonexpansive with respect to $H$, and $\mathcal{T}_2: \mathbb{R}^n \to \mathbb{R}^n$ is $\alpha_2$-averaged nonexpansive with respect to $H$, then $\mathcal{T}_1 \circ \mathcal{T}_2$ is $\frac{\alpha_1 + \alpha_2 - 2\alpha_1\alpha_2}{1 - \alpha_1\alpha_2}$-averaged nonexpansive with respect to $H$.

Notice that $W = R^{-1}G = \begin{bmatrix} P^{-1} & -B^T \\ -B & Q^{-1} \end{bmatrix}$ is symmetric since $P$, $Q$ are both symmetric. Therefore, $W \in \mathbb{S}_+^{d+m_0}$ if and only if its smallest eigenvalue $\lambda_W > 0$. Next we prove that $\mathcal{T}_W$ is averaged nonexpansive with respect to $W$.

Lemma 7: Let $\mathcal{T}_G$, $\mathcal{T}_W$, $r$, $R$ and $G$ be defined by (13), (14), (9), (10) and (11), respectively, and let $\zeta := \frac{2\lambda_W}{4\lambda_W - L}$. If $\lambda_W > \frac{L}{2}$, then $\mathcal{T}_W$ is $\zeta$-averaged nonexpansive with respect to $W$.

Proof: $\lambda_W > \frac{L}{2} > 0$ gives that $W \in \mathbb{S}_+^{d+m_0}$ and

$$\|W^{-\frac12}\|_2^2 = \lambda_{\max}(W^{-1}) = \lambda_W^{-1}, \tag{34}$$

where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix. [24, Lemma 3.2] shows that $\mathcal{T}_G$ is firmly nonexpansive with respect to $W$; hence it is $\frac12$-averaged nonexpansive with respect to $W$. According to Lemma 6 and the definition of $\mathcal{T}_W$, to prove that $\mathcal{T}_W$ is averaged nonexpansive with respect to $W$, it suffices to show that $\mathcal{I} - W^{-1}\nabla r$ is averaged nonexpansive with respect to $W$.

Define $\alpha := \frac{L}{2\lambda_W}$ and $\mathcal{N} := \mathcal{I} - \frac{1}{\alpha}W^{-1}\nabla r$. Then $0 < \alpha < 1$ and $\mathcal{I} - W^{-1}\nabla r = (1 - \alpha)\mathcal{I} + \alpha\mathcal{N}$. It is easy to verify that $r$ is convex and differentiable with an $L$-Lipschitz continuous gradient. For $w, v \in \mathbb{R}^{d+m_0}$, let $z := \nabla r(w) - \nabla r(v)$. By (34), Lemma 5 and $\alpha = \frac{L}{2\lambda_W}$, we have that

$$\frac{1}{\alpha^2}\|W^{-\frac12}z\|_2^2 \le \frac{1}{\alpha^2}\|W^{-\frac12}\|_2^2\|z\|_2^2 \le \frac{2}{\alpha}\langle w - v, z\rangle.$$

Hence

$$\|\mathcal{N}w - \mathcal{N}v\|_W^2 = \left\|(w - v) - \frac{1}{\alpha}W^{-1}z\right\|_W^2 = \|w - v\|_W^2 + \frac{1}{\alpha^2}\|W^{-\frac12}z\|_2^2 - \frac{2}{\alpha}\langle w - v, z\rangle \le \|w - v\|_W^2,$$

which implies that $\mathcal{N}$ is nonexpansive with respect to $W$. Thus $\mathcal{I} - W^{-1}\nabla r$ is $\alpha$-averaged nonexpansive with respect to $W$. Therefore, $\mathcal{T}_W = \mathcal{T}_G \circ (\mathcal{I} - W^{-1}\nabla r)$ is $\zeta$-averaged nonexpansive with respect to $W$ by Lemma 6. ■

Now we prove the convergence of iteration (21).

Theorem 8: Let $\mathcal{T}_W$, $\alpha_k$, $\mathcal{T}_{\alpha_k}$ be defined by (14), (19) and (20), respectively, where $\varrho \in (-1, 1)$ and $\delta \ge 0$. For a given $v^0 \in \mathbb{R}^{d+m_0}$, let $\{v^k\}_{k\in\mathbb{N}_0}$ be the sequence generated by iteration (21), and let $f^k := [v_1^k, v_2^k, \ldots, v_d^k]^T$, $k \in \mathbb{N}_0$. If $\lambda_W > \frac{L}{2(1 - \max\{\varrho, 0\})}$, then $\{f^k\}_{k\in\mathbb{N}_0}$ converges to a solution of model (5).

Proof: We shall employ Theorem 4 to prove this theorem. To this end, we define $\zeta := \frac{2\lambda_W}{4\lambda_W - L}$, and show below that there exists a nonexpansive operator $\mathcal{N}$ with respect to $W$ such that

$$\mathcal{T}_{\alpha_k} = (1 - \alpha_k\zeta)\mathcal{I} + \alpha_k\zeta\mathcal{N}, \quad k \in \mathbb{N}_0, \tag{35}$$

and

$$\alpha_k\zeta \in (0, 1), \qquad \sum_{k=0}^\infty \alpha_k\zeta(1 - \alpha_k\zeta) = +\infty. \tag{36}$$

It follows from Lemma 7 that $\mathcal{T}_W$ is $\zeta$-averaged nonexpansive with respect to $W$. By the definition of averaged nonexpansiveness, there exists a nonexpansive operator $\mathcal{N}$ with respect to $W$ such that $\mathcal{T}_W = (1 - \zeta)\mathcal{I} + \zeta\mathcal{N}$. Substituting this equation into the definition (20) of $\mathcal{T}_{\alpha_k}$, we obtain (35).

It remains to verify that $\alpha_k\zeta$ satisfies (36) for $k \in \mathbb{N}_0$. If $-1 < \varrho < 0$, then $\lambda_W > \frac{L}{2}$ and $1 + \varrho < \alpha_k \le 1$ for $k \in \mathbb{N}_0$. Hence $\frac12 < \zeta < 1$ and $0 < \frac{1+\varrho}{2} < \alpha_k\zeta \le \zeta < 1$ for $k \in \mathbb{N}_0$. In this case, $\sum_{k=0}^\infty \alpha_k\zeta(1 - \alpha_k\zeta) > \sum_{k=0}^\infty \frac{1+\varrho}{2}(1 - \zeta) = +\infty$. If $0 \le \varrho < 1$, then $\lambda_W > \frac{L}{2(1-\varrho)}$ and $1 \le \alpha_k \le 1 + \varrho$ for $k \in \mathbb{N}_0$. Hence $\frac12 < \zeta < \frac{1}{1+\varrho}$ and $\frac12 < \alpha_k\zeta < 1$ for $k \in \mathbb{N}_0$. Given $\zeta$, $0 \le \varrho < 1$ implies that there exists $\varrho'$ such that $\varrho < \varrho' < 1$ and $\frac12 < \zeta < \frac{1}{1+\varrho'}$. Thus $\alpha_k\zeta < \frac{1+\varrho}{1+\varrho'}$, and then $\sum_{k=0}^\infty \alpha_k\zeta(1 - \alpha_k\zeta) > \sum_{k=0}^\infty \frac12\left(1 - \frac{1+\varrho}{1+\varrho'}\right) = +\infty$.

Therefore, by Theorem 4, $\{v^k\}_{k\in\mathbb{N}_0}$ converges to a fixed-point of $\mathcal{T}_W$. By Theorem 1, we conclude that $\{f^k\}_{k\in\mathbb{N}_0}$ converges to a solution of model (5). ■

For the specific choice of the preconditioning matrices $P$ and $Q$, we have the following lemma for proving the convergence of PKMA.

Lemma 9: Let $\beta$, $\rho_1$, $\rho_2$ and $\xi$ be positive numbers, let $S \in \mathbb{R}^{d \times d}$ be a diagonal matrix with positive diagonal entries, and let $P := \beta S$, $Q := \mathrm{diag}(\rho_1\mathbf{1}_{m_1}, \rho_2\mathbf{1}_{m_2})$, $p := \frac{1}{\beta S_{\max}}$, $B := \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$, and $W := \begin{bmatrix} P^{-1} & -B^T \\ -B & Q^{-1} \end{bmatrix}$. If $\beta < \frac{1}{\xi S_{\max}}$, $\rho_1 < \frac{p - \xi}{2\|B_1\|_2^2 + (p - \xi)\xi}$ and $\rho_2 < \frac{p - \xi}{2\|B_2\|_2^2 + (p - \xi)\xi}$, then $\lambda_W > \xi$.

Proof: Clearly, $W$ is symmetric. To prove that $\lambda_W > \xi$, it suffices to show that

$$W - \xi I_{d+m_0} = \begin{bmatrix} P^{-1} - \xi I_d & -B^T \\ -B & Q^{-1} - \xi I_{m_0} \end{bmatrix}$$

is positive definite. Let $t_1 := \big(\frac{1}{\rho_1} - \xi\big)^{-1}$ and $t_2 := \big(\frac{1}{\rho_2} - \xi\big)^{-1}$. If $\beta$, $\rho_1$ and $\rho_2$ satisfy the conditions of this lemma, then $p - \xi > 0$, $0 < t_1 < \frac{p - \xi}{2\|B_1\|_2^2}$, $0 < t_2 < \frac{p - \xi}{2\|B_2\|_2^2}$, and moreover, $P^{-1} - \xi I_d$ and $Q^{-1} - \xi I_{m_0}$ are both positive definite. Define

$$\tilde B := (Q^{-1} - \xi I_{m_0})^{-\frac12}\,B\,(P^{-1} - \xi I_d)^{-\frac12}.$$

Then

$$\tilde B = \begin{bmatrix} \sqrt{t_1}\,B_1(P^{-1} - \xi I_d)^{-\frac12} \\ \sqrt{t_2}\,B_2(P^{-1} - \xi I_d)^{-\frac12} \end{bmatrix}$$

by the definitions of $Q$ and $B$. It follows from [25, Lemma 6.2] that $W - \xi I_{d+m_0}$ is positive definite if and only if $\|\tilde B\|_2 < 1$, which we verify below. For any $E_1 \in \mathbb{R}^{m_1 \times d}$ and $E_2 \in \mathbb{R}^{m_2 \times d}$, by the definition of the matrix 2-norm, we have

$$\left\|\begin{bmatrix} E_1 \\ E_2 \end{bmatrix}\right\|_2^2 = \max_{\|x\|_2 = 1}\left\|\begin{bmatrix} E_1x \\ E_2x \end{bmatrix}\right\|_2^2 \le \max_{\|x\|_2 = 1}\|E_1x\|_2^2 + \max_{\|y\|_2 = 1}\|E_2y\|_2^2 = \|E_1\|_2^2 + \|E_2\|_2^2.$$

Thus

$$\|\tilde B\|_2^2 \le t_1\|B_1(P^{-1} - \xi I_d)^{-\frac12}\|_2^2 + t_2\|B_2(P^{-1} - \xi I_d)^{-\frac12}\|_2^2 \le t_1\|B_1\|_2^2(p - \xi)^{-1} + t_2\|B_2\|_2^2(p - \xi)^{-1} < \frac12 + \frac12 = 1,$$

which completes the proof. ■

Footnotes

This article has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author.

Contributor Information

Yizun Lin, School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China.

C. Ross Schmidtlein, Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065 USA.

Qia Li, School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China.

Si Li, School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.

Yuesheng Xu, Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529 USA, and also with the Guangdong Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou 510275, China.

References

  • [1] Rudin LI, Osher S, and Fatemi E, “Nonlinear total variation based noise removal algorithms,” Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992.
  • [2] Bredies K, Kunisch K, and Pock T, “Total generalized variation,” SIAM J. Imag. Sci., vol. 3, no. 3, pp. 492–526, 2010.
  • [3] Li S et al., “Effective noise-suppressed and artifact-reduced reconstruction of SPECT data using a preconditioned alternating projection algorithm,” Med. Phys., vol. 42, no. 8, pp. 4872–4887, 2015.
  • [4] Fessler JA and Booth SD, “Conjugate-gradient preconditioning methods for shift-variant PET image reconstruction,” IEEE Trans. Image Process., vol. 8, no. 5, pp. 688–699, May 1999.
  • [5] Yu DF and Fessler JA, “Edge-preserving tomographic reconstruction with nonlocal regularization,” IEEE Trans. Med. Imag., vol. 21, no. 2, pp. 159–173, February 2002.
  • [6] Wang G and Qi J, “Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization,” IEEE Trans. Med. Imag., vol. 31, no. 12, pp. 2194–2204, December 2012.
  • [7] Tsai Y-J et al., “Fast quasi-Newton algorithms for penalized reconstruction in emission tomography and further improvements via preconditioning,” IEEE Trans. Med. Imag., vol. 37, no. 4, pp. 1000–1010, April 2018.
  • [8] Wang G and Qi J, “Edge-preserving PET image reconstruction using trust optimization transfer,” IEEE Trans. Med. Imag., vol. 34, no. 4, pp. 930–939, April 2015.
  • [9] Panin VY, Zeng GL, and Gullberg GT, “Total variation regulated EM algorithm,” IEEE Trans. Nucl. Sci., vol. 46, no. 6, pp. 2202–2210, December 1999.
  • [10] Jonsson E, Huang S-C, and Chan T, “Total variation regularization in positron emission tomography,” Univ. California, Los Angeles, Los Angeles, CA, USA, CAM-Rep. 98-48, 1998, pp. 1–25.
  • [11] Sawatzky A, Brune C, Wubbeling F, Kosters T, Schafers K, and Burger M, “Accurate EM-TV algorithm in PET with low SNR,” in Proc. IEEE Nucl. Sci. Symp. Conf. Rec., October 2008, pp. 5133–5137.
  • [12] Bardsley JM, “An efficient computational method for total variation-penalized Poisson likelihood estimation,” Inverse Problems Imag., vol. 2, no. 2, pp. 167–185, 2008.
  • [13] Bardsley JM and Luttman A, “Total variation-penalized Poisson likelihood estimation for ill-posed problems,” Adv. Comput. Math., vol. 31, nos. 1–3, p. 35, 2009.
  • [14] Bardsley JM and Goldes J, “Regularization parameter selection and an efficient algorithm for total variation-regularized positron emission tomography,” Numer. Algorithms, vol. 57, no. 2, pp. 255–271, 2011.
  • [15] Chaux C, Pesquet J-C, and Pustelnik N, “Nested iterative algorithms for convex constrained image recovery problems,” SIAM J. Imag. Sci., vol. 2, no. 2, pp. 730–762, 2009.
  • [16] Bonettini S and Ruggiero V, “An alternating extragradient method for total variation-based image restoration from Poisson data,” Inverse Problems, vol. 27, no. 9, 2011, Art. no. 095001.
  • [17] Sidky EY, Jørgensen JH, and Pan X, “Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle–Pock algorithm,” Phys. Med. Biol., vol. 57, no. 10, p. 3065, 2012.
  • [18] Komodakis N and Pesquet J-C, “Playing with duality: An overview of recent primal–dual approaches for solving large-scale optimization problems,” IEEE Signal Process. Mag., vol. 32, no. 6, pp. 31–54, November 2015.
  • [19] Boyd S, Parikh N, Chu E, Peleato B, and Eckstein J, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, January 2011.
  • [20] Chun SY, Dewaraja YK, and Fessler JA, “Alternating direction method of multiplier for tomography with nonlocal regularizers,” IEEE Trans. Med. Imag., vol. 33, no. 10, pp. 1960–1968, October 2014.
  • [21] Krol A, Li S, Shen L, and Xu Y, “Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction,” Inverse Problems, vol. 28, no. 11, 2012, Art. no. 115005.
  • [22] Wu Z, Li S, Zeng X, Xu Y, and Krol A, “Reducing staircasing artifacts in SPECT reconstruction by an infimal convolution regularization,” J. Comput. Math., vol. 34, no. 6, pp. 626–647, 2016.
  • [23] Pock T and Chambolle A, “Diagonal preconditioning for first order primal-dual algorithms in convex optimization,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), November 2011, pp. 1762–1769.
  • [24] Li Q and Zhang N, “Fast proximity-gradient algorithms for structured convex optimization problems,” Appl. Comput. Harmon. Anal., vol. 41, no. 2, pp. 491–517, 2016.
  • [25] Li Q, Shen L, Xu Y, and Zhang N, “Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing,” Adv. Comput. Math., vol. 41, no. 2, pp. 387–422, 2015.
  • [26] Nesterov YE, “A method of solving a convex programming problem with convergence rate O(1/k²),” Sov. Math. Dokl., vol. 27, no. 2, pp. 372–376, 1983.
  • [27] Beck A and Teboulle M, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.
  • [28] Moreau JJ, “Proximité et dualité dans un espace Hilbertien,” Bull. Soc. Math. France, vol. 93, no. 2, pp. 273–299, 1965.
  • [29] Moreau JJ, “Fonctions convexes duales et points proximaux dans un espace Hilbertien,” C. R. Acad. Sci. Paris A, Math., vol. 255, pp. 2897–2899, December 1962.
  • [30] Mumcuoglu EU, Leahy R, Cherry SR, and Zhou Z, “Fast gradient-based methods for Bayesian reconstruction of transmission and emission PET images,” IEEE Trans. Med. Imag., vol. 13, no. 4, pp. 687–701, December 1994.
  • [31] Schmidtlein CR et al., “Relaxed ordered subset preconditioned alternating projection algorithm for PET reconstruction with automated penalty weight selection,” Med. Phys., vol. 44, no. 8, pp. 4083–4097, 2017.
  • [32] Moses WW, “Fundamental limits of spatial resolution in PET,” Nucl. Instrum. Methods Phys. Res. A, Accel. Spectrom. Detect. Assoc. Equip., vol. 648, pp. S236–S240, August 2011.
  • [33] Berthon B et al., “PETSTEP: Generation of synthetic PET lesions for fast evaluation of segmentation methods,” Phys. Med., vol. 31, no. 8, pp. 969–980, 2015.
  • [34] Micchelli CA, Shen L, and Xu Y, “Proximity algorithms for image models: Denoising,” Inverse Problems, vol. 27, no. 4, 2011, Art. no. 045009.
  • [35] Lange K, “Convergence of EM image reconstruction algorithms with Gibbs smoothing,” IEEE Trans. Med. Imag., vol. 9, no. 4, pp. 439–446, December 1990.
  • [36] Chambolle A and Pock T, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imag. Vis., vol. 40, no. 1, pp. 120–145, 2011.
  • [37] Kim D, Ramani S, and Fessler JA, “Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction,” IEEE Trans. Med. Imag., vol. 34, no. 1, pp. 167–178, January 2015.
  • [38] Krasnosel'skii MA, “Two remarks on the method of successive approximations,” Uspekhi Matematicheskikh Nauk, vol. 10, no. 1, pp. 123–127, 1955.
  • [39] Mann WR, “Mean value methods in iteration,” Proc. Amer. Math. Soc., vol. 4, no. 3, pp. 506–510, 1953.
  • [40] Bauschke HH and Combettes PL, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York, NY, USA: Springer, 2011.
  • [41] Baillon J-B and Haddad G, “Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones,” Isr. J. Math., vol. 26, no. 2, pp. 137–150, 1977.
  • [42] Combettes PL and Yamada I, “Compositions and convex combinations of averaged nonexpansive operators,” J. Math. Anal. Appl., vol. 425, no. 1, pp. 55–70, 2015.
