Abstract
Medical image reconstruction with pre-trained score-based generative models (SGMs) has advantages over other existing state-of-the-art deep-learned reconstruction methods, including improved resilience to different scanner setups and advanced image distribution modeling. SGM-based reconstruction has recently been applied to simulated positron emission tomography (PET) datasets, showing improved contrast recovery for out-of-distribution lesions relative to the state-of-the-art. However, existing methods for SGM-based reconstruction from PET data suffer from slow reconstruction, burdensome hyperparameter tuning and slice inconsistency effects (in 3D). In this work, we propose a practical methodology for fully 3D reconstruction that accelerates reconstruction and reduces the number of critical hyperparameters by matching the likelihood of an SGM’s reverse diffusion process to that of a current iterate of the maximum-likelihood expectation maximization algorithm. Using the example of low-count reconstruction from simulated [18F]DPA-714 datasets, we show our methodology can match or improve on the NRMSE and SSIM of existing state-of-the-art SGM-based PET reconstruction while reducing reconstruction time and the need for hyperparameter tuning. We evaluate our methodology against state-of-the-art supervised and conventional reconstruction algorithms. Finally, we demonstrate a first-ever implementation of SGM-based reconstruction for real 3D PET data, specifically [18F]DPA-714 data, where we integrate perpendicular pre-trained SGMs to eliminate slice inconsistency issues.
Keywords: Score-based generative modeling, image reconstruction algorithms, positron emission tomography
I. Introduction
Positron emission tomography (PET) is a nuclear medicine imaging technique used widely in clinical practice and research to image functional processes within the body [1]. PET scans expose subjects to ionizing radiation from an injected radioactive tracer; this exposure can be reduced by lowering the administered activity and hence the radioactive counts acquired. However, low-count data suffer from high levels of Poisson noise, leading to visually noisy images when conventional model-based reconstruction methods are used [2]. Deep learning methods have been proposed to compensate for the poor signal-to-noise ratio in measured low-count data [3], [4].
Most work in deep-learned PET reconstruction utilizes supervised deep learning, where a mapping is directly learned from low-dose PET data (e.g. sinograms) to high-quality images, either with [5], [6] or without [7], [8] advance knowledge of the fixed PET forward model.
A recent trend in medical image reconstruction is to leverage a score-based generative model (SGM) that has been pre-trained on a relevant image dataset as an unsupervised prior [9]. To perform unsupervised SGM-based reconstruction, the generative steps of the SGM are interleaved with reconstruction steps to encourage consistency between the generated image and the measured data [10]. For 3D reconstruction, an SGM is typically pre-trained on diverse 2D transverse slices [11], and the score-based prior is applied to these planes, while conventional regularization ensures slice consistency in the axial direction [11], [12].
Unlike supervised reconstruction methods, unsupervised SGM-based reconstruction only needs unpaired high-quality images for training, decoupling scanner-specific factors and improving generalizability [10]. This simplifies training and allows greater flexibility at inference with varied dose levels and scanner parameters, though it may be less task-specific than supervised learning.
Some existing work has shown promise for SGM-based reconstruction of PET data [12], [13], [14]. However, existing SGM-based reconstruction methods suffer from issues including long reconstruction times and the need for burdensome hyperparameter tuning [9], [15]. In 3D, methods also suffer from inconsistency or blurring between axial slices (due to only applying the score-based prior in the transverse planes) [12].
In this work, we propose a likelihood-scheduling mechanism for SGM-based reconstruction to address the aforementioned issues of burdensome hyperparameter tuning, slice inconsistency and slow reconstruction. Our method first runs the maximum likelihood expectation maximization algorithm (MLEM) to generate a “likelihood schedule” for a given set of measured sinogram data (see Fig. 1). The likelihood schedule is then integrated into the reverse diffusion process of an SGM-based reconstruction, enabling dynamic adjustment of the balance between the prior and the likelihood contributions. By integrating this development with SGMs trained on perpendicular slice orientations [16], our method eliminates the slice inconsistency issue while reducing the number of critical regularization hyperparameters from 4 to 1.
Fig. 1.

Our proposed likelihood-scheduling methodology for SGM-based PET image reconstruction.
Previous methods have implicitly altered the balance between likelihood and prior via regularization hyperparameters; our proposal is the first to investigate choosing the target likelihood upfront, providing samples from the posterior distribution of image reconstruction conditioned on both a likelihood value and noisy measured data.
We conduct numerical experiments to validate our method’s effectiveness on low-count PET data, using the example of simulated 2D [18F]DPA-714 radiotracer distributions, and evaluate its performance against state-of-the-art conventional, supervised, and unsupervised SGM-based reconstruction algorithms. We then extend to the 3D case, showing quantitative and qualitative results for real fully 3D PET reconstruction.
This work makes the following contributions:
- We propose a principled and efficient mechanism for dynamically balancing SGM denoising steps with likelihood update steps for image reconstruction. Our methodology enables direct sampling of possible reconstructions at a fixed likelihood value.
- We show our method has a significantly lower hyperparameter selection burden than the state-of-the-art for unsupervised SGM-based reconstruction, without compromising reconstruction accuracy.
- We resolve slice inconsistency issues in 3D by integrating our method with perpendicular pre-trained 2D SGMs and demonstrate the first-ever fully 3D PET reconstructions from real data using SGM-based reconstruction, specifically from low-count data acquired with the radiotracer [18F]DPA-714.
II. Background
A. PET Reconstruction
Reconstructing an image from PET emission data is an inverse problem [4]. The true mean $q$ of the noisy measurements $m$ (e.g. a sinogram) may be modeled as
$$q = Ax + b,$$
where $x$ represents the true radiotracer distribution, $A$ represents our system model and $b$ models the scatter and randoms components. The system model $A$ includes the image-space point spread function (PSF), the projection between image and sinogram space, as well as attenuation and normalization modeling.
PET emission data are generated as a set of random discrete emissions from radionuclides, and therefore follow a Poisson noise model. MLEM [17] is a convergent iterative algorithm that maximizes the Poisson log-likelihood (PLL) of the emission data with respect to an image estimate, given (up to a constant independent of $x$) by
$$L(m \mid x) = \sum_{i} \left( m_i \log [Ax + b]_i - [Ax + b]_i \right).$$
However, with low-count data and a high-dimensional $x$, the maximum likelihood estimate overfits to noisy measurement data. It is standard to compensate for this reduction in signal by conditioning on an image-based prior, thereby regularizing the reconstruction, via algorithms such as maximum a posteriori expectation maximization (MAP-EM) [18]. For this purpose, let $p(x)$ be the prior probability density for an image $x$.
Such algorithms may be accelerated by partitioning sinograms into subsets and seeking the maxima of a set of corresponding sub-objectives, e.g. leading to Ordered-Subset Expectation Maximization (OSEM) [19] and Block-Sequential Regularization Expectation Maximization (BSREM) [20] for MLEM and MAP-EM respectively.
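To make the MLEM update and PLL of this subsection concrete, the following is a minimal NumPy sketch, assuming a dense system matrix `A` purely for illustration (in practice the system model is applied as a matrix-free projector, as in Section V-B); the function names and the small numerical floor `eps` are our own illustrative choices rather than the paper's implementation.

```python
import numpy as np

def poisson_log_likelihood(x, m, A, b, eps=1e-12):
    """Poisson log-likelihood L(m | x), up to a constant independent of x."""
    q = A @ x + b                              # expected counts in each sinogram bin
    return float(np.sum(m * np.log(q + eps) - q))

def mlem(m, A, b, n_iter, eps=1e-12):
    """Plain MLEM; returns the final image estimate and the PLL of every iterate."""
    n_bins, n_vox = A.shape
    x = np.ones(n_vox)                         # uniform non-negative initialization
    sens = A.T @ np.ones(n_bins)               # sensitivity image (back-projection of ones)
    pll_trace = []
    for _ in range(n_iter):
        q = A @ x + b                          # forward-project the current estimate
        x *= (A.T @ (m / (q + eps))) / (sens + eps)   # multiplicative EM update
        pll_trace.append(poisson_log_likelihood(x, m, A, b))
    return x, pll_trace
```

The recorded `pll_trace` is exactly the quantity our proposed method later interpolates into a likelihood schedule (Section IV-B).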
B. Score-Based Generative Models (SGMs)
Score-based generative modeling is a generative deep learning framework that enables state-of-the-art modeling of, and sampling from, the probability distribution $p(x)$ of a set of images [21], [22], [23], [24]. SGMs work by reversing a diffusion stochastic differential equation (SDE) that maps the initial distribution $p_0 = p(x)$ to a known distribution. In this paper, we consider the variance-preserving Itô SDE that maps to the known distribution of Gaussian noise [23]
$$\mathrm{d}x_t = -\tfrac{1}{2}\beta(t)\, x_t\, \mathrm{d}t + \sqrt{\beta(t)}\, \mathrm{d}w_t,$$
where $x_t$ is a stochastic process indexed by time $t \in [0, T]$ and $w_t$ is a standard Wiener process (multivariate Brownian motion). For each $t$, $x_t$ has an associated density $p_t(x_t)$. The function $\beta(t)$ is chosen such that $p_T$ is approximately a standard Gaussian distribution (in this paper we fix $T = 1$).
Anderson [25] gives the corresponding reverse-time SDE as
$$\mathrm{d}x_t = \left[ -\tfrac{1}{2}\beta(t)\, x_t - \beta(t)\, \nabla_{x_t} \log p_t(x_t) \right] \mathrm{d}t + \sqrt{\beta(t)}\, \mathrm{d}\bar{w}_t,$$
where $\bar{w}_t$ is the time-reversed Wiener process and the term $\nabla_{x_t} \log p_t(x_t)$ is the score function. To computationally simulate the reverse SDE, we train a noise-level-dependent neural network $s_\theta(x_t, t)$, parameterized by $\theta$, to approximate the score function. This is achieved with denoising score matching (DSM) [26], yielding the optimization problem
$$\theta^* = \arg\min_{\theta}\; \mathbb{E}_{t,\, x_0,\, x_t \mid x_0} \left[ \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \right\|_2^2 \right].$$
Sampling from this generative model begins by sampling $x_T \sim \mathcal{N}(0, I)$. We then use the learned score model $s_\theta$ as a surrogate for the score function $\nabla_{x_t} \log p_t(x_t)$, and simulate the resulting reverse SDE backwards in time (with a numerical scheme such as Euler-Maruyama or predictor-corrector methods [24]), starting from $x_T$.
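As an illustration of how the DSM objective above is optimized in practice, the following PyTorch sketch draws a stochastic estimate of the (weighted) DSM loss for a VP-SDE with a linear $\beta(t)$; the schedule endpoints `beta_min`/`beta_max` and the score network call signature `score_net(x_t, t)` are assumptions for illustration rather than the exact training configuration used in this work.

```python
import torch

def alpha_bar(t, beta_min=0.1, beta_max=20.0):
    """exp(-integral of beta(s) ds from 0 to t) for a linear beta(t):
    the VP-SDE signal-retention coefficient."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    return torch.exp(-integral)

def dsm_loss(score_net, x0):
    """Single stochastic estimate of the weighted denoising score matching objective."""
    t = torch.rand(x0.shape[0], device=x0.device).clamp(min=1e-5)   # t in (0, T], T = 1
    a = alpha_bar(t).view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * eps        # sample x_t ~ p_t(x_t | x_0)
    # The score of the Gaussian perturbation kernel is -eps / sqrt(1 - a); weighting
    # the squared error by (1 - a) gives the noise-prediction form used below.
    return ((torch.sqrt(1.0 - a) * score_net(xt, t) + eps) ** 2).mean()
```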
DDIMs (denoising diffusion implicit models) [24] were introduced to allow faster sampling by reducing the need to simulate the SDE on a fine time grid in order to produce high-quality samples. DDIMs utilize Tweedie’s estimate [27] of the expectation $\mathbb{E}[x_0 \mid x_t]$, computed using the score model as
$$\hat{x}_{0|t} = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t + (1 - \bar{\alpha}_t)\, s_\theta(x_t, t) \right),$$
where the positive scalars $\bar{\alpha}_t$ and $1 - \bar{\alpha}_t$ are coefficients that may be derived from $\beta(t)$ (see Singh et al. [12] for details). DDIM uses Tweedie’s estimate and the current iterate $x_t$ to accelerate sampling with the non-Markovian sampling update rule
$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_{0|t} + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\;\, \frac{x_t - \sqrt{\bar{\alpha}_t}\, \hat{x}_{0|t}}{\sqrt{1 - \bar{\alpha}_t}} + \sigma_t\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, I),$$
with $\sigma_t = \eta \sqrt{(1 - \bar{\alpha}_{t-1})/(1 - \bar{\alpha}_t)}\, \sqrt{1 - \bar{\alpha}_t / \bar{\alpha}_{t-1}}$, where $\eta$ controls the stochasticity of the update (fixed at 0.1 for this work).
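The two equations above translate directly into code. A minimal PyTorch sketch of Tweedie’s estimate and one DDIM update follows; the schedule values `a_bar_t` and `a_bar_prev` are assumed to be passed as (broadcastable) tensors, and the helper names are our own.

```python
import torch

def tweedie_x0(score_net, xt, t, a_bar_t):
    """Tweedie's estimate of E[x_0 | x_t] under the VP-SDE perturbation kernel."""
    return (xt + (1.0 - a_bar_t) * score_net(xt, t)) / a_bar_t.sqrt()

def ddim_step(xt, x0_hat, a_bar_t, a_bar_prev, eta=0.1):
    """One DDIM update from x_t to the previous (less noisy) timestep."""
    eps_hat = (xt - a_bar_t.sqrt() * x0_hat) / (1.0 - a_bar_t).sqrt()   # implied noise direction
    sigma = eta * ((1.0 - a_bar_prev) / (1.0 - a_bar_t)).sqrt() \
                * (1.0 - a_bar_t / a_bar_prev).sqrt()
    mean = a_bar_prev.sqrt() * x0_hat \
         + (1.0 - a_bar_prev - sigma ** 2).clamp(min=0.0).sqrt() * eps_hat
    return mean + sigma * torch.randn_like(xt)
```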
C. PET Reconstruction With SGMs
To solve the PET reconstruction problem with an SGM, we simulate the reverse diffusion process with an approximation of the conditional score $\nabla_{x_t} \log p_t(x_t \mid m)$, allowing us to sample from $p(x_0 \mid m)$. To approximate $\nabla_{x_t} \log p_t(x_t \mid m)$, we decompose it into prior and likelihood terms by Bayes’ law as
$$\nabla_{x_t} \log p_t(x_t \mid m) = \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(m \mid x_t),$$
and approximate the second term $\nabla_{x_t} \log p_t(m \mid x_t)$.
While direct approximations to $\nabla_{x_t} \log p_t(m \mid x_t)$ have been investigated [12], [28], Singh et al. find these too inefficient or inaccurate for 3D PET reconstruction [12]. Several works [28], [29] have instead modified the DDIM sampling rule (7) for conditional generation, implicitly approximating the conditional likelihood by enforcing data consistency on Tweedie’s estimate. These approaches calculate Tweedie’s estimate $\hat{x}_{0|t}$ of the fully-denoised sample $x_0$, update $\hat{x}_{0|t}$ with an iterative data-consistency scheme, and then add back Gaussian noise according to the DDIM update rule (7).
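The sketch below summarizes this general data-consistency pattern, reusing the `tweedie_x0` and `ddim_step` helpers sketched earlier: a fixed number of gradient ascent updates is applied to Tweedie’s estimate at every generative timestep before re-noising. The likelihood-gradient callable `log_lik_grad`, the image shape and the step size are placeholders for illustration, not the settings of any specific published method.

```python
import torch

def dds_style_sample(score_net, m, log_lik_grad, a_bars, n_dc_steps=4,
                     step_size=1e-3, eta=0.1, shape=(1, 1, 128, 128)):
    """Reverse DDIM sampling with a fixed number of data-consistency (gradient ascent)
    updates applied to Tweedie's estimate at every generative timestep."""
    x = torch.randn(shape)                              # start from pure Gaussian noise
    n = len(a_bars)
    x0_hat = x
    for k in reversed(range(1, n)):
        t = torch.full((shape[0],), k / (n - 1))
        x0_hat = tweedie_x0(score_net, x, t, a_bars[k]).clamp(min=0.0)
        for _ in range(n_dc_steps):                     # fixed-count data consistency
            x0_hat = (x0_hat + step_size * log_lik_grad(x0_hat, m)).clamp(min=0.0)
        x = ddim_step(x, x0_hat, a_bars[k], a_bars[k - 1], eta)
    return x0_hat
```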
III. Related Work
Following seminal works by Chung et al. [10], [11], the integration of an image prior learned by SGMs with image reconstruction has proved effective across different medical imaging modalities; for a review, see Webber and Reader [9].
Singh et al. [12] were the first to show simulated results for PET reconstruction using SGMs, demonstrating improved metrics of image quality and the ability to better recover simulated lesions from 3D phantom fluorodeoxyglucose (FDG) PET scans. Xie et al. [30] considered joint reconstruction of PET-MR data utilizing a dual-domain diffusion process to show improvements over supervised learning methods on 2D sinograms simulated from real FDG PET images.
Recently, Hu et al. [13] showed results for unsupervised SGM-based reconstruction with simulated ultra-low dose FDG PET, outperforming conventional model-based iterative reconstruction (MBIR) methods.
At present, the only method demonstrated in 3D for PET is Singh et al.’s [12] approach, which applies a pre-trained SGM to parallel transverse slices and a relative difference prior (RDP) to encourage consistency in the axial direction. A general method to decompose 3D SGM-based reconstruction into multiple perpendicular 2D reconstruction problems has been proposed by Lee et al. [16].
A. Motivation
Singh et al. [12] propose PET-DDS, an adaptation of Decomposed Diffusion Sampling (DDS) [28] to the case of non-negative PET images with high dynamic range.
PET-DDS uses a modified DDIM sampling rule (see Section II-C), enforcing data consistency on Tweedie’s estimate via gradient ascent steps on a MAP proximal objective. This objective has three terms: the PLL
for subset j, an anchor term to prevent straying too far from the diffusion output and an axial relative difference prior (RDP) for 3D reconstruction.
When implementing PET-DDS, we empirically found that, when reconstructing from substantially fewer counts than Singh et al., our log-likelihood gradients were large enough to prevent convergence of the proximal update. We therefore found it necessary to introduce an additional step-size hyperparameter to relax the rate of gradient ascent towards the reconstruction objective.
PET-DDS is a principled methodology that delivers high-quality reconstructions, but it has a number of shortcomings. PET-DDS has many hyperparameters to optimize, including: the strength of the MAP regularization; the number of MAP iterations per generative step; the gradient ascent step size; the strength of the RDP regularization; and the number of BSREM subsets. The first three of these depend on the time discretization and the number of counts in the measured data. Ideally, consistent hyperparameters across time discretizations would simplify hyperparameter tuning and support performing either fine- or coarse-grained reconstructions.
Furthermore, when used in practice, convergence of PET-DDS’s proximal objective is not achieved, and so the MAP regularization strength, the number of MAP iterations and the gradient ascent step size primarily act as proxies for the balance between the likelihood and the prior.
Additionally, using a constant likelihood strength across generative steps may be computationally inefficient. It is clear that likelihood steps early in the reverse diffusion process are less impactful than those later in the process, as the random noise added has more of an information-removal effect. This motivates varying the strength of likelihood update at different generative steps for efficiency or quality improvements.
Lastly, existing methods for 3D reconstruction such as PET-DDS utilize a pre-trained SGM applied only to transverse slices through the reconstruction volume. This necessitates the inclusion of an axial RDP, which has a smoothing effect on the reconstruction and causes an undesirable loss of detail.
IV. Proposed Approach
A. Problem Formulation
Suppose $D$ is the probability distribution over images learned by a pre-trained SGM, with density $p_D$. Let $c$ be a real scalar. Then, we seek to solve the following problem:
$$\text{sample} \quad x \sim p_D\!\left( x \;\middle|\; L(m \mid x) = c \right).$$
This problem formulation may be viewed as sampling from a manifold of fixed likelihood images (see Fig. 2), as weighted by their probability under the learned prior distribution.
Fig. 2.
Explanatory figure showing the prior and likelihood values of reconstruction iterates for MLEM versus our method. The degeneracy of PLL means that there exists a manifold of equal likelihood images that we may sample according to the prior probability density learned by the SGM.
For this problem to be meaningful, $c$ should be chosen such that clinically-relevant images exist with log-likelihood equal to $c$. In this work, we choose $c$ by computing a clinically-relevant image $x_{\text{MLEM}}$ with the MLEM algorithm, and setting $c = L(m \mid x_{\text{MLEM}})$. This is a desirable selection, as solving the above problem yields sampled images that are equally as consistent with the measured sinograms as MLEM images, but without the issues of early-terminated MLEM (chiefly a lack of resolved detail).
B. Likelihood-Scheduling for SGM-Based Reconstruction
We propose to solve the problem defined in Section IV-A with a dynamic data consistency update that matches the likelihood of reconstruction iterates at each reverse diffusion step to a pre-computed ‘likelihood schedule’. For a visual explanation, see Fig. 1.
Firstly, we perform an MLEM reconstruction from the sinogram data and record the PLL value of each image iterate (using $A$). We then linearly interpolate the PLL values into an $N_{\text{gen}}$-valued ‘likelihood schedule’, where $N_{\text{gen}}$ is the number of generative denoising steps used in an SGM-based reconstruction.
Then, for the $k$-th generative step, we first perform a single Tweedie denoising step, which we normalize with Singh et al.’s measurement-normalization procedure [12]. Next, we perform gradient ascent on Tweedie’s estimate until the log-likelihood of our estimate exceeds the $k$-th value in our likelihood schedule. This process occurs in pixel space, utilizing $A$ and its adjoint $A^T$. We then reapply noise to return a noisy iterate in accordance with the DDIM sampling rule (7).
The number of gradient ascent steps used in our algorithm is dynamic. The final reconstruction will have a PLL value within one gradient ascent step of a conventional MLEM reconstruction’s PLL value.
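The sketch below illustrates this likelihood-scheduling idea under the same assumptions and helpers as the earlier sketches: the schedule is built by interpolating MLEM PLL values, and the number of gradient ascent updates per generative step is determined by the schedule rather than fixed in advance. The PLL and likelihood-gradient callables, the safety cap `max_steps` and the omission of the measurement-normalization step are simplifications for illustration only.

```python
import numpy as np
import torch

def build_likelihood_schedule(pll_trace, n_gen):
    """Linearly interpolate the recorded MLEM PLL values onto n_gen generative steps."""
    src = np.linspace(0.0, 1.0, num=len(pll_trace))
    dst = np.linspace(0.0, 1.0, num=n_gen)
    return np.interp(dst, src, np.asarray(pll_trace, dtype=float))

def likelihood_scheduled_sample(score_net, m, pll, log_lik_grad, schedule, a_bars,
                                step_size=1e-3, eta=0.1, max_steps=50,
                                shape=(1, 1, 128, 128)):
    """Reverse DDIM sampling in which gradient ascent on Tweedie's estimate continues
    only until the current target value of the likelihood schedule is exceeded.
    `schedule` is assumed to hold one entry per generative step (len(a_bars) - 1)."""
    x = torch.randn(shape)
    n = len(a_bars)
    x0_hat = x
    for k in reversed(range(1, n)):
        t = torch.full((shape[0],), k / (n - 1))
        x0_hat = tweedie_x0(score_net, x, t, a_bars[k]).clamp(min=0.0)
        target = schedule[n - 1 - k]                    # target PLL for this generative step
        steps = 0
        while pll(x0_hat, m) < target and steps < max_steps:   # dynamic number of updates
            x0_hat = (x0_hat + step_size * log_lik_grad(x0_hat, m)).clamp(min=0.0)
            steps += 1
        x = ddim_step(x, x0_hat, a_bars[k], a_bars[k - 1], eta)
    return x0_hat
```

In use, the schedule would be produced once per dataset, e.g. `schedule = build_likelihood_schedule(pll_trace, n_gen=len(a_bars) - 1)`, where `pll_trace` is recorded during the MLEM run.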
This method has just two critical hyperparameters to tune: 1) $N_{\text{MLEM}}$, the number of MLEM iterations used to determine the maximum end-point PLL value; and 2) the SGM’s time discretization, i.e. $N_{\text{gen}}$, the number of generative steps used in the reverse diffusion process. Crucially, $N_{\text{MLEM}}$ controls the relative balance of the prior and the likelihood (with larger $N_{\text{MLEM}}$ giving greater emphasis to the likelihood), while $N_{\text{gen}}$ independently controls the number of diffusion timesteps used (and thereby trades off reconstruction speed against accuracy).
Furthermore, the critical hyperparameter $N_{\text{MLEM}}$ is easier to tune, as it may be set to the number of EM updates used for standard clinical reconstructions. All unsupervised PET reconstruction approaches involve heuristic regularization hyperparameters; a particular strength of our approach is that it aligns this heuristic with an existing clinically-accepted and vendor-recommended heuristic, i.e. the choice of $N_{\text{MLEM}}$.
The proposed approach is also more flexible to differing numbers of generative timesteps, as a likelihood schedule may be produced and adhered to for any number of generative timesteps. While hyperparameters such as the gradient ascent step size may still be specified, within reasonable bounds the step size only controls the accuracy with which the likelihood schedule is conformed to, and not the balance of likelihood and prior. This is fundamentally different from gradient-ascent scheduling approaches such as linear annealing, which do not reduce the number of hyperparameters required.
C. Adaptations to 3D
To adapt our method to 3D, one could incorporate the axial-only RDP utilized by PET-DDS by replacing the likelihood schedule with an analogous “objective schedule” consisting of the sum of the likelihood term and the RDP term. However, as discussed, the axial RDP causes undesirable blurring, particularly for sagittal and coronal slices.
We instead take inspiration from the approach of Lee et al. [16], leveraging pre-trained SGMs trained on slices from orthogonal orientations. Namely, we pre-train three SGMs on diverse high-quality slices in the sagittal, coronal and transverse orientations respectively. During reconstruction, we apply each SGM to slices in its respective orientation and calculate the 3D score estimate as the average of the score vectors output by the three pre-trained SGMs. Note that this methodology differs from Lee et al. in that, at each generative timestep, the averaged scores from 3 perpendicular SGMs are used rather than alternating the choice of score between 2 perpendicular SGMs. Alternating between directions leads to Tweedie’s estimate iterates with slice inconsistency effects, which are only gradually eliminated as many diffusion steps are taken. Our different approach was hence motivated by the desire to reduce the number of diffusion timesteps (due to the expense of the likelihood updates) while eliminating slice inconsistency effects throughout the diffusion process.
To accelerate our method in 3D, we take larger gradient ascent steps. However, this can potentially cause less accurate matching to the likelihood schedule. To counter this imprecision, where an image iterate’s PLL overshoots the target PLL, we linearly interpolate between the penultimate and current iterates (using the penultimate and current PLL) to yield a final iterate that better matches the target PLL.
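A sketch of these two 3D-specific ingredients follows: slice-wise application of three perpendicular 2D SGMs with score averaging, and linear interpolation of an overshooting iterate back onto the target PLL. The axis convention (which network corresponds to which orientation for a (D, H, W) volume) and the network call signature are assumptions for illustration.

```python
import torch

def averaged_3d_score(score_nets, volume, t):
    """Average the scores of three 2D SGMs applied slice-wise along orthogonal axes.

    `volume` has shape (D, H, W); each 2D network sees a batch of slices with a
    channel dimension, and its output is mapped back onto the 3D grid."""
    s_tra, s_cor, s_sag = score_nets
    def apply_2d(net, slices):                       # slices: (batch, height, width)
        tt = torch.full((slices.shape[0],), float(t))
        return net(slices.unsqueeze(1), tt).squeeze(1)
    tra = apply_2d(s_tra, volume)                                   # slices along axis 0
    cor = apply_2d(s_cor, volume.permute(1, 0, 2)).permute(1, 0, 2)  # slices along axis 1
    sag = apply_2d(s_sag, volume.permute(2, 0, 1)).permute(1, 2, 0)  # slices along axis 2
    return (tra + cor + sag) / 3.0

def interpolate_to_target_pll(x_prev, x_curr, pll_prev, pll_curr, pll_target):
    """Linear interpolation between the penultimate and current iterates when the
    current iterate's PLL overshoots the target value."""
    w = (pll_target - pll_prev) / (pll_curr - pll_prev + 1e-12)
    return (1.0 - w) * x_prev + w * x_curr
```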
V. Experimental Setup
A. Baseline Methods
We implemented PET-DDS and our proposed methodology with the same forward model as three baseline methods:
1). OSEM:
As discussed in Section II-A, OSEM [19] is a model-based iterative method widely used on clinical scanners. In OSEM, expectation-maximization steps are taken with respect to subsets of the measured data, resulting in an accelerated version of MLEM. Regularization is implicitly achieved by early stopping before full convergence to the noisy maximum likelihood image estimate.
2). MAP-EM:
A more sophisticated conventional iterative method is MAP-EM [18], an iterative algorithm for maximizing a regularized Poisson likelihood function. For our implementation, we follow Wang and Qi’s formulation of patch-based edge-preserving regularization [31]. In this formulation, at each iteration an image estimate $x_{\text{EM}}$ is computed via an OSEM update and a regularization image $x_{\text{reg}}$ is calculated with Wang and Qi’s regularization function. The OSEM estimate and regularization image are then combined using the De Pierro update [32], weighted by a scalar hyperparameter $\beta$, as sketched below.
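The sketch below gives the common closed-form, voxel-wise De Pierro fusion step that maximizes the EM surrogate plus a quadratic penalty $\tfrac{\beta}{2}(x - x_{\text{reg}})^2$; scaling conventions (e.g. normalization by the sensitivity image) vary between implementations and are omitted here.

```python
import numpy as np

def de_pierro_fusion(x_em, x_reg, beta):
    """Voxel-wise De Pierro update combining an EM estimate with a regularization image.

    Solves beta*x**2 + (1 - beta*x_reg)*x - x_em = 0 for the non-negative root,
    reducing to x_em as beta -> 0."""
    if beta == 0:
        return np.array(x_em, copy=True)
    c = 1.0 - beta * x_reg
    return 2.0 * x_em / (np.sqrt(c ** 2 + 4.0 * beta * x_em) + c + 1e-12)
```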
3). FBSEM-Net: Unrolled Iterative Deep-Learned Method:
FBSEM-net (deep learning PET reconstruction with forward-backward splitting expectation-maximization) [5] is a state-of-the-art supervised deep learning PET reconstruction algorithm that offers a principled approach to incorporating deep learning into physics-based reconstruction. This method unrolls the iterative MAP-EM algorithm, replacing the hand-crafted prior with a neural network that is learned from data.
Following Mehranian and Reader [5], for computational and memory efficiency we perform 30 burn-in iterations of OSEM with 4 subsets, followed by 12 FBSEM-net steps that simultaneously regularize and accelerate the reconstruction. The reconstruction target for training purposes is the ground truth.
We compare to two implementations with different neural architectures for the regularizing neural network: FBSEM-net, with a convolutional neural network (CNN) comparable to Mehranian and Reader [5], and FBSEM-net-adv, with the same network architecture used for score-based learning (with a constant timestep input). Section V-E contains training details.
FBSEM-net-adv is included in this study as a strong baseline representing the performance of deep learning approaches on within-distribution datasets when sinogram training data is available. Despite its strong performance in 2D, FBSEM-net-adv is omitted in 3D, as training it was computationally infeasible on the available hardware.
B. PET Forward Operator
Each reconstruction method made use of the same ParallelProj projector [33]. The geometry of the scanner was modeled using the publicly available specifications of Siemens’ Biograph mMR scanner.
The provided normalization was for data that has been axially compressed to span 11 (with 5 or 6 central lines of response summed along the axial direction). To cope with this constraint, axial compression was explicitly modeled in the forward operator (shown to have a negligible effect on reconstruction quality by Belzunce and Reader [34]). In 2D, a 4.5mm full-width half-maximum (FWHM) Gaussian PSF was also used.
The full forward model used was therefore
$$A x = n \odot a \odot C X P x,$$
where $x$ is an image estimate, $n$ is a span-11 sinogram of normalization factors, $a$ is a span-11 sinogram of attenuation factors, $\odot$ denotes element-wise multiplication, $C$ is a compression operator that converts a span-1 sinogram into span 11, $X$ is the ParallelProj projector and $P$ is the Gaussian PSF.
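A sketch of how this composite operator (and its adjoint, needed for the EM and gradient ascent updates) can be assembled from its factors is given below; the callables `project`, `backproject`, `compress`, `uncompress` and `psf_blur` are generic stand-ins and do not reflect the actual ParallelProj API.

```python
def make_forward_model(project, compress, psf_blur, norm_span11, attn_span11):
    """Compose A x = n * a * C(X(P(x))), with element-wise multiplication on sinograms."""
    def forward(x):
        return norm_span11 * attn_span11 * compress(project(psf_blur(x)))
    return forward

def make_adjoint_model(backproject, uncompress, psf_blur, norm_span11, attn_span11):
    """Adjoint A^T: reverse the chain with each factor's transpose; blurring with a
    symmetric Gaussian PSF kernel is its own adjoint."""
    def adjoint(y):
        return psf_blur(backproject(uncompress(norm_span11 * attn_span11 * y)))
    return adjoint
```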
C. 3D Real [18F]DPA-714 Data
69 static [18F]DPA-714 brain datasets (from the Inflammatory Reaction in Schizophrenia team at King’s College London [35]) were used in this work. The radiotracer [18F]DPA-714 is a second-generation translocator protein (TSPO) PET probe, which is used to perform brain-wide quantitative analysis of TSPO. The datasets used in this work were acquired from healthy control subjects.
The data had been previously acquired from 1-hour scans on the Siemens Biograph mMR, with approximately 200 MBq administered per subject. At full count, high-quality images (voxel size 2.08626 mm × 2.08626 mm × 2.03125 mm; 3D image size 344 × 344 × 127) were reconstructed with the scanner defaults (OSEM with 21 subsets, 2 iterations and no PSF).
For each dataset, Siemens’ scanner-specific algorithms were used to produce normalization sinograms, compute attenuation maps from previously acquired paired CT scans, and approximate the distribution of scatter events.
D. 2D Simulated Data
In order to have knowledge of the ground truth, simulated data were used. 2D transverse slices of high-quality [18F]DPA-714 PET images were used as ground truths, with each image obtained via full-count reconstruction with Siemens’ implementation of the OSEM algorithm (with 21 subsets and 42 iterations, i.e. 2 full passes through the data). Forward projected data were obtained using a single axial ring, after rescaling each ground truth image slice such that the total count of simulated prompts matched the original estimate of true events. Corrective factors were modeled as purely attenuation and contamination was modeled as a constant background of 30% of simulated prompts.
In 2D, the dose level was set to 60% of the counts of a single direct-plane sinogram, resulting in a low average number of counts per reconstructed slice. Poisson noise was then applied to the clean forward-projected data.
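The simulation of a noisy 2D sinogram described above can be sketched as follows; the callable `forward` stands in for the noiseless forward model (projection with attenuation), and the interpretation of the contamination as a flat background equal to 30% of total prompts is our reading of the setup rather than the exact simulation scripts used.

```python
import numpy as np

def simulate_noisy_sinogram(ground_truth, forward, target_trues, background_frac=0.3, seed=0):
    """Rescale a ground-truth slice to a target number of true counts, add a flat
    contamination background (background_frac of the prompts), and sample Poisson noise."""
    rng = np.random.default_rng(seed)
    clean = forward(ground_truth).astype(float)
    clean *= target_trues / clean.sum()                  # match the target true-count level
    # If contamination makes up background_frac of prompts, its total is
    # background_frac / (1 - background_frac) times the trues, spread uniformly over bins.
    b_total = clean.sum() * background_frac / (1.0 - background_frac)
    b = np.full_like(clean, b_total / clean.size)
    return rng.poisson(clean + b), b
```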
E. Training and Validation for SGMs and FBSEM-Net
For each of the transverse, coronal and sagittal orientations, an SGM was trained with all non-empty 2D slices from 55 3D training datasets (clinical images). Each SGM was trained to minimize the DSM objective (5) for 100 epochs, a value identified by 5-fold cross-validation on the transverse training datasets. Training-time data augmentations such as rotation and translation were performed. The SGM architecture used was identical to Singh et al. [12].
2D FBSEM-net instances were trained using transverse slices from the same 55 training datasets as the SGMs, with slices from one 3D validation dataset used for early stopping on the validation loss. This same dataset was used for validation of the reconstruction process with the pre-trained SGMs, as well as for the bias-variance assessments in 2D. An additional 13 datasets were reserved as test data. All deep learning methods were trained with the Adam optimizer.
F. 2D Reconstruction
Reconstructions in 2D were performed using simulated data (Section V-D). To calculate the 2D bias-variance assessment in Section VI-B and 2D likelihood-variance assessment in Section VI-G, for each of 10 random seeds, noisy sinogram data was generated according to the Poisson noise model. Then, reconstructions were performed for each independent noisy realization, with bias and standard deviation calculated according to Reader and Ellis [36].
Where unspecified, results from SGM-based reconstruction represent the mean of 5 samples with different random seeds (obtained from the same fixed noisy measured data), reconstructed with 100 generative timesteps. Default hyperparameter settings were used otherwise, including 20 iterations of gradient ascent per generative step for PET-DDS. Normalized root mean square error (NRMSE) and structural similarity index measure (SSIM) were chosen to assess global reconstruction performance.
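For completeness, a sketch of the two global metrics as we would compute them is given below; the exact normalization used for NRMSE (here, the L2 norm of the reference image) is an assumption.

```python
import numpy as np
from skimage.metrics import structural_similarity

def nrmse_percent(recon, reference):
    """Root mean square error normalized by the L2 norm of the reference, in percent."""
    return 100.0 * np.linalg.norm(recon - reference) / np.linalg.norm(reference)

def ssim_percent(recon, reference):
    """SSIM in percent, with the data range taken from the reference image."""
    return 100.0 * structural_similarity(
        reference, recon, data_range=float(reference.max() - reference.min()))
```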
G. 3D Reconstruction
Reconstructions in 3D were performed from real clinical research data (see Section V-C). To match clinical software outputs, our ParallelProj forward operator was used without a PSF. To calculate the 3D likelihood-variance assessment in Section VI-H2, prompts and randoms were sampled at 10% of counts assuming independent Poisson statistics, with smoothed randoms and scatter sinograms re-estimated using Siemens’ scanner software. To accommodate the computational demands of 3D reconstruction, we perform only 25 diffusion steps per reconstruction, compute only a single reconstructed sample instead of a sample mean, and use $N_{\text{MLEM}} = 21$.
All experiments were conducted on an NVIDIA GeForce RTX 3090 with 24 GB GPU memory.
VI. Results
A. 2D Reconstruction Performance
Table I shows the quantitative performance of each reconstruction method, assessed against 10 central 2D slices through each of 13 test datasets and averaged over 3 independent realizations of noisy sinogram data. Optimal hyperparameters for each method were established using a hyperparameter sweep to minimize NRMSE on the validation dataset.
TABLE I. Quantitative Results for Reconstruction Methods Applied to 2D [18F]DPA-714 Simulated Data, With Hyperparameters for Each Method Chosen to Minimize NRMSE on a Validation Dataset. 95% Confidence Intervals (±) Calculated With Respect to 3 Different Realizations of Noisy Measured Data.
| Method | NRMSE (%) | SSIM (%) | Time (s) |
|---|---|---|---|
| OSEM | | | 0.1 |
| MAP-EM | | | 0.3 |
| FBSEM-net | | | 0.2 |
| FBSEM-net-adv | | | 0.7 |
| PET-DDS | | | 35 |
| Ours | | | 13 |
Fig. 3 presents representative reconstructed images from each method with these hyperparameters. As anticipated, our method and PET-DDS have similar performance quantitatively and qualitatively. We can say with confidence that without restrictions on time or hyperparameter searching, our method’s reconstruction accuracy is at least on par with PET-DDS.
Fig. 3.

Reconstructions with each method from 2D simulated [18F]DPA-714 data, with hyperparameters for each method chosen to minimize NRMSE on a validation dataset. Arrows point to key differences between reconstructions, including well-reconstructed structures (central arrow) and hallucinations due to the high noise level (right arrow). SGM-based reconstructions are the mean of 5 sampled reconstructions.
FBSEM-net-adv was the best-performing method in terms of both NRMSE and SSIM, which we attribute to the additional information incorporated in its training. In comparison, FBSEM-net over-smooths reconstructions as a result of its simpler neural architecture.
The count level used is sufficiently low that several brain structures are not reconstructed by OSEM or MAP-EM. In these areas, the SGM-based reconstructions have more fine detail than other methods, but also more hallucinations; in contrast, FBSEM-net-adv has smoothed such areas. This difference in failure mode has contributed to the increased quantitative performance of FBSEM-net-adv relative to the SGM methods.
The indicative reconstruction times listed in Table I allow us to conclude that the SGM-based methods are currently an order of magnitude slower than the conventional and supervised methods. For this set of optimal hyperparameters for PET-DDS, our proposed method was faster; this may not hold true in other settings.
B. 2D Bias-Variance Assessment
Fig. 4 shows the results of a bias-variance assessment, performed on 10 central axial slices through the validation dataset. Where applicable, reconstruction hyperparameters were varied to show the effect of balancing the prior with the likelihood on the bias and variance properties of the reconstructions. This chart agrees with the previous quantitative results in Table I. In particular, our method achieves a similar or superior bias to PET-DDS with optimal hyperparameters for all variance levels.
Fig. 4.
2D bias-variance assessment for each method [36]. For OSEM and MAP-EM, one subset was used and the iteration number was varied from 5 to 100. For PET-DDS, the MAP regularization strength was varied from 0 to 2 in increments of 0.5 across the data points displayed, whereas for our method $N_{\text{MLEM}}$ was varied from 9 to 17 in increments of 2.
C. 2D Hyperparameter Stability
To assess the stability of each SGM method, we varied the number of generative timesteps from 5 to 200 for both our method and PET-DDS, with the effect on reconstructions shown in Fig. 5. We also considered the effect of varying the gradient ascent step size, shown in Fig. 6. Whereas PET-DDS fails to converge once the step size becomes too large, our method remains robust over a substantially wider range of step sizes. Furthermore, with large step sizes our method uses fewer likelihood updates than generative steps, exhibiting remarkable efficiency and demonstrating that it more efficiently exploits the reduced need for gradient ascent steps when a strong diffusion prior is available.
Fig. 5.
Reconstruction error (NRMSE) of 10 2D slices using optimal hyperparameters chosen for 100 generative timesteps at alternate numbers of generative timesteps for our method and PET-DDS.
Fig. 6.
Reconstruction quality (SSIM) and number of likelihood updates for our method and PET-DDS, using optimal hyperparameters chosen for 100 generative timesteps, at alternate gradient ascent step sizes, as evaluated on 10 2D slices.
These results show that our methodology is robust to both the number of generative timesteps chosen and the step size of gradient ascent employed. Therefore, for reasonable choices of generative timestep number and gradient ascent step size, our reconstruction error is solely a function of the target likelihood schedule.
D. Sample Path of Likelihood-Matched Vs Fixed Updates
Fig. 7 compares the evolution of the likelihood of Tweedie’s estimate through the reverse diffusion process for our method and PET-DDS. We can see that the likelihood-scheduling approach matches its likelihood schedule (and therefore the relevant MLEM estimate), whereas the PET-DDS reconstruction has no such guidance and, with a poor selection of its regularization strength or step size, is liable to overfit to noise or underfit to the measurement data.
Fig. 7.
Example log-likelihood of the measured data with respect to the current Tweedie’s estimate, for a single reconstruction with our method and with PET-DDS (with different hyperparameter values).
Fig. 8 shows the number of likelihood steps taken as a function of generative timestep. Whereas PET-DDS maintains a constant number of likelihood steps per generative timestep (reported by Singh et al. as 4 to 20 [12]), our method varies the number of likelihood steps to conform to the likelihood schedule. Taking relatively fewer steps at the start of the reverse diffusion process wastes less computation, as much information is lost when the re-noising step adds high-variance Gaussian noise to the Tweedie estimate. Taking fewer steps at the end of the reverse diffusion process reduces the chance of over-convergence to noisy measurement data.
Fig. 8.
Number of likelihood steps taken per generative denoising step for a representative 2D PET reconstruction with likelihood-scheduled SGM-based PET reconstruction.
In Section VI-A, our method used a mean of 201 likelihood updates per reconstructed sample (plus 14 for the likelihood scheduling) compared to 400-2000 fixed likelihood updates for PET-DDS (depending on the number of likelihood steps per generative timestep, reported from 4 to 20 [12]). It is clear that there are potential efficiency gains to be made by dynamically varying the number of likelihood steps taken.
E. Effect on Lesion Recovery
A major strength of unsupervised diffusion model approaches relative to supervised approaches such as FBSEM-net is their ability to resolve out-of-distribution features such as lesions [12]. To test this, we inserted a hot lesion into a 2D test dataset on the boundary between gray and white matter. We then performed reconstruction with this out-of-distribution dataset to investigate the lesion recovery performance of our approach. Figures 10 and 11 verify that our approach does not adversely affect lesion recovery and that our method matches or outperforms the improvements to the contrast recovery coefficient (CRC) that are claimed by PET-DDS. FBSEM-net-adv (FBSEM-net with an advanced neural network) performs notably poorly at this task, highlighting that unsupervised approaches display greater flexibility to reconstructing out-of-distribution datasets (and thereby achieve superior lesion recovery) compared to supervised approaches such as FBSEM-net.
Fig. 10.
Evaluation of methods for a lesion recovery task, with an out-of-distribution hot lesion inserted into a test dataset. NRMSE is measured globally, while CRC is evaluated on the lesion itself, with a large area of white matter not overlapping the lesion chosen as the background region. OSEM and MAP-EM have their iteration numbers varied from 5 to 40 and from 10 to 100, respectively. For PET-DDS, the regularization strength was varied from 0.5 to 2.25, whereas for our method $N_{\text{MLEM}}$ was varied from 11 to 18. Results were averaged over 7 Poisson noise realizations. See Fig. 11 for the corresponding images.
Fig. 11.

Visualization of each method’s performance for resolving an out-of-distribution hot lesion inserted into a test dataset. Where hyperparameters were varied, the image for each method was chosen as the image with the best NRMSE for a CRC value of at least 0.55. See Fig. 10 for each method’s corresponding quantitative performance.
F. 2D Reconstruction Uncertainty
For a fixed Poisson noise realization, we sample reconstructions based on the score-based prior. Fig. 9 shows varied reconstructions from simulated data at two low count levels. As counts increase, less shape variation, closer resemblance to the ground truth, and fewer artifacts are observed. At both count levels, the mean image appears smoother and better matches the ground truth than the samples. The impact of dose level and generative timesteps on SGM performance will be explored in future work.
Fig. 9.

Example reconstruction samples from two different count levels, using different random seeds (but with fixed noisy data realizations for each count level).
G. Comparison to MLEM for Varying Iteration
In Fig. 12, we directly compared reconstructions with our methodology and the MLEM images used to derive their likelihood schedules. Our SGM-based methodology delivers noise reduction and better structure preservation than MLEM. As likelihood increases, both reconstructions become noisy and the noise pattern in our reconstructions closely matches the noise in the MLEM reconstruction.
Fig. 12.

Example reconstructions with equivalent PLL using our likelihood-scheduling SGM-based method (mean of 5 samples) and MLEM, from 2D simulated low-count data.
H. Real 3D Data
1). Qualitative Results:
In Fig. 13, we show reconstructions from real 10% count data that achieve a likelihood equivalent to 21 iterations of MLEM, using different methods. We note that our introduction of perpendicular SGMs resolves the slice inconsistency (visible as alternating-intensity transverse slices) seen in the coronal and sagittal views of PET-DDS with a single SGM. Both of the SGM-based methods display lower noise than the MLEM reconstruction at this likelihood level. (It should be noted that PET-DDS can eliminate slice inconsistency at lower likelihood values, at the cost of a blurring effect from the axial RDP.)
Fig. 13.

Example reconstructions from real [18F]DPA-714 data in 3D. All columns except “Clinical” used the ParallelProj projector [33] without PSF; “Clinical” reconstructions used Siemens’ proprietary tools. Reconstructions from 10% count data match the PLL corresponding to 21 iterations of MLEM.
2). Quantitative Results:
Our methodology allows for direct comparisons between MLEM reconstruction (the clinical standard) and SGM-based reconstruction at the same likelihood. We leveraged this capability to assess the trade-off between likelihood and pixel-level variation between reconstructions of independent noise realizations of 10% count data, shown in Fig. 14. Our findings with real data concord with those in simulations, namely that for low likelihood values our methodology delivers improved reconstructions with reduced noise (relative to MLEM). In the regime where reconstructions over-fit to noise (at the highest likelihood values), reconstructions from our methodology vary more than MLEM’s; this may be because images with such high likelihood do not exist on the SGM’s manifold of probable images. However, all images at this likelihood level are too noisy to be suited to clinical tasks.
Fig. 14.
Coefficient of variation (CoV) against PLL for reconstructions from 10% counts of real [18F]DPA-714 3D measured data. CoV is measured as the mean pixel-wise coefficient of variation across reconstructions from 5 realizations of noisy measured data, within a region of white matter (selected as a large, visually uniform rectangle from the ground truth).
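A sketch of the CoV metric as described in the caption follows, assuming a stack of reconstructions from independent noise realizations and a boolean mask for the white-matter region.

```python
import numpy as np

def mean_pixelwise_cov(recons, roi_mask, eps=1e-12):
    """Mean pixel-wise coefficient of variation over a region of interest.

    `recons` is a sequence of reconstructions from independent noise realizations
    (stacked along a new leading axis); `roi_mask` is a boolean array selecting the
    white-matter region."""
    stack = np.stack(recons, axis=0)
    mean = stack.mean(axis=0)
    std = stack.std(axis=0, ddof=1)          # sample standard deviation across realizations
    cov = std / (mean + eps)
    return float(cov[roi_mask].mean())
```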
3). Timing:
We report the real computing time for 3D reconstruction with our proposed method and PET-DDS (both with and without the use of perpendicular SGMs) in Table II. Our approach is slightly more time-efficient than PET-DDS for low-dose PET reconstruction (in a fair comparison with the same perpendicular pre-trained SGMs), with the overhead introduced by SGM evaluations being greater than in 2D. The times for our approach include the time taken to construct the likelihood schedule; as this only occurs once per dataset, computing additional samples is even more efficient with our method.
TABLE II. Real Computing Time in Seconds for Computing a Single Sample With Selected Methods in 3D. PET-DDS Was Evaluated With 4 Gradient-Based Steps Per Diffusion Iteration, While Our Approach Was Used With NMLEM=21 (Corresponding to Images in Fig. 12). Timings Include Constructing the Likelihood Schedule With 21 Steps of MLEM for Our Approach.
| Method | Time (s), 25 SGM steps | Time (s), 100 SGM steps |
|---|---|---|
| MLEM | 33 | 33 |
| PET-DDS with single SGM | 116 | 398 |
| PET-DDS with perp. SGMs | 137 | 488 |
| Ours (…) | 145 | 385 |
| Ours (…) | 126 | 369 |
| Ours (…) | 116 | 368 |
VII. Discussion
Our results demonstrate that our method has a significantly lower hyperparameter tuning burden than PET-DDS, with the option of tuning just a single hyperparameter, $N_{\text{MLEM}}$, to directly vary the balance between the prior and the likelihood. This replaces tuning the strength of the MAP regularization, the number of MAP iterations per generative step, the strength of the RDP regularization and the gradient ascent step size. In particular, this work integrates SGM priors with the standard clinical heuristic and vendor recommendation for the number of MLEM iterations.
Hallucinations in reconstructions are a known concern with SGM-based reconstruction. While hallucinations are present in some of the example low-dose reconstructions shown, it should be noted that this is an expected result; count levels were set deliberately low to explore the setting where structures are not clearly discernible from the OSEM reconstructions (and therefore cannot be easily reconstructed by MAP-EM, even when relying on edge-preservation priors). While individual samples may vary, given enough samples we can obtain a lower-variance mean estimate. Our likelihood-scheduling approach also makes it easier to increase the level of consistency with a model-based reconstruction by increasing $N_{\text{MLEM}}$ (and thereby potentially spot hallucinations); it could also be integrated with recent approaches for reducing hallucinations on out-of-distribution data [37].
This work is the first methodology to investigate possible reconstructions for a fixed likelihood, providing samples from the posterior distribution of image reconstructions conditioned on both a likelihood value and noisy measured data. This development opens the possibility of assessing the uncertainty of a reconstruction at a fixed likelihood level equivalent to a standard clinical reconstructed image. Furthermore, hyperparameter tuning could be eliminated completely by integrating this work with bootstrapping approaches for estimating the optimal PLL value or MLEM iteration number [36].
Other methods of deriving a likelihood schedule may provide superior efficiency to the MLEM-based method proposed, as could altering the SGM’s noise schedule. Furthermore, the gradient ascent step size hyperparameter could be eliminated entirely with an adaptive step-size strategy, or by replacing gradient ascent with a different model-based update.
Further work on 3D modeling with pre-trained SGMs could improve the image quality in 3D further, for example by integrating our likelihood scheduling approach with 2.5D training approaches, latent diffusion models or patch-based approaches [38]. Lastly, the methods presented may find use cases in other medical image modalities where model-based iterative steps are combined with pre-trained SGMs, such as magnetic resonance imaging or computed tomography [9].
VIII. Conclusion
In summary, we have presented a novel method for PET reconstruction with pre-trained SGMs, with the advantages of a lower hyperparameter tuning burden than previous SGM-based methods and simpler comparison with clinical methods. We further showed the applicability of pre-trained SGMs to real 3D PET reconstruction, and reduced issues with slice inconsistency and blurring in 3D.
Acknowledgment
For the purposes of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Accepted Author Manuscript version arising, in accordance with King’s College London’s Rights Retention policy. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health.
Funding Statement
This work was supported by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The work of George Webber was supported in part by the EPSRC Centre for Doctoral Training in Smart Medical Imaging under Grant [EP/S022104/1], and in part by a GSK Studentship. The work of Yuya Mizuno has been supported by fellowship grants from the Japan Society for the Promotion of Science, Canon Foundation in Europe, Astellas Foundation for Research on Metabolic Disorders, and the Japanese Society of Clinical Neuropsychopharmacology. The work of Oliver D. Howes was supported in part by the Medical Research Council-U.K., under Grant [MC_U120097115], Grant [MR/W005557/1], and Grant [MR/V013734/1]; and in part by the Wellcome Trust under Grant [094849/Z/10/Z]. This work was also supported in part by the Wellcome/EPSRC Centre for Medical Engineering under Grant [WT 203148/Z/16/Z], and in part by EPSRC under Grant [EP/S032789/1].
Contributor Information
George Webber, Email: george.webber@kcl.ac.uk.
Yuya Mizuno, Email: yuya.mizuno@kcl.ac.uk.
Oliver D. Howes, Email: oliver.howes@kcl.ac.uk.
Alexander Hammers, Email: alexander.hammers@kcl.ac.uk.
Andrew P. King, Email: andrew.king@kcl.ac.uk.
Andrew J. Reader, Email: andrew.reader@kcl.ac.uk.
References
- [1].Bailey D. L., Positron Emission Tomography: Basic Sciences. Cham, Switzerland: Springer, 2005. [Google Scholar]
- [2].Boellaard R., “Standards for PET image acquisition and quantitative data analysis,” J. Nucl. Med., vol. 50, no. 1, pp. 11–20, May 2009. [DOI] [PubMed] [Google Scholar]
- [3].Reader A. J. and Pan B., “AI for PET image reconstruction,” Brit. J. Radiol., vol. 96, no. 1150, Oct. 2023, Art. no. 20230292. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Reader A. J., Corda G., Mehranian A., Costa-Luis C. D., Ellis S., and Schnabel J. A., “Deep learning for PET image reconstruction,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 1, pp. 1–25, Jan. 2021. [Google Scholar]
- [5].Mehranian A. and Reader A. J., “Model-based deep learning PET image reconstruction using forward–backward splitting expectation–maximization,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 1, pp. 54–64, Jan. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Guazzo A. and Colarieti-Tosti M., “Learned primal dual reconstruction for PET,” J. Imag., vol. 7, no. 12, p. 248, Nov. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Zhu B., Liu J. Z., Cauley S. F., Rosen B. R., and Rosen M. S., “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, pp. 487–492, Mar. 2018. [DOI] [PubMed] [Google Scholar]
- [8].Häggström I., Schmidtlein C. R., Campanella G., and Fuchs T. J., “DeepPET: A deep encoder–decoder network for directly solving the PET image reconstruction inverse problem,” Med. Image Anal., vol. 54, pp. 253–262, May 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Webber G. and Reader A. J., “Diffusion models for medical image reconstruction,” BJR|Artificial Intell., vol. 1, no. 1, p. 013, Aug. 2024. [Google Scholar]
- [10].Chung H. and Ye J. C., “Score-based diffusion models for accelerated MRI,” Med. Image Anal., vol. 80, Aug. 2022, Art. no. 102479. [DOI] [PubMed] [Google Scholar]
- [11].Chung H., Ryu D., McCann M. T., Klasky M. L., and Ye J. C., “Solving 3D inverse problems using pre-trained 2D diffusion models,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2023, pp. 22542–22551. [Google Scholar]
- [12].Singh I. R., et al. , “Score-based generative models for PET image reconstruction,” Mach. Learn. Biomed. Imag., vol. 2, pp. 547–585, Jan. 2024. [Google Scholar]
- [13].Hu R., et al. , “Unsupervised low-dose PET image reconstruction based on pre-trained denoising diffusion probabilistic prior,” J. Nucl. Med., vol. 65, p. 241109, Jun. 2024. [Google Scholar]
- [14].Webber G., Mizuno Y., Howes O. D., Hammers A., King A. P., and Reader A. J., “Generative-Model-Based fully 3D PET image reconstruction by conditional diffusion sampling,” in Proc. IEEE Nucl. Sci. Symp. (NSS), Med. Imag. Conf. (MIC) Room Temp. Semiconductor Detect. Conf. (RTSD), Nov. 2024, pp. 1–2. [Google Scholar]
- [15].Hou R., Li F., and Zeng T., “Fast and reliable score-based generative model for parallel MRI,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 1, pp. 953–966, Jan. 2025. [DOI] [PubMed] [Google Scholar]
- [16].Lee S., Chung H., Park M., Park J., Ryu W.-S., and Ye J. C., “Improving 3D imaging with pre-trained perpendicular 2D diffusion models,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2023, pp. 10676–10686. [Google Scholar]
- [17].Shepp L. A. and Vardi Y., “Maximum likelihood reconstruction for emission tomography,” IEEE Trans. Med. Imag., vol. MI-1, no. 2, pp. 113–122, Oct. 1982. [DOI] [PubMed] [Google Scholar]
- [18].Levitan E. and Herman G. T., “A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography,” IEEE Trans. Med. Imag., vol. MI-6, no. 3, pp. 185–192, Sep. 1987. [DOI] [PubMed] [Google Scholar]
- [19].Hudson H. M. and Larkin R. S., “Accelerated image reconstruction using ordered subsets of projection data,” IEEE Trans. Med. Imag., vol. 13, no. 4, pp. 601–609, Dec. 1994. [DOI] [PubMed] [Google Scholar]
- [20].De Pierro A. R. and Yamagishi M. E. B., “Fast EM-like methods for maximum 'a posteriori' estimates in emission tomography,” IEEE Trans. Med. Imag., vol. 20, no. 4, pp. 280–288, Apr. 2001. [DOI] [PubMed] [Google Scholar]
- [21].Song Y. and Ermon S., “Generative modeling by estimating gradients of the data distribution,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, Jan. 2019, pp. 1–12. [Google Scholar]
- [22].Sohl-Dickstein J., Weiss E., Maheswaranathan N., and Ganguli S., “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 2256–2265. [Google Scholar]
- [23].Ho J., Jain A., and Abbeel P., “Denoising diffusion probabilistic models,” in Proc. NIPS, vol. 33. Vancouver, BC, Canada: Curran Associates, 2020, pp. 6840–6851. [Google Scholar]
- [24].Song Y., Sohl-Dickstein J., Kingma D. P., Kumar A., Ermon S., and Poole B., “Score-Based Generative Modeling through Stochastic Differential Equations,” 2020, arXiv:2011.13456.
- [25].Anderson B. D. O., “Reverse-time diffusion equation models,” Stochastic Processes Appl., vol. 12, no. 3, pp. 313–326, May 1982. [Google Scholar]
- [26].Vincent P., “A connection between score matching and denoising autoencoders,” Neural Comput., vol. 23, no. 7, pp. 1661–1674, Jul. 2011. [DOI] [PubMed] [Google Scholar]
- [27].Efron B., “Tweedie's formula and selection bias,” J. Amer. Stat. Assoc., vol. 106, no. 496, pp. 1602–1614, Dec. 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Wittmer J., Badger J., Sundar H., and Bui-Thanh T., “An autoencoder compression approach for accelerating large-scale inverse problems,” Inverse Problems, vol. 39, no. 11, Oct. 2023, Art. no. 115009. [Google Scholar]
- [29].Zhu Y., et al. , “Denoising diffusion models for plug-and-play image restoration,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jan. 2023, pp. 1219–1229. [Google Scholar]
- [30].Xie T., et al. , “Joint diffusion: Mutual consistency-driven diffusion model for PET-MRI co-reconstruction,” Phys. Med. Biol., vol. 69, no. 15, Jul. 2024, Art. no. 155019. [DOI] [PubMed] [Google Scholar]
- [31].Wang G. and Qi J., “Penalized likelihood PET image reconstruction using patch-based edge-preserving regularization,” IEEE Trans. Med. Imag., vol. 31, no. 12, pp. 2194–2204, Dec. 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].De Pierro A. R., “On the relation between the ISRA and the EM algorithm for positron emission tomography,” IEEE Trans. Med. Imag., vol. 12, no. 2, pp. 328–333, Jun. 1993. [DOI] [PubMed] [Google Scholar]
- [33].Schramm G. and Thielemans K., “PARALLELPROJ—An open-source framework for fast calculation of projections in tomography,” Frontiers Nucl. Med., vol. 3, Jan. 2024, Art. no. 1324562. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Belzunce M. A. and Reader A. J., “Assessment of the impact of modeling axial compression on PET image reconstruction,” Med. Phys., vol. 44, pp. 5172–5186, Oct. 2017. [DOI] [PubMed] [Google Scholar]
- [35].Muratib F., Mizuno Y., Figueiredo I. C., Howes O., and Marques T. R., “Dissection of neuroinflammation in schizophrenia,” BJPsych Open, vol. 7, no. S1, pp. S274–S275, Jun. 2021. [Google Scholar]
- [36].Reader A. J. and Ellis S., “Bootstrap-Optimised regularised image reconstruction for emission tomography,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 2163–2175, Jun. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Barbano R., et al. , “Steerable conditional diffusion for Out-of-Distribution adaptation in medical image reconstruction,” IEEE Trans. Med. Imag., vol. 44, no. 5, pp. 2093–2104, May 2025. [DOI] [PubMed] [Google Scholar]
- [38].Hu J., Song B., Fessler J. A., and Shen L., “Patch-based diffusion models beat whole-image models for mismatched distribution inverse problems,” 2024, arXiv:2410.11730.