Abstract
Inter-frame patient motion introduces spatial misalignment and degrades parametric imaging in whole-body dynamic positron emission tomography (PET). Most current deep learning approaches to inter-frame motion correction treat the task purely as an image registration problem, ignoring tracer kinetics. We propose an inter-frame Motion Correction framework with Patlak regularization (MCP-Net) that directly optimizes the Patlak fitting error to further improve model performance. MCP-Net contains three modules: a motion estimation module consisting of a multiple-frame 3-D U-Net with a convolutional long short-term memory layer integrated at the bottleneck; an image warping module that performs spatial transformation; and an analytical Patlak module that performs Patlak fitting with the motion-corrected frames and the individual input function. A Patlak penalization term based on the mean squared percentage fitting error is added to the loss function alongside an image similarity measurement and a displacement gradient loss. Following motion correction, the parametric images were generated by standard Patlak analysis. Compared with both traditional and deep learning benchmarks, our network further corrected the residual spatial mismatch in the dynamic frames, improved the spatial alignment of Patlak Ki/Vb images, and reduced the normalized fitting error. By utilizing tracer dynamics and enhancing network performance, MCP-Net has the potential to further improve the quantitative accuracy of dynamic PET. Our code is released at https://github.com/gxq1998/MCP-Net.
Keywords: Inter-frame motion correction, Parametric imaging, Tracer kinetics regularization, Whole-body dynamic PET
1. Introduction
Whole-body dynamic positron emission tomography (PET) using 2-deoxy-2-[18F]fluoro-D-glucose (FDG) has emerged as a more accurate measurement of glycolytic metabolism in clinical and research protocols [7] than static PET, owing to the time dependency of radiotracer uptake [21]. In continuous-bed-motion (CBM) mode [17], an image sequence is typically collected for 90 min starting at tracer injection and then fitted with voxel-wise kinetic modeling for parametric imaging [22]. For FDG, which typically follows an irreversible two-tissue compartment model, the Patlak plot [18] is a simplified linear model for parameter estimation. The Patlak slope Ki [18], the net uptake rate constant, has been shown to improve oncological lesion identification with higher tumor-to-background and contrast-to-noise ratios [8].
However, patient motion is unavoidable during the long scanning period and has a harmful impact on parametric imaging [14]. Inter-frame motion caused by body movement, together with long-term changes in cardiac and respiratory motion patterns, leads to increased parameter estimation errors. In the whole-body scope, subject motion is non-rigid, complicated, and unpredictable, and the simultaneous presence of high-uptake and low-uptake organs makes motion correction substantially harder. The significant cross-frame variation in tracer distribution further complicates motion estimation and correction.
Recent breakthroughs in deep-learning-based image registration have achieved superior performance and computational efficiency over conventional non-rigid registration methods. Spatial-temporal network structures have been applied to motion-related dynamic image sequence registration [10,13,19,24], outperforming single-image-pair networks [1,23]. Although multiple-frame analysis has the potential to improve dynamic PET motion correction, most methods still consider only image registration without utilizing tracer kinetics, which can leave residual misalignment, especially in images with low uptake [11]. While incorporating kinetic modeling into registration has been proposed for dynamic contrast-enhanced magnetic resonance imaging [2,15], dynamic PET typically involves more complex tracer dynamics. Pharmacokinetics has been utilized in PET reconstruction [5] and brain registration [11], but such an approach has been introduced neither in deep learning nor in the whole-body scope.
In this work, we propose an inter-frame Motion Correction framework with Patlak regularization (MCP-Net) for whole-body dynamic PET. An analytical Patlak fitting module is integrated into the framework, with a Patlak penalization term introduced to the loss function. We evaluated the proposed model using a simulated dataset and 9-fold cross-validation on an internal real-patient dataset, and compared it with traditional and deep learning benchmarks.
2. Methods
2.1. Dataset and Pre-processing
Five healthy and 22 cancer subjects (27 in total) were enrolled at the Yale PET Center, with informed consent obtained and Institutional Review Board approval. Each subject underwent a 90-min dynamic multi-frame whole-body PET scan using the CBM protocol on a Siemens Biograph mCT with an FDG bolus injection [16], yielding 19 consecutive reconstructed whole-body frames (4 × 2 min and 15 × 5 min) as well as the individual input function. Details of the image acquisition are given in Supplementary Fig. S1. In the 22 cancer patients, 57 hypermetabolic regions of interest (ROIs) were selected by a nuclear medicine physician for additional evaluation. Dynamic frames were calibrated to standardized uptake value (SUV) units. Due to GPU RAM limitations, the input frames were downsampled by a factor of 4 and zero-padded to a uniform size of 128 × 128 × 256 for all deep learning approaches. The estimated displacements were upsampled back to the original resolution using spline interpolation (order = 3) before warping the original frames.
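To make the resampling step concrete, below is a minimal NumPy/SciPy sketch of the downsample-pad-upsample pipeline described above. The corner placement of the zero-padding, the linear interpolation for intensities, and the voxel-unit rescaling of the upsampled displacements are our assumptions for illustration, not details specified in the paper.

```python
import numpy as np
from scipy.ndimage import zoom

def downsample_and_pad(frame, factor=4, target_shape=(128, 128, 256)):
    """Downsample a dynamic frame and zero-pad it to the network input size."""
    small = zoom(frame, 1.0 / factor, order=1)  # linear interpolation for intensities
    padded = np.zeros(target_shape, dtype=small.dtype)
    idx = tuple(slice(0, min(s, t)) for s, t in zip(small.shape, target_shape))
    padded[idx] = small[idx]  # corner placement is an illustrative choice
    return padded

def upsample_displacement(disp, factor=4):
    """Upsample a (3, x, y, z) displacement field back to the original grid
    with cubic spline interpolation (order = 3), as described above."""
    up = np.stack([zoom(disp[c], factor, order=3) for c in range(3)])
    # assumption: displacements are stored in voxel units, so their
    # magnitudes must be rescaled by the same factor after upsampling
    return up * factor
```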
2.2. Proposed Network
The MCP-Net consists of a spatial-temporal motion estimation network, a spatial transformation module, and an analytical Patlak fitting module (see Fig. 1).
Fig. 1. The overall structure of the proposed motion correction framework MCP-Net.
Spatial-temporal Motion Estimation Network.
The dual-channel input to the motion estimation network is a moving frame sequence, with each frame concatenated with the reference frame. To decrease the driving force of high-uptake voxels while also minimizing saturation in the local normalized cross-correlation (NCC) loss computation, an intensity cutoff layer is first applied with a threshold of SUV = 2.5, and Gaussian noise (σ = 0.01) is added to the thresholded voxels [10]. To handle multiple-frame spatial-temporal analysis, the motion estimation module is a concatenated 3-D U-Net [6] with shared weights and a convolutional long short-term memory layer [20] integrated at the bottleneck [10]. This enables information extraction across both adjacent and non-adjacent frames, which is especially effective for long-duration motion and tracer changes. The number of both encoding and decoding U-Net levels is set to 4, with input sequence length N = 5. Given the voxel-wise displacement field estimates, the spatial transformation layer warps the input frames and outputs motion-compensated frames.
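The intensity cutoff layer can be sketched in Keras as below; the exact treatment of above-threshold voxels (clamping to the cutoff before adding the Gaussian noise) is our assumption based on the description in [10].

```python
import tensorflow as tf

class IntensityCutoff(tf.keras.layers.Layer):
    """Clip voxels above an SUV threshold and add small Gaussian noise to them,
    reducing the dominance of high-uptake organs in the NCC loss."""

    def __init__(self, threshold=2.5, sigma=0.01, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold
        self.sigma = sigma

    def call(self, x):
        mask = tf.cast(x > self.threshold, x.dtype)
        noise = tf.random.normal(tf.shape(x), stddev=self.sigma)
        # above-threshold voxels become cutoff + noise; others pass through
        return x * (1.0 - mask) + (self.threshold + noise) * mask
```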
The Analytical Patlak Fitting Module.
The subsequent analytical Patlak fitting module estimates parametric images using the Patlak plot [18] after a starting time $t^* = 20$ min,
$$\frac{C_T(t)}{C_P(t)} = K_i\,\frac{\int_0^t C_P(\tau)\,d\tau}{C_P(t)} + V_b, \qquad t > t^* \tag{1}$$
where $C_T$ is the tissue radiotracer concentration, $C_P$ is the input function (plasma radiotracer concentration), $K_i$ is the slope (the net uptake rate constant), and $V_b$ is the y-axis intercept. The linear formulation also makes it easy to incorporate image-based regression weights into the fitting process [3]. The Patlak fitting module takes the subject input function and the whole CBM dynamic frame sequence as input; in addition to the frames corrected by the current iteration, the remaining frames are corrected by the previous update. With the Patlak $K_i$ and $V_b$ estimates, tracer kinetics information and parametric regularization are introduced into the framework.
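A minimal NumPy sketch of the analytical Patlak fit in Eq. (1) is given below. For brevity it uses unweighted ordinary least squares and integrates the input function only at the frame mid-times; the actual module additionally supports image-based regression weights, as noted above.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def patlak_fit(ct, cp, t_mid, t_star=20.0):
    """Voxel-wise Patlak fit (Eq. 1) by ordinary linear least squares.

    ct:    (T, V) tissue activity for T frames and V voxels
    cp:    (T,)   plasma input function sampled at the frame mid-times
    t_mid: (T,)   frame mid-times in minutes
    Returns per-voxel Ki and Vb arrays of shape (V,).
    """
    int_cp = cumulative_trapezoid(cp, t_mid, initial=0.0)  # \int_0^t C_P d\tau
    keep = t_mid > t_star
    x = int_cp[keep] / cp[keep]            # Patlak abscissa ("normalized time")
    y = ct[keep] / cp[keep][:, None]       # Patlak ordinate, shape (Tk, V)
    A = np.stack([x, np.ones_like(x)], axis=1)    # design matrix (Tk, 2)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # coef: (2, V)
    ki, vb = coef[0], coef[1]
    return ki, vb
```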
The Patlak Penalization in Loss Function.
The Patlak penalization by mean squared percentage fitting error (MSPE) is incorporated into the loss function in addition to an image similarity measurement by local NCC and a displacement gradient loss using one-sided forward differences [1],
$$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left[-\,\mathrm{NCC}\!\left(F_R, \hat{F}_i\right) + \lambda\,\big\lVert \nabla \varphi_i \big\rVert^2\right] + \frac{\alpha}{T}\sum_{j=1}^{T}\mathrm{MSPE}\!\left(F_j^{P}, \hat{F}_j^{P}\right) \tag{2}$$
where $N$ is the motion estimation input sequence length, $F_R$ is the reference frame, $\hat{F}_i$ is the $i$th warped frame, $\lambda$ is the gradient loss regularization factor, $\varphi_i$ is the estimated displacement field for frame $i$, $T$ is the total number of frames after $t^*$, $\alpha$ is the Patlak penalization factor, $F_j^{P}$ is the $j$th Patlak input dynamic frame, and $\hat{F}_j^{P}$ is the $j$th Patlak fitted dynamic frame; $\mathrm{MSPE}(\cdot,\cdot)$ denotes the voxel-wise mean squared percentage error between its arguments.
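The loss of Eq. (2) can be sketched in TensorFlow as follows. The local NCC computation is assumed to be precomputed per frame (e.g., as in [1]), and the small `eps` guard in the percentage error is our addition for numerical stability.

```python
import tensorflow as tf

def gradient_loss(phi):
    """One-sided forward-difference smoothness penalty on a displacement
    field phi of shape (batch, x, y, z, 3)."""
    dx = phi[:, 1:] - phi[:, :-1]
    dy = phi[:, :, 1:] - phi[:, :, :-1]
    dz = phi[:, :, :, 1:] - phi[:, :, :, :-1]
    return tf.reduce_mean(dx ** 2) + tf.reduce_mean(dy ** 2) + tf.reduce_mean(dz ** 2)

def patlak_mspe(frames, fitted, eps=1e-6):
    """Mean squared percentage fitting error between the Patlak input frames
    and the Patlak-fitted frames (the eps guard is our addition)."""
    return tf.reduce_mean(tf.square((frames - fitted) / (frames + eps)))

def mcp_loss(ncc_per_frame, phis, frames, fitted, lam=0.1, alpha=0.1):
    """Total loss of Eq. (2): image similarity + smoothness + Patlak penalty.

    ncc_per_frame: list of N precomputed local-NCC scalars, one per warped frame
    phis:          list of N displacement field tensors
    """
    sim = -tf.add_n(ncc_per_frame) / len(ncc_per_frame)
    smooth = tf.add_n([gradient_loss(p) for p in phis]) / len(phis)
    return sim + lam * smooth + alpha * patlak_mspe(frames, fitted)
```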
2.3. Training Details and Baseline Comparison
All 5-min frames were motion corrected, with Frame 12 (in the middle of the 5-min scans) as the fixed reference frame. Five successive frames from the same subject are fed into the motion estimation network as the input sequence, while the remaining 10 frames from the last update and the individual input function are sent to the Patlak estimation module. Note that the motion estimation inputs are always the original uncorrected frames, while the remaining frames for Patlak fitting are updated every 50 epochs to provide the latest Patlak error estimate from the currently-trained network. The update cycle was chosen as a trade-off between time consumption and network convergence while also avoiding instability in the early stage of training. A preliminary ablation study of the updating strategy was conducted on a random cross-validation fold and is summarized in Supplementary Table S1.
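A schematic of this alternating update strategy is sketched below; `warp_remaining_frames` and `train_step` are hypothetical placeholders for the authors' actual routines, not a published API.

```python
def train_mcp(model, train_loader, original_remaining_frames,
              warp_remaining_frames, train_step,
              num_epochs=500, update_every=50):
    """Schematic training loop with the 50-epoch Patlak-frame update cycle.
    All callables are placeholders standing in for the actual routines."""
    patlak_frames = original_remaining_frames  # uncorrected at the start
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % update_every == 0:
            # re-warp the remaining frames with the currently-trained network
            # so the Patlak module fits against up-to-date corrections
            patlak_frames = warp_remaining_frames(model, original_remaining_frames)
        for seq, input_fn in train_loader:
            train_step(model, seq, patlak_frames, input_fn)
```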
We included a traditional entropy-based non-rigid registration method implemented in BioImage Suite [9,12] (BIS), a single-pair deep learning registration model, VoxelMorph [1] (VXM), and a multiple-frame analysis model [10] (B-convLSTM) as benchmarks for performance comparison under 9-fold subject-wise cross-validation. The deep learning approaches were developed in Keras (TensorFlow backend) on an NVIDIA Quadro RTX 8000 GPU and trained with the Adam optimizer (learning rate = 10−4). The stopping epoch was determined by the minimal observed validation loss: 750 for VXM and 500 for B-convLSTM and MCP-Net. The regularization factor λ of both VXM and B-convLSTM was fixed at 1 [1,10]. The hyperparameters of MCP-Net were set to λ = 0.1 and α = 0.1. A preliminary sensitivity test of λ and α is reported in Supplementary Tables S2 and S3.
2.4. Evaluation Metrics
To directly evaluate motion estimation, we ran a motion simulation test by applying patient-derived motion fields to selected "motion-free" frames of another subject, since ground-truth motion vectors are not available in real-patient datasets. Specifically, we selected 3 subjects with insignificant motion and treated their frames after motion correction by another well-trained deep learning model, VXM-multiframe [10], as motion-free. To obtain realistic simulated motion, we selected another 3 subjects with visually significant motion and used the motion fields estimated by VXM-multiframe as the ground truth, so that the model used for motion generation is independent of the tested models. We calculated the average absolute prediction error in each frame,
$$e = \frac{1}{V}\sum_{v=1}^{V}\sqrt{\left(x_v - \hat{x}_v\right)^2 + \left(y_v - \hat{y}_v\right)^2 + \left(z_v - \hat{z}_v\right)^2} \tag{3}$$
where $V$ is the total number of voxels per frame, $(x_v, y_v, z_v)$ is the ground-truth motion at voxel $v$, and $(\hat{x}_v, \hat{y}_v, \hat{z}_v)$ is the predicted motion. The voxel-wise motion prediction error maps were also visualized.
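Consistent with our reading of Eq. (3), the frame-wise error and its voxel-wise map can be computed as follows (a sketch assuming displacement fields stored as (3, x, y, z) arrays in mm):

```python
import numpy as np

def motion_prediction_error(gt, pred):
    """Average absolute motion prediction error (Eq. 3) and its voxel-wise map.

    gt, pred: (3, x, y, z) ground-truth and predicted displacement fields in mm.
    """
    err_map = np.sqrt(np.sum((gt - pred) ** 2, axis=0))  # per-voxel error magnitude
    return err_map.mean(), err_map
```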
To assess dynamic frame similarity in the real-patient results, we calculated the structural similarity index between the reference frame and the average of all warped moving frames (Avg-to-ref SSIM) for each subject. The Ki and Vb images were generated by Patlak analysis at the original image resolution. The dynamic frames and Ki/Vb images were overlaid to visualize motion-related mismatch; since both Ki and Vb are motion-sensitive, higher alignment indicates better registration. Whole-body Ki/Vb normalized mutual information (NMI) was computed to measure alignment. To assess Patlak fitting, the normalized weighted mean fitting error (NFE) was formulated as
$$\mathrm{NFE} = \frac{\sum_{k=1}^{T} w_k \left|\hat{C}_T(t_k) - C_T(t_k)\right|}{\sum_{k=1}^{T} w_k\, C_T(t_k)} \tag{4}$$
where $w_k$ is the fitting weight of the time activity curve after decay correction [4], $\hat{C}_T(t_k)$ is the fitted tissue tracer concentration, $C_T(t_k)$ is the acquired tissue tracer concentration, $t_k$ is the middle acquisition time of the $k$th frame, and $T$ is the total number of frames after $t^*$. The NFE maps were visualized voxel-wise, and NFE statistics over the whole body, torso, and ROIs were computed. Paired two-tailed t-tests with a significance level of 0.05 were used for statistical analysis.
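Assuming the NFE takes the weighted-absolute-residual form reconstructed in Eq. (4), a per-voxel implementation would look like:

```python
import numpy as np

def nfe(ct_fit, ct, w):
    """Normalized weighted mean fitting error (Eq. 4) for one voxel's time
    activity curve over the T frames after t*.

    ct_fit, ct, w: (T,) fitted activity, acquired activity, fitting weights.
    """
    return np.sum(w * np.abs(ct_fit - ct)) / np.sum(w * ct)
```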
3. Results
3.1. Motion Simulation Test
In Fig. 2, the voxel-wise motion prediction error map of MCP-Net is generally the darkest with reduced hotspots within the body outline and torso organs, indicating decreased inference error. All deep learning models achieved lower motion prediction errors than BIS. MCP-Net significantly decreased motion prediction error compared with other baselines (p < 0.05), suggesting improved robustness and prediction accuracy. Sample overlaid dynamic frames demonstrating motion simulation and estimation accuracy are shown in Supplementary Fig. S2.
Fig. 2. Sample absolute error maps of predicted motion fields. The subject-wise mean absolute prediction errors (in mm, mean ± standard deviation) are annotated.
3.2. Qualitative Analysis
In Fig. 3, sample overlaid dynamic frame pairs show inter-frame motion and the correction effects for heart, liver, and kidney misalignment. Both single-pair baselines, BIS and VXM, substantially decreased the misalignment, but residual under-correction was still present. While B-convLSTM with multiple-frame analysis further improved motion correction, our proposed MCP-Net achieved the lowest remaining spatial mismatch. Similarly, in Fig. 4, both BIS and VXM reduced the motion-related Ki/Vb misalignment but still left residual mismatch. While B-convLSTM showed improved alignment, MCP-Net achieved the best spatial match at the brain edge, the liver dome and lower edge, the gastrointestinal (GI) tract, and bone. Thus, Patlak regularization improved both frame and Ki/Vb spatial alignment. In Fig. 5, under the same color scale, the voxel-wise NFE maps of MCP-Net were generally the darkest, with the greatest reduction of high-fitting-error bright spots such as the heart and liver regions. Even for the significant motion of the hand and bladder, regions that do not strictly follow the two-tissue compartment model, the proposed MCP-Net was still able to reduce the fitting error.
Fig. 3. Overlaid dynamic frames showing inter-frame motion and the correction effects in the heart (upper), liver (lower, white arrows), and kidneys (lower, cyan arrows).
Fig. 4. Overlaid Patlak Ki and Vb images showing motion correction impacts in the brain (upper), liver (middle), GI tract (lower, white arrows), and bone (lower, cyan arrows).
Fig. 5. Sample voxel-wise parametric NFE maps in the liver (upper, white arrows), heart (upper, cyan arrows), bladder (lower, white arrows), and hand (lower, cyan arrows).
3.3. Quantitative Analysis
Table 1 summarizes the quantitative analysis results. Both multiple-frame models achieved substantial improvements in every metric, and the proposed MCP-Net consistently achieved the lowest whole-body and torso NFEs as well as the highest Avg-to-ref SSIM and whole-body Ki/Vb NMI. Compared with BIS and VXM, both B-convLSTM and MCP-Net significantly reduced the NFEs and increased Ki/Vb NMI (p < 0.05). MCP-Net achieved significantly lower torso NFE (p = 0.027), higher whole-body Ki/Vb NMI (p = 0.019), and improved Avg-to-ref SSIM (p = 5.01e−5) compared with B-convLSTM. Thus, MCP-Net with Patlak regularization can further improve parametric imaging, reducing fitting error and enhancing Ki/Vb alignment by directly utilizing tracer kinetics.
Table 1.
Quantitative analysis of the inter-frame motion correction approaches (mean ± standard deviation) with the best results marked in bold.
| Method | Avg-to-ref SSIM | Whole-body NFE | Torso NFE | Whole-body Ki/Vb NMI |
|---|---|---|---|---|
| Original | 0.9487 ± 0.0160 | 0.3634 ± 0.1044 | 0.7022 ± 0.1079 | 0.8923 ± 0.0376 |
| BIS | 0.9509 ± 0.0149 | 0.3370 ± 0.0943 | 0.6908 ± 0.1088 | 0.9105 ± 0.0362 |
| VXM | 0.9446 ± 0.0165 | 0.3422 ± 0.0977 | 0.6834 ± 0.1039 | 0.9067 ± 0.0446 |
| B-convLSTM | 0.9513 ± 0.0146 | 0.2857 ± 0.0836 | 0.6390 ± 0.1005 | 0.9281 ± 0.0357 |
| MCP-Net | **0.9523 ± 0.0141** | **0.2840 ± 0.0829** | **0.6197 ± 0.1032** | **0.9295 ± 0.0348** |
In Fig. 6, box plots show the distributions of the mean and maximum NFE within each ROI. MCP-Net reduced the mean and maximum errors as consistently as B-convLSTM, but demonstrated stronger reduction of the outliers with extremely high fitting error, suggesting increased robustness.
Fig. 6. Mean (left) and maximum (right) ROI NFEs for each motion correction method.
4. Conclusion
We proposed MCP-Net, a whole-body dynamic PET inter-frame motion correction framework with Patlak penalization. With the integrated analytical Patlak fitting module and Patlak loss regularization, the MCP-Net demonstrated improved motion correction both qualitatively and quantitatively compared with conventional and deep learning benchmarks. This shows the advantage of directly optimizing parameter estimation and utilizing tracer kinetics in multiple-frame motion correction analysis. The proposed MCP-Net has the potential to be applied to other tracers and further developed into a joint end-to-end motion correction and parametric imaging framework. Future directions will include investigating intra-frame motion correction as well as the mismatch problem in attenuation and scatter correction.
Acknowledgements.
This work is supported by National Institutes of Health (NIH) through grant R01 CA224140.
Footnotes
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-16440-8_16.
References
- 1. Balakrishnan G, Zhao A, Sabuncu MR, Guttag J, Dalca AV: VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 38(8), 1788–1800 (2019)
- 2. Bhushan M, Schnabel JA, Risser L, Heinrich MP, Brady JM, Jenkinson M: Motion correction and parameter estimation in DCE-MRI sequences: application to colorectal cancer. In: Fichtinger G, Martel A, Peters T (eds.) MICCAI 2011. LNCS, vol. 6891, pp. 476–483. Springer, Heidelberg (2011). 10.1007/978-3-642-23623-5_60
- 3. Carson RE: Tracer kinetic modeling in PET. In: Bailey DL, Townsend DW, Valk PE, Maisey MN (eds.) Positron Emission Tomography. Springer, London (2005). 10.1007/1-84628-007-9_6
- 4. Chen K, Reiman E, Lawson M, Feng D, Huang SC: Decay correction methods in dynamic PET studies. IEEE Trans. Nucl. Sci. 42(6), 2173–2179 (1995)
- 5. Cheng X: Improving reconstruction of dynamic PET imaging by utilizing temporal coherence and pharmacokinetics. Ph.D. thesis, Technische Universität München (2015)
- 6. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin S, Joskowicz L, Sabuncu MR, Unal G, Wells W (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). 10.1007/978-3-319-46723-8_49
- 7. Dimitrakopoulou-Strauss A, Pan L, Sachpekidis C: Kinetic modeling and parametric imaging with dynamic PET for oncological applications: general considerations, current clinical applications, and future perspectives. Eur. J. Nucl. Med. Mol. Imaging 48(1), 21–39 (2021)
- 8. Fahrni G, Karakatsanis NA, Di Domenicantonio G, Garibotto V, Zaidi H: Does whole-body Patlak 18F-FDG PET imaging improve lesion detectability in clinical oncology? Eur. Radiol. 29(9), 4812–4821 (2019)
- 9. Guo X, et al.: Inter-pass motion correction for whole-body dynamic parametric PET imaging. In: 2021 Society of Nuclear Medicine and Molecular Imaging Annual Meeting (SNMMI 2021), p. 1421 (2021)
- 10. Guo X, Zhou B, Pigg D, Spottiswoode B, Casey ME, Liu C, Dvornek NC: Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network. Med. Image Anal. 80, 102524 (2022). 10.1016/j.media.2022.102524
- 11. Jiao J, Searle GE, Tziortzi AC, Salinas CA, Gunn RN, Schnabel JA: Spatio-temporal pharmacokinetic model based registration of 4D PET neuroimaging data. NeuroImage 84, 225–235 (2014)
- 12. Joshi A, et al.: Unified framework for development, deployment and robust testing of neuroimaging algorithms. Neuroinformatics 9(1), 69–84 (2011)
- 13. Li M, Wang C, Zhang H, Yang G: MV-RAN: multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis. Comput. Biol. Med. 120, 103728 (2020)
- 14. Lu Y, et al.: Data-driven voluntary body motion detection and non-rigid event-by-event correction for static and dynamic PET. Phys. Med. Biol. 64(6), 065002 (2019)
- 15. Mojica M, Ebrahimi M: Motion correction in dynamic contrast-enhanced magnetic resonance images using pharmacokinetic modeling. In: Medical Imaging 2021: Image Processing, vol. 11596, p. 115962S. International Society for Optics and Photonics (2021)
- 16. Naganawa M, et al.: Assessment of population-based input functions for Patlak imaging of whole body dynamic 18F-FDG PET. EJNMMI Phys. 7(1), 1–15 (2020)
- 17. Panin V, Smith A, Hu J, Kehren F, Casey M: Continuous bed motion on clinical scanner: design, data correction, and reconstruction. Phys. Med. Biol. 59(20), 6153 (2014)
- 18. Patlak CS, Blasberg RG, Fenstermacher JD: Graphical evaluation of blood-to-brain transfer constants from multiple-time uptake data. J. Cereb. Blood Flow Metab. 3(1), 1–7 (1983)
- 19. Shi L, et al.: Automatic inter-frame patient motion correction for dynamic cardiac PET using deep learning. IEEE Trans. Med. Imaging 40, 3293–3304 (2021)
- 20. Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. arXiv preprint arXiv:1506.04214 (2015)
- 21. Vaquero JJ, Kinahan P: Positron emission tomography: current challenges and opportunities for technological advances in clinical and preclinical imaging systems. Annu. Rev. Biomed. Eng. 17, 385–414 (2015)
- 22. Wang G, Rahmim A, Gunn RN: PET parametric imaging: past, present, and future. IEEE Trans. Radiat. Plasma Med. Sci. 4(6), 663–675 (2020)
- 23. Zhao S, Lau T, Luo J, Eric I, Chang C, Xu Y: Unsupervised 3D end-to-end medical image registration with Volume Tweening Network. IEEE J. Biomed. Health Inform. 24(5), 1394–1404 (2019)
- 24. Zhou B, Tsai YJ, Chen X, Duncan JS, Liu C: MDPET: a unified motion correction and denoising adversarial network for low-dose gated PET. IEEE Trans. Med. Imaging 40, 3154–3164 (2021)