Physics and Imaging in Radiation Oncology
2026 Feb 15;37:100928. doi: 10.1016/j.phro.2026.100928

Deep learning-based head and neck deformable image registration using spatio-temporal analysis and self attention

Donghoon Lee a, Yu-Chi Hu a, Teeradon TreeChairusame b,c, Jung Hun Oh a, Nancy Lee b, Michalis Aristophanous a, Laura Cerviño a, Pengpeng Zhang a
PMCID: PMC12933849  PMID: 41756517

Graphical abstract


Keywords: Deformable image registration, Head and neck cancer, Deep learning, Adaptive radiotherapy

Highlights

  • Proposed method completed deformable registration in under 30 s.

  • Spatio-temporal analysis yielded a Dice similarity coefficient over 0.80.

  • Patch-based learning handled variable image sizes without resizing.

  • Inverse consistency showed a Pearson correlation of 0.99 between bidirectional fields.

Abstract

Background and purpose

Significant anatomical changes during head and neck cancer (HNC) radiotherapy challenge accurate dose delivery. Deformable image registration (DIR) is essential for adaptive radiotherapy (ART), yet conventional methods are too slow for online clinical use. This study proposed a novel deep learning-based DIR algorithm for longitudinal HNC imaging.

Materials and methods

We used sixty HNC patient datasets, each containing a planning CT (pCT) and six weekly cone-beam CTs (CBCTs). Fifty datasets were used for training with cross-validation, and the remaining ten were reserved for testing. The proposed DIR algorithm is a patch-based model that integrates 3D convolutional neural networks, self-attention, and a convolutional Long Short-Term Memory network to model temporal deformations. The model predicted bidirectional deformation vector fields (DVFs) and was trained with a composite loss function combining image similarity, DVF smoothness, and inverse consistency. Performance was benchmarked against the large deformation diffeomorphic metric mapping (LDDMM) algorithm using the Dice similarity coefficient (DSC), Hausdorff distance, and Jacobian analysis.

Results

The proposed method achieved significantly faster inference, performing bidirectional DIR between the pCT and all six weekly CBCTs in under 3 min (averaging about 30 s per registration), while matching or exceeding LDDMM’s accuracy. DSC remained above 0.8 for all key structures, and the method demonstrated improved DVF consistency with lower mean and 95th percentile Hausdorff distances. Unlike LDDMM, it required no manual parameter tuning, providing consistent results.

Conclusion

The proposed DIR algorithm enabled rapid, accurate, and consistent image registration, supporting real-time ART workflows and retrospective dose accumulation in personalized radiotherapy.

1. Introduction

More than 50% of head and neck cancer (HNC) patients experience significant weight loss during radiotherapy, causing anatomical changes that compromise the original treatment plan's accuracy [1], [2], [3]. To address inaccuracies in dose delivery, adaptive radiotherapy (ART) is used to dynamically update plans according to the patient’s current anatomy [4], [5], [6].

A critical component of ART is deformable image registration (DIR), which aligns images taken at different time points or modalities to account for anatomical changes. DIR plays a pivotal role in mapping the changes caused by weight loss or tumor shrinkage, thereby ensuring an accurate estimation of the dose delivered to the patient [7], [8]. However, traditional DIR methods face significant challenges in clinical applications due to their computational complexity and lengthy processing times [9]. These limitations are especially pronounced in the context of online ART, where speed and reliability are crucial.

In recent years, deep learning-based DIR methods have been introduced and have shown promising results in applications to treatment sites such as the lungs and brain [10], [11], [12]. These methods leverage neural networks to model complex deformations more efficiently and accurately than traditional approaches. Despite these advancements, HNC presents unique challenges. The head and neck anatomy is highly complex, with substantial deformation and inhomogeneity. The air cavities and irregular tissue densities cause noticeable image artifacts. These factors complicate multi-modality DIR tasks, making it difficult to achieve precise registration [13], [14], [15].

The complexity of HNC anatomy, coupled with the necessity for rapid and reliable re-planning inherent in the adaptive process [4], [5], [6], highlights a persistent gap: the lack of a DIR solution that is both highly accurate—especially for complex HNC changes observed on multi-modal images [8], [14], [15] —and computationally efficient enough for routine clinical implementation. While deep learning provided the efficiency necessary for high-speed processing [10], [12], it must be robustly integrated with techniques that can account for the temporal and structural complexities inherent in longitudinal treatment changes [10], [12], [16]. Developing an algorithm that simultaneously addresses the anatomical challenges of the head and neck and the demanding throughput requirements of ART remains a critical area of research to realize the promise of adaptive radiotherapy fully [16].

To address the limitations of current DIR methods regarding both speed and precision in the complex HNC environment, we developed a novel DIR algorithm that integrates longitudinal image analysis with advanced deep learning techniques. The primary aim of this study was to rigorously evaluate the accuracy, stability, and computational efficiency of this newly developed algorithm, comparing its performance against traditional DIR methods to demonstrate its feasibility for high-throughput ART applications.

2. Materials and methods

2.1. Clinical dataset

Our study included 60 HNC patients treated at our institution between 2021 and 2024 under Institutional Review Board–approved protocol #16-700. The requirement for informed consent was waived due to the retrospective nature of the study. The cohort was divided into a training and cross-validation set (n = 50) and an independent testing set (n = 10). Each dataset included a planning computed tomography (pCT) scan along with at least six weekly cone-beam computed tomography (CBCT) scans per patient.

For model development, we employed a 5-fold cross-validation scheme to optimize hyper-parameters such as the learning rate, batch size, and weighting factors in the loss function. The 50 training patients were equally divided into five folds, with 10 patients allocated to each fold for validation. The entire temporal sequence (pCT and all weekly CBCTs) for a given patient remained exclusively within the same fold to prevent data leakage. The hyper-parameters were chosen to ensure that training converged properly and that none of the five cross-validation models overfit. This set of hyper-parameters was then used to train our final model on all 50 training patients. The final model was evaluated on an independent test cohort (n = 10).
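The patient-level fold assignment described above can be sketched as follows. This is an illustrative sketch only: the patient identifiers, random seed, and function name are hypothetical, and the authors' actual implementation was not reported. The key property it enforces is that each patient (and hence their whole temporal sequence) lands in exactly one fold.

```python
import numpy as np

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Assign whole patients to folds so that a patient's entire
    temporal sequence (pCT + weekly CBCTs) stays in a single fold,
    preventing leakage between training and validation."""
    rng = np.random.default_rng(seed)
    ids = np.array(patient_ids)
    rng.shuffle(ids)
    return [list(fold) for fold in np.array_split(ids, n_folds)]

# 50 hypothetical training patients -> five folds of 10 patients each
folds = patient_level_folds([f"pt{i:02d}" for i in range(50)], n_folds=5)
```

Because the split is by patient ID rather than by image pair, no weekly CBCT of a validation patient can appear in the training portion of any fold.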

The pCTs were acquired on a Philips Big Bore scanner (Philips Medical Systems, Cleveland, OH, USA) using 120 kVp, 2.0–3.0 mm slice thickness with intravenous contrast, and patients were immobilized using a 5-point thermoplastic mask. Weekly CBCTs were acquired on a Varian TrueBeam linear accelerator (Varian Medical Systems, Palo Alto, CA, USA) using either full-fan or Spotlight technique (a high-resolution, limited field-of-view scan mode).

To develop the DIR model, we first performed rigid alignment of all weekly CBCT scans to the corresponding pCT to ensure spatial consistency. To optimize the efficiency of deep learning training and reduce computational load, we cropped each image into a fixed volume size of 256 × 256 × 128 voxels. In the head and neck region, artifacts caused by metal dental implants often result in exceptionally high Hounsfield unit (HU) values, complicating model training. To address this issue, we thresholded the HU values within a range of −1000 to 1000. Subsequently, for data normalization, we scaled the voxel intensities to fall within a range of −1 to 1. For evaluation, we utilized manual contours of the primary gross tumor volume (GTVp), parotid glands, and submandibular glands, which were delineated by radiation oncologists. These contours served as the ground truth for assessing the performance of the DIR model.
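The intensity preprocessing above (clipping HU values to [−1000, 1000], then rescaling to [−1, 1]) can be sketched as below. Dividing the clipped values by 1000 is one natural way to achieve the stated range; the paper does not give the exact scaling formula, so treat this as an assumption.

```python
import numpy as np

def preprocess_ct(volume_hu):
    """Clip extreme HU values (e.g., metal dental-implant artifacts)
    to [-1000, 1000], then linearly rescale to [-1, 1].
    The divide-by-1000 rescaling is an assumption for illustration."""
    clipped = np.clip(volume_hu, -1000.0, 1000.0)
    return clipped / 1000.0

# Toy 1D example with artifact-level HU values at both extremes
vol = np.array([-3000.0, -1000.0, 0.0, 500.0, 4000.0])
normalized = preprocess_ct(vol)  # all values now lie in [-1, 1]
```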

In cases of a limited CBCT field-of-view, we cropped both the pCT and its contours to match the visible portion of the parotid gland, ensuring consistent data for training.

2.2. Description of the proposed deep learning model

For this study, we significantly enhanced our Seq2Morph algorithm, previously applied to lung cancer DIR [16]. To address the challenges of HNC radiotherapy, we incorporated a patch-based scheme for variable image sizes, a self-attention mechanism (SAM) for highly deformable regions, and an inverse consistency loss to ensure robust bidirectional transformations (Fig. 1).

Fig. 1.

A schematic representation of the Seq2Morph model, which incorporates 3D convolutional layers, 3D self-attention maps (SAM), and 3D Convolutional Long Short Term Memory (ConvLSTM) structures. The model was designed for deformable image registration (DIR) in both directions, from planning CT to weekly CBCT and vice versa.

Our improved Seq2Morph model uses a sequence-to-sequence architecture with a U-Net-based recurrent unit [17], [18], [19], [20], [21]. This unit integrates a 3D SAM and a Convolutional Long Short Term Memory (ConvLSTM) to learn temporal deformation patterns, with the SAM focusing the model on highly deformable regions [22], [23], [24]. To mitigate data scarcity, we employed a patch-based approach, sampling each volume into 512 overlapping patches (64 × 64 × 64). The model outputs a sequence of weekly DVFs in both directions, which are reconstructed into full volumes by averaging the overlapping patch predictions.
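A minimal sketch of the patch sampling and overlap-averaged reconstruction is given below. The regular stride used here is hypothetical (the paper samples 512 overlapping 64 × 64 × 64 patches per volume without specifying the sampling grid), but the averaging of overlapping predictions follows the description above.

```python
import numpy as np

def extract_patches(vol, patch=64, stride=32):
    """Sample overlapping cubic patches on a regular grid (the grid
    spacing is an assumption) and record each patch origin."""
    patches, origins = [], []
    for z in range(0, vol.shape[0] - patch + 1, stride):
        for y in range(0, vol.shape[1] - patch + 1, stride):
            for x in range(0, vol.shape[2] - patch + 1, stride):
                patches.append(vol[z:z+patch, y:y+patch, x:x+patch])
                origins.append((z, y, x))
    return patches, origins

def reconstruct(patches, origins, shape, patch=64):
    """Rebuild a full volume by averaging overlapping patch predictions."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (z, y, x) in zip(patches, origins):
        acc[z:z+patch, y:y+patch, x:x+patch] += p
        cnt[z:z+patch, y:y+patch, x:x+patch] += 1
    return acc / np.maximum(cnt, 1)
```

Averaging the overlap is a simple way to suppress patch-boundary discontinuities in the stitched DVF; weighted (e.g., Gaussian) blending is a common alternative.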

The loss function used in training includes an image similarity loss (IS), a penalty on the smoothness of the bidirectional DVFs (SM), and an inverse consistency loss (IC). Given a time point $t$, let $\phi_t$ and $\phi_t^{-1}$ be the deformation from the pCT to CBCT$_t$ and the deformation from CBCT$_t$ to the pCT, respectively. The image similarity loss term is defined in Equation (1):

$$IS = \frac{1}{N}\sum_{t=1}^{N}\left[1 - NCC\!\left(\phi_t^{-1}(CBCT_t),\, pCT\right)\right] + \left[1 - NCC\!\left(\phi_t(pCT),\, CBCT_t\right)\right] \tag{1}$$

where $NCC$ denotes normalized cross-correlation and $N$ is the number of weekly time points.

The penalty term for the smoothness of the bidirectional DVFs is defined by the L2-norm of the DVF gradients, as shown in Equation (2):

$$SM = \frac{1}{N}\sum_{t=1}^{N}\sum_{v \in V}\left\|\nabla\phi_t(v)\right\|^2 + \left\|\nabla\phi_t^{-1}(v)\right\|^2 \tag{2}$$

Given an identity field $g$, the inverse consistency loss of the bidirectional DVFs is defined in Equation (3):

$$IC = \frac{1}{N}\,\frac{1}{|V|}\sum_{t=1}^{N}\sum_{v \in V}\left\|\left(g+\phi_t\right)(v) - \left(g-\phi_t^{-1}\right)(v)\right\|^2 \tag{3}$$

Finally, the total loss is the weighted sum of the three terms, as described in Equation (4):

$$Loss = IS + w_1\,SM + w_2\,IC \tag{4}$$

In this study, we set w1 = w2 = 0.5. The weight w1 was selected based on the VoxelMorph framework [12], which proposed an unsupervised deep learning-based approach for deformable image registration. The weight w2 was chosen to balance the regularization terms equally, promoting smoothness in the deformation field while maintaining equilibrium in the overall loss function. Leveraging a high-performance computing cluster equipped with NVIDIA A40 GPUs, training required approximately two days, while inference averaged under 30 s per weekly CBCT registration.
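As an illustrative sketch (not the authors' training code), the three loss terms can be written in NumPy for a single time point. The inverse consistency term uses the displacement-sum form, which vanishes when the backward field is the negative of the forward field; the smoothness term uses squared finite differences as a stand-in for the gradient norm. All function names here are hypothetical.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation between two volumes."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def smoothness(dvf):
    """Squared finite-difference penalty on a DVF of shape (3, Z, Y, X)."""
    return float(sum((np.diff(dvf, axis=ax) ** 2).mean() for ax in (1, 2, 3)))

def inverse_consistency(fwd, bwd):
    """Mean squared displacement-sum mismatch: zero when bwd == -fwd."""
    return float(((fwd + bwd) ** 2).mean())

def total_loss(warped_cbct, pct, warped_pct, cbct, fwd, bwd, w1=0.5, w2=0.5):
    """Composite loss for one time point: similarity + w1*SM + w2*IC."""
    is_term = (1 - ncc(warped_cbct, pct)) + (1 - ncc(warped_pct, cbct))
    sm_term = smoothness(fwd) + smoothness(bwd)
    return is_term + w1 * sm_term + w2 * inverse_consistency(fwd, bwd)
```

In the paper the terms are additionally averaged over the N weekly time points; in a deep learning framework the same expressions would be written with differentiable tensor ops rather than NumPy.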

2.3. Quantitative evaluation

To evaluate the performance of our proposed model, we conducted a comprehensive quantitative analysis. Image similarity between the deformed images and the corresponding target images was evaluated using the structural similarity index (SSIM). SSIM was calculated for each weekly registration of all 10 test patients, and the mean value across all patients and time points was reported. The geometric accuracy of the deformation was assessed using the Dice similarity coefficient (DSC) and Hausdorff distance (both mean and 95th percentile) for key anatomical structures, including the GTVp, parotid glands, and submandibular glands. All metrics were calculated using MATLAB R2024a.
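The geometric metrics above can be sketched as follows. This is a brute-force illustration (the paper's MATLAB implementation may differ); q = 95 gives the 95th percentile Hausdorff distance (HD95) and q = 100 the classic maximum.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hausdorff_percentile(pts_a, pts_b, q=95):
    """Symmetric percentile Hausdorff distance between two (n, 3)
    surface point sets, via the full pairwise distance matrix."""
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), q),
               np.percentile(d.min(axis=0), q))
```

For large contours the O(n·m) distance matrix becomes expensive; KD-tree nearest-neighbor queries are the usual optimization.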

Additionally, we compared our network’s performance against a state-of-the-art DIR algorithm, large deformation diffeomorphic metric mapping (LDDMM), as implemented in the ANTs software package [25], [26]. Statistical comparisons between Seq2Morph and LDDMM were performed using a paired t-test. To further demonstrate the feasibility of our method relative to commercial systems, we performed a pilot evaluation on a single patient dataset with the Eclipse DIR system (Varian Medical Systems, Palo Alto, CA, USA). As this system is not used in our clinical routine, it provided an independent comparison. These results are available in the Supplementary Data.

3. Results

During radiotherapy, anatomical changes occurred across the entire test cohort, characterized by progressive parotid gland shrinkage (mean volume reduction: 24.4% ± 9.2% over the 6-week course), as well as frequently observed GTVp shrinkage and airway expansion. Despite these variations, the proposed DIR method maintained DSC values above 0.8 for all evaluated organs across weekly registrations and in both deformation directions (Fig. 2).

Fig. 2.

Dice similarity coefficient (DSC) results for the Primary Gross Tumor Volume (GTVp), parotid gland, and submandibular gland were derived using Seq2Morph. The first row showed Deformable image registration (DIR) results from planning CT to weekly CBCT, while the second row showed DIR results from weekly CBCT to planning CT. Each box plot illustrated the distribution of DSC values. The central horizontal line indicated the median (50th percentile), and the 'x' marker represented the mean. The bottom and top edges of the box corresponded to the 25th (Q1) and 75th (Q3) percentiles, respectively.

Aggregated DSC values for Seq2Morph were 0.83 ± 0.04 for GTVp, 0.85 ± 0.04 for the parotid glands, and 0.82 ± 0.04 for the submandibular glands. Corresponding values for LDDMM were 0.81 ± 0.06, 0.81 ± 0.10, and 0.82 ± 0.11, respectively. While differences in mean DSC were not statistically significant (p > 0.1 across structures), Seq2Morph exhibited lower standard deviations, indicating higher consistency across patients and time points.

Longitudinal performance metrics showed that over six weeks, the DSC for Seq2Morph decreased by approximately 5%, while LDDMM showed a reduction of 10–15%. Image similarity analysis quantified this alignment, with the proposed method achieving a mean SSIM of 0.91 ± 0.03, compared to 0.87 ± 0.04 for rigidly aligned pCT–CBCT pairs (p < 0.001).

Distance-based metrics remained consistent with these findings. The mean Hausdorff distance across all weekly registrations was under 2 mm (Table 1), and the 95th percentile Hausdorff distance ranged from 4–5 mm for GTVp and 3–5 mm for the parotid and submandibular glands. Seq2Morph recorded lower mean and 95th percentile Hausdorff distances than LDDMM across all evaluated structures, with detailed LDDMM comparisons provided in Supplementary Table S1.

Table 1.

The mean Hausdorff distance (HD) and the 95th percentile Hausdorff distance (HD95) for bidirectional deformable image registration (DIR) results derived using Seq2Morph. GTVp: Primary Gross tumor volume. PG: Parotid gland. SM: Submandibular gland.

Mean HD (mm)
Week1 Week2 Week3 Week4 Week5 Week6
pCT → CBCT
GTVp 1.52 ± 0.45 1.56 ± 0.52 1.52 ± 0.34 1.59 ± 0.57 1.84 ± 0.44 2.21 ± 0.33
PG 1.15 ± 0.31 1.21 ± 0.26 1.25 ± 0.38 1.43 ± 0.38 1.41 ± 0.46 1.58 ± 0.36
SM 1.06 ± 0.16 1.24 ± 0.24 1.39 ± 0.42 1.39 ± 0.39 1.54 ± 0.34 1.78 ± 0.34
CBCT → pCT
GTVp 1.21 ± 0.35 1.38 ± 0.33 1.63 ± 0.42 1.61 ± 0.23 1.57 ± 0.44 1.88 ± 0.36
PG 1.07 ± 0.31 1.12 ± 0.21 1.19 ± 0.21 1.38 ± 0.26 1.53 ± 0.23 1.59 ± 0.22
SM 1.17 ± 0.39 1.28 ± 0.29 1.39 ± 0.39 1.54 ± 0.42 1.57 ± 0.40 1.59 ± 0.44
HD95 (mm)
Week1 Week2 Week3 Week4 Week5 Week6
pCT → CBCT
GTVp 4.58 ± 2.29 4.78 ± 2.39 5.41 ± 2.29 5.33 ± 2.73 6.30 ± 2.78 6.51 ± 2.10
PG 3.33 ± 1.03 3.49 ± 0.89 3.40 ± 0.81 4.14 ± 1.27 4.45 ± 1.02 4.39 ± 1.02
SM 3.00 ± 0.71 3.48 ± 0.93 3.36 ± 0.74 3.46 ± 0.90 4.22 ± 0.83 5.06 ± 1.46
CBCT → pCT
GTVp 4.24 ± 1.89 4.17 ± 1.80 4.79 ± 1.85 4.59 ± 1.69 4.79 ± 1.73 5.73 ± 2.03
PG 3.46 ± 0.89 3.63 ± 0.66 3.62 ± 1.12 4.14 ± 1.12 4.45 ± 1.01 4.39 ± 1.02
SM 3.12 ± 0.95 3.11 ± 0.58 3.41 ± 0.73 4.41 ± 1.33 4.50 ± 1.09 5.26 ± 1.16

Deformation field analysis yielded a bidirectional DVF correlation exceeding 0.99 across all patients and weekly time points. Jacobian determinant analysis showed expansion and shrinkage patterns within target structures that corresponded to the observed anatomical changes (Fig. 3).
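The Jacobian determinant analysis can be sketched as below, assuming the displacement-field convention x ↦ x + u(x): determinant values above 1 indicate local expansion, below 1 shrinkage, and values ≤ 0 folding (local singularities). This is an illustrative implementation, not the authors' analysis code.

```python
import numpy as np

def jacobian_determinant(dvf, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise Jacobian determinant of the mapping x + u(x) for a
    displacement field dvf of shape (3, Z, Y, X)."""
    # grads[c, ax] = d(u_c)/d(x_ax), shape (3, 3, Z, Y, X)
    grads = np.stack([np.stack(np.gradient(dvf[c], *spacing), axis=0)
                      for c in range(3)], axis=0)
    # Jacobian of x + u(x): add the identity to the displacement gradient
    J = grads + np.eye(3)[:, :, None, None, None]
    J = np.moveaxis(J, (0, 1), (-2, -1))  # -> (Z, Y, X, 3, 3)
    return np.linalg.det(J)               # det of each 3x3 voxel matrix
```

A zero displacement field yields a determinant of 1 everywhere, and a uniform 10 % dilation per axis yields 1.1³ ≈ 1.331, matching the expected volume-change interpretation.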

Fig. 3.

Jacobian determinant maps of bidirectional Deformation Vector Fields (DVFs) overlaid on imaging data from two representative patients. The Jacobian map derived from the DVF (planning CT → Week 6 CBCT) was overlaid on the planning CT, while the map derived from the DVF (Week 6 CBCT → planning CT) was overlaid on the Week 6 CBCT. GTVp: Primary Gross tumor volume.

Regarding computational efficiency, the proposed method completed six weekly bidirectional DIR tasks in under 3 min (averaging approximately 30 s per registration), representing a tenfold reduction in processing time compared to the 30 min required by LDDMM.

For two separate cases, high-deformation scenarios were observed at week 6 relative to the baseline planning CT, with volume reductions in the parotid glands (31% and 28%), submandibular glands (15% and 17%), and GTVp (18% and 27%). These changes are documented through longitudinal imaging and DVFs (Fig. 4, Fig. 5). Analysis of these deformation fields confirmed the absence of local singularities or folding artifacts, even in regions of pronounced anatomical changes. Under these conditions, structure alignment remained consistent, with DSC values exceeding 0.8 for all evaluated organs in both registration directions.

Fig. 4.

Proposed bidirectional deformable image registration (DIR) results for two representative cases. (a)–(c) Weekly CBCT scans with manual contours (color-filled) and planning contours (white solid lines). (d)–(f) Forward DIR results (weekly CBCT to planning CT): planning contours (white solid lines) were compared with deformed CBCT contours (color-filled). (g)–(i) Backward DIR results (planning CT to weekly CBCT): manual CBCT contours (white solid lines) were compared with deformed planning contours (color-filled). Note: Only results from Week 3 to Week 6 are displayed, as significant anatomical deformations become more pronounced in the later stages of treatment. GTVp: Primary Gross tumor volume. Lt: Left. Rt: Right.

Fig. 5.

Two examples of bidirectional displacement vector fields (DVFs) between planning CT and Week 6 CBCT. (a)–(c) Planning CT scans were overlaid with red arrows, which represented the forward DVFs from the planning CT to the Week 6 CBCT. (d)–(f) Week 6 CBCT scans were overlaid with green arrows, which represented the backward DVFs from the Week 6 CBCT to the planning CT. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4. Discussion

This study introduced Seq2Morph, a deep learning-based DIR algorithm designed for ART in HNC. Our key finding was that by leveraging spatiotemporal information from sequential scans, Seq2Morph provided robust DIR performance that was comparable or superior to conventional methods. Crucially, it achieved this with a significant reduction in computation time, addressing a critical need for efficiency in clinical ART workflows.

We improved the applicability and speed of the registration algorithm by adopting a patch-based approach. In radiotherapy, the volume size of pCT and CBCT varies depending on target size and location. Applying a full-volume deep learning algorithm requires cropping and resampling to match a fixed matrix size, which can lead to a loss of field of view and resolution. Additionally, processing full-size volumetric imaging data can lead to memory constraints, limiting the ability to develop deep and wide neural networks. Since we utilized entire temporal imaging sequences to incorporate spatiotemporal features for DIR improvement, the patch-based approach provided significant advantages. Furthermore, this method allowed us to address the issue of limited datasets. By generating 25,600 paired patches per pCT-CBCT pair, we effectively created a large dataset suitable for deep learning studies.

As a critical performance enhancer over the previous version of Seq2Morph, we incorporated a 3D SAM immediately before the convolutional recurrent network [23], [24]. This enabled the model to focus on highly deformable regions and analyze their spatiotemporal patterns using the subsequent ConvLSTM blocks. Self-attention mechanisms have been shown to improve deep learning models by highlighting critical regions during training [27], [28], and we found that they provided a substantial benefit to our model. Additionally, sequential imaging data inherently capture structural changes throughout radiotherapy. We leveraged this property to build a deep learning network optimized for DIR, addressing the limitation of conventional algorithms that typically register only two image sets at a time.

Regarding the benchmark selection, we employed LDDMM as the primary reference for comparison. While commercially available DIR packages, such as MIM Maestro (MIM Software Inc., Cleveland, OH, USA) and Eclipse (Varian Medical Systems, Palo Alto, CA, USA), are widely utilized in clinical settings, we intentionally excluded MIM from our primary quantitative evaluation. At our institution, physicians frequently use structures propagated via MIM DIR as references for manual contouring on CBCT. Consequently, including MIM in the benchmark could introduce a significant incorporation bias. Furthermore, LDDMM offers a rigorous mathematical framework for large deformations, justifying its role as a robust benchmark for evaluating our deep learning model [25].

We incorporated an inverse consistency loss into our training objective, a deliberate design choice given its importance for adaptive radiotherapy [11], [22]. In applications such as dose accumulation, the geometric alignment between forward and inverse DVFs is paramount [13], [29]. Discrepancies can reveal systematic registration errors and lead to dose warping inaccuracies. Our results confirmed the success of this strategy. The DVF and Jacobian determinant maps showed highly correlated patterns in both directions, validating that explicitly optimizing for inverse consistency yielded robust and geometrically plausible bidirectional transformations suitable for clinical use [11], [16], [22].

Evaluation results demonstrated that the proposed Seq2Morph model was comparable to or outperformed LDDMM in DIR performance. Notably, Seq2Morph generated six weekly DVFs in both directions—including propagated structures—within 3 min, averaging 30 s per registration. In contrast, LDDMM required 5 min per DIR task, meaning that processing a full sequential imaging dataset would take approximately 30 min. Given the demands of a busy clinical environment, our method significantly improved workflow efficiency by reducing computation time while maintaining robust DIR performance. Furthermore, fixed parameters (e.g., gradient step 0.05, bin size 32) are generally used for LDDMM across the entire patient cohort. However, optimal parameter selection is crucial for achieving high-quality DIR, often requiring extensive parameter tuning via grid search or other optimization methods. Head and neck patients exhibit varying deformation characteristics—some show minimal anatomical changes, while others experience significant volume shrinkage and deformation [5], [6]. LDDMM parameters must be adjusted accordingly, but in clinical practice this is unrealistic due to the iterative effort required to evaluate different parameter combinations [10], [12]. In contrast, deep learning models inherently learn from diverse deformation patterns during training, allowing automatic adaptation to patient-specific variations [10], [12]. This explains why Seq2Morph exhibited more consistent DIR performance with lower variability in quantitative results.

Our clinical workflow for ART in HNC patients critically depends on monitoring anatomical changes in OARs, such as the parotid and submandibular glands. A weekly volume change greater than 5% triggers a qualitative review by our physics team to assess the need for an offline ART plan. The feasibility of this process, along with future plans for dosimetric analysis via dose accumulation, hinges on the availability of fast and reliable DIR. This presented a significant clinical need for a tool like Seq2Morph.

To benchmark its clinical utility, we compared our algorithm against a widely used commercial DIR package. In this comparison, Seq2Morph demonstrated improved performance, showing higher similarity in the deformed images (Fig. S1) and achieving a DSC at least 10% higher for contour propagation (Fig. S2 and Table S2). Furthermore, Seq2Morph offered a dramatic improvement in computational efficiency: it completed bidirectional DIR for all six weekly time points in under 3 min (averaging 30 s per registration), whereas the commercial software required approximately 3 min for a single registration task.

This study has several limitations. We relied on image similarity and contour accuracy as evaluation proxies due to the absence of a ground-truth DVF. Our quantitative analysis was limited to the parotid and submandibular glands, as our clinical program monitors these OARs to identify candidates for ART [29] and manual contours were available for them, though we qualitatively confirmed accuracy on smaller structures like the epiglottis (Fig. S3). Lastly, the model was trained on an internal dataset. To partially address this, we successfully tested its robustness on CBCT images from a different device (Fig. S4), but a comprehensive multi-institutional validation is still required to confirm its generalization performance.

In conclusion, Seq2Morph was a fast and robust deep learning DIR algorithm for head and neck cancer. By combining a patch-based strategy, 3D SAM, and a sequential learning framework, the model accurately captured spatiotemporal deformation patterns with high computational efficiency. Its performance can enhance adaptive radiotherapy by providing rapid, precise image registration and enabling retrospective analysis of dose accumulation. Future work will focus on extending this model to predict anatomical changes, further enhancing its utility for personalized treatment adaptation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded in part through the NIH/NCI Cancer Center Support Grant P30CA008748.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.phro.2026.100928.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Supplementary Data 1
mmc1.pdf (423.8KB, pdf)

References

  • 1.Langius J.A.E., Zandbergen M.C., Eerenstein S.E.J., et al. Effect of nutritional interventions on nutritional status, quality of life and mortality in patients with head and neck cancer receiving (chemo)radiotherapy: a systematic review. Clin Nutr. 2013;32:671–678. doi: 10.1016/j.clnu.2013.06.012. [DOI] [PubMed] [Google Scholar]
  • 2.Lønbro S., Dalgas U., Primdahl H., et al. Feasibility and efficacy of progressive resistance training and dietary supplements in radiotherapy treated head and neck cancer patients—the DAHANCA 25A study. Acta Oncol. 2013;52:310–318. doi: 10.3109/0284186X.2012.741325. [DOI] [PubMed] [Google Scholar]
  • 3.Orell H., Schwab U., Saarilahti K., et al. Nutritional counseling for head and neck cancer patients undergoing (chemo)radiotherapy: a prospective randomized trial. Front Nutr. 2019;6:22. doi: 10.3389/fnut.2019.00022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Hansen E.K., Bucci M.K., Quivey J.M., et al. Repeat CT imaging and replanning during the course of IMRT for head-and-neck cancer. Int J Radiat Oncol Biol Phys. 2006;64:355–362. doi: 10.1016/j.ijrobp.2005.07.957. [DOI] [PubMed] [Google Scholar]
  • 5.Bhide S.A., Davies M., Burke K., et al. Weekly volume and dosimetric changes during chemoradiotherapy with intensity modulated radiation therapy for head and neck cancer: a prospective observational study. Int J Radiat Oncol Biol Phys. 2010;76:1360–1368. doi: 10.1016/j.ijrobp.2009.04.005. [DOI] [PubMed] [Google Scholar]
  • 6.Noble D.J., Yeap P.L., Seah S.Y.K., et al. Anatomical change during radiotherapy for head and neck cancer, and its effect on delivered dose to the spinal cord. Radiother Oncol. 2019;130:32–38. doi: 10.1016/j.radonc.2018.07.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Li X., Zhang Y., Shi Y., et al. Evaluation of deformable image registration for contour propagation between CT and cone-beam CT images in adaptive head and neck radiotherapy. Technol Health Care. 2016;24(Suppl 1):S747–S755. doi: 10.3233/THC-161204. [DOI] [PubMed] [Google Scholar]
  • 8.Castadot P., Lee J.A., Parraga A., et al. Comparison of 12 deformable registration strategies in adaptive radiation therapy for the treatment of head and neck tumors. Radiother Oncol. 2008;89:1–12. doi: 10.1016/j.radonc.2008.04.010. [DOI] [PubMed] [Google Scholar]
  • 9.Sotiras A., Davatzikos C., Paragios N. Deformable medical image registration: a survey. IEEE Trans Med Imaging. 2013;32:1153–1190. doi: 10.1109/TMI.2013.2265603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Fu Y., Lei Y., Wang T., et al. Deep learning in medical image registration: a review. Phys Med Biol. 2020;65 doi: 10.1088/1361-6560/ab843e. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Ding Y., Feng H., Yang Y., et al. Deep-learning-based fast and accurate 3D CT deformable image registration in lung cancer. Med Phys. 2023;50:6864–6880. doi: 10.1002/mp.16548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Balakrishnan G., Zhao A., Sabuncu M.R., et al. VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans Med Imaging. 2019;38:1788–1800. doi: 10.1109/TMI.2019.2897538. [DOI] [PubMed] [Google Scholar]
  • 13.Cagni E., Botti A., Orlandi M., et al. Evaluating the quality of patient-specific deformable image registration in adaptive radiotherapy using a digitally enhanced head and neck phantom. Appl Sci. 2022;12:9493. doi: 10.3390/app12199493. [DOI] [Google Scholar]
  • 14.Hou J., Guerrero M., Chen W., D’Souza W. Deformable planning CT to cone-beam CT image registration in head-and-neck cancer. Med Phys. 2011;38:2088–2094. doi: 10.1118/1.3554647. [DOI] [PubMed] [Google Scholar]
  • 15.Singhrao K., Kirby N., Pouliot J. A three-dimensional head-and-neck phantom for validation of multimodality deformable image registration for adaptive radiotherapy. Med Phys. 2014;41 doi: 10.1118/1.4901523. [DOI] [PubMed] [Google Scholar]
  • 16.Lee D., Alam S., Jiang J., et al. Seq2Morph: a deep learning deformable image registration algorithm for longitudinal imaging studies and adaptive radiotherapy. Med Phys. 2023;50:970–979. doi: 10.1002/mp.16026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Ronneberger O., Fischer P., Brox T. U-Net: convolutional networks for biomedical image segmentation. Lect Notes Comput Sci. 2015;9351:234–241. doi: 10.1007/978-3-319-24574-4_28. [DOI] [Google Scholar]
  • 18.Lee D., Alam S., Jiang J., et al. Deformation-driven Seq2Seq longitudinal tumor and organs-at-risk prediction for radiotherapy. Med Phys. 2021;48:4784–4798. doi: 10.1002/mp.15075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Lee D., Hu Y., Kuo L., et al. Deep learning-driven predictive treatment planning for adaptive radiotherapy of lung cancer. Radiother Oncol. 2022;169:57–63. doi: 10.1016/j.radonc.2022.02.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Shi X., Chen Z., Wang H., et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Adv Neural Inf Process Syst. 2015;28 [Google Scholar]
  • 21.Shi X., Gao Z., Lausen L., et al. Deep learning for precipitation nowcasting: a benchmark and a new model. Adv Neural Inf Process Syst. 2017;30 [Google Scholar]
  • 22.Li Y., Tang H., Wang W., et al. Dual attention network for unsupervised medical image registration based on VoxelMorph. Sci Rep. 2022;12:16250. doi: 10.1038/s41598-022-20589-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Wang Y, Qian W, Li M, et al. A transformer-based network for deformable medical image registration. arXiv preprint. 2022;arXiv:2202.12104. https://doi.org/10.48550/arXiv.2202.12104.
  • 24.Liu Y, Chen J, Zuo L, et al. Vector field attention for deformable image registration. arXiv preprint. 2024;arXiv:2407.10209. https://doi.org/10.48550/arXiv.2407.10209. [DOI] [PMC free article] [PubMed]
  • 25.Beg M.F., Miller M.I., Trouvé A., et al. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int J Comput Vis. 2005;61:139–157. doi: 10.1023/B:VISI.0000043755.93987.aa. [DOI] [Google Scholar]
  • 26.Avants B.B., Tustison N.J., Song G., et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage. 2011;54:2033–2044. doi: 10.1016/j.neuroimage.2010.09.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Oktay O, Schlemper J, Folgoc LL et al. Attention U-Net: Learning Where to Look for the Pancreas. 2018; arXiv:1804.03999. https://doi.org/10.48550/arXiv.1804.03999.
  • 28.Chen J., Frey E.C., He Y., et al. TransMorph: Transformer for unsupervised medical image registration. Med Image Anal. 2022;82:102615. doi: 10.1016/j.media.2022.102615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Aristophanous M., et al. Clinical experience with an offline adaptive radiation therapy head and neck program: dosimetric benefits and opportunities for patient selection. Int J Radiat Oncol Biol Phys. 2024;119:1557–1568. doi: 10.1016/j.ijrobp.2024.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]

