Author manuscript; available in PMC: 2023 Jun 1.
Published in final edited form as: Int J Radiat Oncol Biol Phys. 2022 Feb 4;113(2):426–436. doi: 10.1016/j.ijrobp.2022.01.050

Comprehensive Quantitative Evaluation of Variability in MR-guided Delineation of Oropharyngeal Gross Tumor Volumes and High-risk Clinical Target Volumes: An R-IDEAL Stage 0 Prospective Study

Carlos E Cardenas 1,*,, Sanne E Blinde 2,*, Abdallah S R Mohamed 3,*,, Sweet Ping Ng 3,4,*,, Cornelis Raaijmakers 5,*, Marielle Philippens 5,*, Alexis Kotte 5,*, Abrahim A Al-Mamgani 6,*, Irene Karam 7,*, David J Thomson 8, Jared Robbins 9,*, Kate Newbold 10,*, Clifton D Fuller 3,*,†,, Chris Terhaard 5,*,†,; Collaborators/Investigators:, Houda Bahig 11,*,, Pierre Blanchard 12,, Homan Dehnad 5,*, Patricia Doornaert 5,*, Hesham Elhalawani 3,, Steven J Frank 3,, Adam Garden 3,, G Brandon Gunn 3,, Olga Hamming-Vrieze 6,*, Mona Kamal 3,, Nicolien Kasperts 5,*, Lip Wai Lee 8,*, Brigid A McDonald 1,*, Andrew McPartlin 8,*, Mohamed AM Meheissen 13,, William H Morrison 3,, Arash Navran 6,*, Christopher M Nutting 10,*, Frank Pameijer 14,*, Jack Phan 3,, Ian Poon 7,*, David I Rosenthal 3,, Ernst J Smid 5,*, Andrew J Sykes 8,*
PMCID: PMC9119288  NIHMSID: NIHMS1801452  PMID: 35124134

Abstract

Purpose:

Tumor and target volume manual delineation remains a challenging task in head-and-neck cancer radiotherapy. The purpose of this study was to conduct a multi-institutional evaluation of manual delineations of gross tumor volume (GTV), high-risk clinical target volume (CTV), parotids, and submandibular glands on treatment simulation MR scans of oropharyngeal cancer (OPC) patients.

Methods:

Pre-treatment T1-weighted (T1w), T1-weighted with gadolinium contrast (T1w+C) and T2-weighted (T2w) MRI scans were retrospectively collected for 4 OPC patients under an IRB-approved protocol. The scans were provided to twenty-six radiation oncologists from seven international cancer centers who participated in this delineation study. In addition, patients’ clinical history and physical examination findings, along with a medical photographic image and radiological results, were provided. The contours were compared with overlap and distance metrics, using both a STAPLE consensus comparison and pair-wise comparisons. Lastly, participants completed a brief questionnaire on their experience and institutional CTV delineation practices.

Results:

Large variability was measured between observers’ delineations for GTVs and CTVs. The mean Dice Similarity Coefficient values across all physicians’ delineations for GTVp, GTVn, CTVp, and CTVn were 0.77, 0.67, 0.77, and 0.69, respectively, for STAPLE comparison and 0.67, 0.60, 0.67, and 0.58, respectively, for pair-wise analysis. Normal tissue contours were defined more consistently when considering overlap/distance metrics. The median radiation oncology clinical experience was 7 years. The median experience delineating on MRI was 3.5 years. The GTV-to-CTV margin used was 10 mm for six of seven participant institutions. One institution used 8 mm and three participants (from three different institutions) used a margin of 5 mm.

Conclusion:

The data from this study suggest that appropriate guidelines, contouring quality assurance sessions, and training are still needed for the adoption of MR-based treatment planning for head-and-neck cancers. Such efforts should play a critical role in reducing delineation variation and ensuring standardization of target design across clinical practices.

Introduction

The widespread adoption of highly conformal techniques such as intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT) for head and neck cancer treatment resulted in improved sparing of organs at risk and reduced the toxicity burdens typically associated with radiation therapy. While the clinical benefits of these techniques are well documented 1-3, the use of highly conformal plans has brought about new challenges to the clinic 4. With the use of high precision treatments there has been a greater focus on accurate target delineation, patient set-up, and treatment delivery since small errors while performing these tasks may result in significant under-dosage of at-risk regions and/or unnecessary irradiation of surrounding organs at risk (OARs).

The delineation of tumor and target volumes has been greatly improved by the adoption of multi-modality imaging in radiation oncology. It is common clinical practice to use a contrast-enhanced computed tomography (CE-CT) scan, with or without a fluorodeoxyglucose positron emission tomography (FDG-PET) scan, for head and neck cancers, as these scans greatly improve the ability to see macroscopic tumor involvement over non-contrast CT alone. More recently, magnetic resonance imaging (MRI) has become more widely used in radiotherapy planning because of its higher soft tissue contrast compared with CT, which often allows for better distinction between healthy tissues and appreciable disease. Furthermore, with the advent of the MR-Linac 5-7 and MR-guided radiation therapy (MRgRT), there is a trend toward MR-based radiation treatment planning 8,9, increasing the likelihood of future MR-based tumor and target volume delineation.

Inter- and intra-observer variability when delineating gross tumor volumes (GTV) and clinical target volumes (CTV) have been widely studied for many treatment sites with many reports suggesting large heterogeneity amongst practitioners. This large variability in target delineation is considered a major source of uncertainty 10,11 and reduces our ability to systematically assess the quality of the radiation therapy plans. The inter-observer variability for delineation of the CTV for oropharyngeal cancer is one of the largest reported in the literature 11. When delineating tumors alone, Thiagarajan et al 12 investigated the contributions of MRI and FDG-PET on forty-one head and neck cancer patients and found improved agreement when using multi-modality information over single modality alone; in addition, they found that the lack of physical examination (PE) findings resulted in an underestimation of mucosal disease when cases were presented without knowledge of PE findings. Focusing on oropharyngeal cancers, Bird et al 13 found that inter-observer delineation variability was higher when using CT-alone than both MR-alone and CT+MR. Similar results have been reported by Rasch et al 14 for nasopharynx tumors. Hong et al 15 conducted a study to assess this variability on an oropharyngeal cancer patient and noticed significant variability in clinical target volume delineation and clinical practices. While several head and neck target delineation guidelines have been published in recent years 16-21, these guidelines focus on CT-based radiotherapy and may not be suitable for MR-based radiation treatment planning 22. Furthermore, MR-based CTV delineation variability is currently unknown for oropharyngeal cancers.

The MR-Linac Consortium is a multi-site cooperative group 23 committed to prospective technology development in a programmatic format, using a paradigm based on the surgical technology IDEAL (Idea, Development, Exploration, Assessment, Long-term study) conceptual framework, termed R-IDEAL 24 (Radiotherapy-Idea, Development, Exploration, Assessment, Long-term study). As part of this effort, and in preparation for the now-open Phase II adaptive MR-guided radiotherapy trial for oropharyngeal cancer 25, the MR-Linac Consortium Head and Neck Tumor Site Group sought to undertake a prospective technical benchmarking evaluation (R-IDEAL Stage 0) of human segmentation performance, as part of a coherent quality assurance program 26 for multisite MR-Linac trials 25.

Consequently, the aim of this prospective, blinded R-IDEAL Stage 0 technology implementation study was to 1) quantify observer-dependent manual segmentation variability for GTVs and high-risk CTVs as a necessary prerequisite for adaptive trials that modify tumor volumes on the MR-Linac, and 2) index organs-at-risk (parotid and submandibular glands (SMG)) for oropharyngeal cancer patients using MRI inputs.

Methods and Materials

Patients and Image Acquisition

Four patients with oropharyngeal cancer were retrospectively selected by two experienced head and neck radiation oncologists after receiving institutional review board approval. Patients with both early and locally advanced stage disease were selected. Patient and tumor characteristics are shown in Table 1.

Table 1.

Patient and tumor characteristics (AJCC 8th Edition)

CASE 1 CASE 2 CASE 3 CASE 4
AGE (YEARS) 51 71 58 57
SEX F F M M
PRIMARY SITE Left tonsil Base of tongue right-sided Posterior wall Base of tongue bilateral
T-STAGE 1 2 3 3
N-STAGE 0 0 2b 1
P16 + + +
SMOKING STATUS Non-smoker Non-smoker Smoker Smoker

Pre-treatment T1-weighted (T1w), T1-weighted with gadolinium contrast (T1w+C) and T2-weighted (T2w) MRI scans were available for all patients. These scans were acquired on an Ingenia 3T MRI scanner (Philips, Eindhoven, The Netherlands) for treatment planning purposes, with each patient in the treatment planning position and using a 5-point thermoplastic mask and an individualized head support. In the superior–inferior direction, the scans extended from the caudal edge of the nasopharynx cranially to the hypopharynx caudally. Details on image acquisition are presented in Supplementary Table S1. The T1w, T1w+C, and T2w scans were acquired as a series and visually inspected to ensure accurate co-registration.

Delineation Study

Twenty-six radiation oncologists and one dedicated head and neck radiologist from seven international centers (UMC Utrecht (The Netherlands), University of Texas MD Anderson Cancer Center (Houston, Texas, USA), NKI Antoni van Leeuwenhoek (Amsterdam, The Netherlands), Sunnybrook Health Sciences Centre (Toronto, Ontario, Canada), Froedtert & Medical College of Wisconsin Cancer Center (Milwaukee, Wisconsin, USA), The Royal Marsden NHS Foundation Trust (London, UK), The Christie NHS Foundation Trust (Manchester, UK)) were asked to delineate the parotids, the submandibular glands, the GTV, and the high-risk CTV. When nodal disease was present, participants were asked to delineate GTVp/CTVp (primary) and GTVn/CTVn (nodal) as separate structures to investigate delineation differences between primary and nodal disease regions.

The available pre-treatment MRI scans (T1w, T1w+C, and T2w) were provided with each patient’s clinical history and physical examination findings along with a medical photographic image and radiological results. Participants were asked to delineate the requested structures based on their own institutional guidelines. In addition, all participants received a basic questionnaire to determine years of experience in radiotherapy, years of experience with delineating on MRI, delineation software, and institutional GTV-to-CTV margin expansion values used.

Quantitative Analysis

The contours delineated in this study were compared to quantify variability. The Dice similarity coefficient 27 (DSC), the mean surface distance (MSD), and the 95th percentile Hausdorff distance (95HD) were calculated. These metrics are defined as follows,

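$$\mathrm{DSC} = \frac{2\,|A \cap B|}{|A| + |B|} \tag{1}$$
$$\mathrm{MSD} = \frac{1}{2}\left(\bar{d}_{A,B} + \bar{d}_{B,A}\right) \tag{2}$$
$$\mathrm{95HD} = \operatorname{percentile}\left(d_{A,B} \cup d_{B,A},\ 95^{\mathrm{th}}\right) \tag{3}$$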

where $|A|$ and $|B|$ are the numbers of voxels in contoured volumes A and B, respectively; $|A \cap B|$ is the number of voxels in the intersection of volumes A and B; $d_{A,B}$ is a vector containing the minimum Euclidean distance from each surface point of volume A to the surface of volume B; and $\bar{d}_{A,B}$ is the mean of the values in $d_{A,B}$. The DSC ranges from 0 (no overlap) to 1 (perfect overlap); for both MSD and 95HD, values closer to zero represent better agreement between the two contours’ surfaces.
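The three metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: contours are assumed to be rasterized into non-empty boolean voxel masks on a common grid, surface voxels are approximated as the voxels removed by a one-voxel erosion, and the function names (`dice`, `surface`, `surface_distances`, `msd_and_hd95`) and the `spacing` parameter (voxel size in mm) are hypothetical.

```python
import numpy as np
from scipy import ndimage


def dice(a, b):
    """Dice similarity coefficient (Eq. 1) for two boolean masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # undefined when both masks are empty
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def surface(mask):
    """Approximate surface: mask voxels that a one-voxel erosion removes."""
    mask = np.asarray(mask, dtype=bool)
    return mask & ~ndimage.binary_erosion(mask)


def surface_distances(a, b, spacing):
    """Distance (mm) from each surface voxel of a to the nearest surface voxel of b."""
    surf_a, surf_b = surface(a), surface(b)
    # The Euclidean distance transform gives, for every voxel, the distance
    # to the nearest zero voxel; zeros are placed on the surface of b.
    dist_to_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    return dist_to_b[surf_a]


def msd_and_hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """Mean surface distance (Eq. 2) and 95th percentile Hausdorff distance (Eq. 3)."""
    d_ab = surface_distances(a, b, spacing)
    d_ba = surface_distances(b, a, spacing)
    msd = 0.5 * (d_ab.mean() + d_ba.mean())
    hd95 = np.percentile(np.concatenate([d_ab, d_ba]), 95)
    return float(msd), float(hd95)
```

Pooling the two directed distance vectors before taking the percentile mirrors the union in Eq. 3; other published variants instead take the maximum of the two directed 95th percentiles, so reported values are not always directly comparable across studies.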

These metrics (Eqs. 1-3) were calculated to assess the manual delineations using two approaches (Figure 1). First, a physician pair-wise comparison of the contours was performed, meaning that all physician pairs (i.e., Physician 1 vs Physician 2, Physician 1 vs Physician 3, …, Physician 25 vs Physician 26) were considered. This comparison provides a real-world estimate of the delineation variability amongst the participants, including the minimum and maximum extremes of the overlap and distance metrics. Second, we estimated a consensus volume using a modified version 28 of the Simultaneous Truth And Performance Level Estimation (STAPLE) algorithm 29. The STAPLE algorithm calculates maximum likelihood estimates of the performance of each individual segmentation (its sensitivity and specificity, i.e., true positive and true negative rates) and uses these values to produce a volume that estimates the best agreement between the individual segmentations. A limitation of the STAPLE algorithm is that it does not consider intensity information from the image; it relies only on the individual segmentations. Yang et al 28 addressed this limitation by integrating a tissue appearance model, which improved the generation of STAPLE contours through the addition of image intensities. The resulting consensus volumes for each patient were considered the ground truth for this analysis, and each physician’s delineations were compared against them using overlap (DSC) and distance (MSD and 95HD) metrics.
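The pair-wise approach amounts to scoring every unordered pair of observers once, which gives n(n-1)/2 comparisons per structure for n observers (276 for 24 observers). The sketch below illustrates this with the DSC; the names `pairwise_scores` and `summarize`, and the dictionary of masks keyed by an observer label, are assumptions made for the example and are not taken from the study.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice similarity coefficient for two boolean masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def pairwise_scores(masks, metric=dice):
    """Apply a metric to every unordered pair of observers exactly once.

    masks: dict mapping a (hypothetical) observer label to a boolean mask.
    """
    return {
        (i, j): metric(masks[i], masks[j])
        for i, j in combinations(sorted(masks), 2)
    }


def summarize(scores):
    """Minimum, median, and maximum of the pair-wise scores."""
    values = np.array(list(scores.values()), dtype=float)
    return {
        "n_pairs": int(values.size),
        "min": float(values.min()),
        "median": float(np.median(values)),
        "max": float(values.max()),
    }
```

Because every pair contributes a value, a single outlying contour affects n-1 of the comparisons, which is one reason the pair-wise distributions are wider than the consensus-based ones.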

Figure 1.


Methods for quantitative evaluation of manual contours. The STAPLE algorithm was used to generate a “consensus” contour (top) for each organ, tumor, and target volume; individual physician contours were then compared to the STAPLE contour of the corresponding region of interest. In addition, a pair-wise evaluation of the contours was performed (bottom); here, each physician contour was compared to every other physician contour. This approach highlights the potential true disagreement between two individual physicians.
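To make the consensus step concrete, the following is a minimal sketch of the basic binary STAPLE expectation-maximization iteration, which estimates a per-voxel consensus probability together with each rater's sensitivity and specificity. It is not the modified, intensity-aware version 28 used in this study: the tissue appearance model is omitted, a single global prior is assumed, and the function name `staple`, its parameters, and the initial value of 0.99 are illustrative assumptions.

```python
import numpy as np


def staple(decisions, n_iter=50, tol=1e-6):
    """Basic binary STAPLE (no image-intensity term).

    decisions: array of shape (n_voxels, n_raters) with 0/1 entries.
    Returns (w, p, q): per-voxel consensus probability, and each rater's
    estimated sensitivity (p) and specificity (q).
    """
    d = np.asarray(decisions, dtype=float)
    eps = 1e-9
    # Global prior probability that a voxel belongs to the structure.
    prior = float(np.clip(d.mean(), eps, 1 - eps))
    p = np.full(d.shape[1], 0.99)  # initial sensitivities (assumed value)
    q = np.full(d.shape[1], 0.99)  # initial specificities (assumed value)
    w = np.full(d.shape[0], prior)
    for _ in range(n_iter):
        # E-step: posterior probability of "inside" for each voxel,
        # computed in log space for numerical stability.
        log_a = np.log(prior) + d @ np.log(p) + (1 - d) @ np.log(1 - p)
        log_b = np.log(1 - prior) + (1 - d) @ np.log(q) + d @ np.log(1 - q)
        w_new = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -700, 700)))
        # M-step: re-estimate each rater's performance parameters.
        p = np.clip((w_new @ d) / max(w_new.sum(), eps), eps, 1 - eps)
        q = np.clip(((1 - w_new) @ (1 - d)) / max((1 - w_new).sum(), eps),
                    eps, 1 - eps)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w, p, q
```

A binary consensus mask can then be obtained by thresholding, for example `w > 0.5`; in practice each observer's 3D mask would be flattened into one column of `decisions`.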

Results

Twenty-four out of twenty-six (92%) radiation oncologists submitted a full set of contours for all patients. Incomplete submissions from 2 participants led to the exclusion of all contours from these participants in the subsequent analysis. In addition, the head and neck radiologist only submitted GTVp and GTVn contours. Since other contoured structures were omitted (CTVs and normal tissues), it was decided to exclude the radiologist’s contours from the volumetric analysis and generation of the STAPLE volumes.

Twenty-two of twenty-six radiation oncologists’ questionnaires were completed. The median time of experience as a head and neck radiation oncologist was 7 years (range: 1-25 years) and the median clinical experience delineating on MRI was 3.5 years (range: 0-15 years). Two institutions reported using in-house contouring software, three institutions reported using Pinnacle treatment planning system, one institution reported using MIM, and members from one institution reported mixed use of contouring software (Pinnacle, Monaco, Velocity, and Eclipse). The participants reported using contouring software which they routinely used and were comfortable using. Three observers used automatic segmentation with manual edits to delineate the organs at risk. The GTV-to-CTV margin used was 10 mm for six of seven participant institutions. One institution used 8 mm and three delineators (from three different institutions) used a margin of 5 mm.

Volumes (in cm3) for the participants’ delineations and the resulting STAPLE volumes are detailed in Supplementary Tables S2 and S3. Distributions of these volumes are shown in Figure 2. The median coefficients of variation (standard deviation / mean × 100%) across all cases for CTVs, GTVs, parotids, and submandibular glands were 40.9% (range: 30.4 – 69.5%), 34.5% (range: 12.2 – 101.0%), 12.5% (range: 8.5 – 14.9%), and 9.8% (range: 6.4 – 15.4%), respectively. The single radiologist’s GTV contours were consistently smaller than those of most radiation oncologists across all cases presented in this study (Supplementary Table S2).
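The coefficient of variation defined above can be computed per structure from the list of observer volumes. The short sketch below assumes the sample standard deviation (`ddof=1`); the text does not state which estimator was used, and the function name is hypothetical.

```python
import numpy as np


def coefficient_of_variation(volumes_cm3, ddof=1):
    """Standard deviation / mean x 100% for one structure's observer volumes.

    ddof=1 (sample standard deviation) is an assumption; the source does
    not specify the estimator.
    """
    v = np.asarray(volumes_cm3, dtype=float)
    return float(100.0 * v.std(ddof=ddof) / v.mean())
```

The median across structures or cases, as reported in the text, would then be `np.median` applied to the per-structure values.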

Figure 2.


Volume (cm3) distributions for tumor, target, and normal tissue volumes contoured in the current study. Note that individual subplots may have different volume ranges.

Boxplots displaying distributions from the volumetric comparisons using STAPLE and pair-wise evaluation approaches are shown in Figures 3 (GTVs and CTVs) and 4 (normal tissues). A summary of these results is provided in Table S4. When considering primary and nodal volumes, both GTVp and CTVp delineations were found to be more variable than GTVn and CTVn delineations (p-values: 0.01 and < 0.0001, respectively, Mann-Whitney U Test) when comparing their respective DSC values measured against STAPLE. Across all cases and radiation oncologist contours, these DSC values ranged from 0 to 0.92 for GTVp, 0 to 0.95 for GTVn, 0.22 to 0.91 for CTVp, and 0.12 to 0.91 for CTVn. When considering laterality of the normal tissues, there was no significant difference in DSC distributions (p-values > 0.1 for both parotids and submandibular glands, Mann-Whitney U Test) between left and right manual delineations (measured against STAPLE).
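A comparison of two DSC distributions with the Mann-Whitney U test can be sketched with SciPy as follows. The two-sided alternative and the 0.05 threshold are assumptions (the text does not state them), and the wrapper name `compare_dsc` is hypothetical.

```python
from scipy.stats import mannwhitneyu


def compare_dsc(dsc_a, dsc_b, alpha=0.05):
    """Two-sided Mann-Whitney U test between two sets of DSC values."""
    result = mannwhitneyu(dsc_a, dsc_b, alternative="two-sided")
    return {
        "U": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }
```

The test is rank-based, so it does not assume normally distributed DSC values, which suits a metric bounded between 0 and 1.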

Figure 3.


Boxplots demonstrating inter-observer variability for GTVs and CTVs, shown in rows, for the Dice Similarity Coefficient (DSC), mean surface distance (MSD), and 95th percentile Hausdorff distance (95HD), shown in columns. All distances are in millimeters.

A pair-wise comparison of intra-institutional delineation variability for tumor and target volumes is provided in Figure S1 (institution numbers were randomly assigned). Intra-institutional variability, measured by the median DSC values between all pair-wise comparisons available for individual institutions, ranged from 0.60 to 0.85 for GTVp, 0.50 to 0.85 for GTVn, 0.38 to 0.86 for CTVp, and 0.48 to 0.80 for CTVn across the seven participating institutions. Average (standard deviation) DSC values for the two institutions with more than 3 participants were 0.69 (0.11) and 0.66 (0.19) for GTVp, 0.71 (0.26) and 0.63 (0.24) for GTVn, 0.72 (0.08) and 0.68 (0.15) for CTVp, and 0.68 (0.17) and 0.65 (0.15) for CTVn.

Discussion

This study presents the results of a numerically robust R-IDEAL Stage 0 multi-institutional quantification of delineation variability of gross tumor volumes, high-risk clinical target volumes, parotids, and submandibular glands of oropharyngeal cancer patients when these structures are delineated on MR images alone (as would be the case for daily MR-Linac-based adaptive MR-guided radiotherapy). The data suggest that variability in the delineation of gross tumor and clinical target volumes across participants remains. For example, the ratio between the largest and smallest volumes (Vmax/Vmin) across all participants was as high as 31.0 for tumor volumes and 32.3 for target volumes (average across all cases: 10.7 and 11.4 for GTV and CTV, respectively; see Supplementary Data). Figure 5 shows individual participants’ delineations (center columns) for GTVs and CTVs on single axial T1w MR scan slices for the 4 cases presented in this study. In this figure, the rightmost panels (“CTV^”) show axial and sagittal or coronal views of the STAPLE contours for GTV and CTV, as well as the intersection and union of all participants’ CTVs. Interestingly, for 4 out of 6 GTVs, the consensus GTV contours (STAPLE) were mostly covered by the intersection of all participants’ target volumes, suggesting that the consensus-derived GTV would receive appropriate coverage by all participants’ CTVs; however, for 2 cases the CTV intersection volume had little overlap with the STAPLE GTVs, and in one of these cases there was not a single voxel in the patient’s MR scan where the CTVs of all participants overlapped (Figure 5, panels denoted by asterisk).
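The quantities discussed here (the Vmax/Vmin ratio, the union and intersection of all participants' CTVs, and how much of a consensus GTV the intersection covers) are simple voxel-wise operations on the stacked masks. The sketch below is illustrative only; the function names and the `voxel_volume_cm3` parameter are assumptions, and the masks are assumed to share one grid.

```python
import numpy as np


def agreement_volumes(masks, voxel_volume_cm3=1.0):
    """Volume ratio, union, and intersection across observers' masks."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    volumes = stack.reshape(len(stack), -1).sum(axis=1) * voxel_volume_cm3
    return {
        "volumes_cm3": volumes,
        "vmax_over_vmin": float(volumes.max() / volumes.min()),
        "union": stack.any(axis=0),          # contoured by at least one observer
        "intersection": stack.all(axis=0),   # contoured by every observer
    }


def coverage(reference, region):
    """Fraction of the reference mask (e.g. a consensus GTV) inside region."""
    reference = np.asarray(reference, dtype=bool)
    region = np.asarray(region, dtype=bool)
    return float(np.logical_and(reference, region).sum() / reference.sum())
```

An empty intersection (no voxel shared by all observers) corresponds to the case flagged with an asterisk in Figure 5; `coverage` of a consensus GTV by that intersection would then be zero.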

Figure 5.


Illustration demonstrating delineation variability for both GTV and CTV in all 4 cases. For cases 3 and 4, nodal disease is shown on their respective first rows then followed by primary disease and target delineations on the following row panels. From left to right each column shows an axial slice of the T2w MRI (T2w) scan provided, a) all participants’ GTV contours (GTV), b) all participants’ CTV delineations (CTV), c) axial view of STAPLE contours for GTV (yellow) and CTV (red), union of all CTV contours (green), and the intersection of all CTV contours (fuchsia). The asterisk (*) highlights a case where there is a lack of intersection volume between participants’ CTV contours.

There was higher agreement in the delineation of normal tissues with ratios in volumes across participants as high as 2.8 for parotids and 2.2 for submandibular glands (average across all cases: 1.9 and 1.6 for parotids and submandibular glands, respectively). Figure 4 shows the STAPLE and pair-wise comparison of variations in delineations per case. It is important to note that one participant contoured the left submandibular gland as the right submandibular gland, resulting in zero overlap (this contour was excluded from analysis and Figure 4) between this organ’s contour and the remaining delineations, highlighting the need for quality assurance of the contours.

Figure 4.


Boxplots demonstrating inter-observer variability for parotids and submandibular glands (SMGs), shown in rows, for the Dice Similarity Coefficient (DSC), mean surface distance (MSD), and 95th percentile Hausdorff distance (95HD), shown in columns. All distances are in millimeters.

Two approaches were used to quantitatively evaluate agreement between participants’ contours: 1) individual contour comparison to consensus (generated via STAPLE) and 2) pair-wise comparison of the contours. The STAPLE algorithm generates a statistically derived “consensus” volume by taking into consideration delineations from multiple observers. While the generation of a computationally derived “consensus” can be attractive for inter-observer analyses, real clinical scenarios lack “consensus” volumes; therefore, it could be argued that pair-wise analyses may provide a more accurate estimate of inter-observer variability. Our data showed that the STAPLE comparison distributions were tighter than those observed for the pair-wise comparison distributions for both tumor/target volumes and normal tissues (Figures 3 and 4, respectively). This was expected, as the pair-wise comparison provides a more accurate quantification of extreme differences in contours between two participants (i.e., these differences are reduced by comparing these extremes to a common contour in the STAPLE analysis). Nevertheless, pair-wise analyses also have limitations. A clear limitation of these analyses is that they are sensitive to outliers in data distributions, so their results should be interpreted carefully.

The current study shows that intra-institutional variability is similar across participant institutions, yet the presented analysis is limited by the low number of observers at some institutions. No statistical difference was found between pair-wise analysis DSC distributions from the two centers that had more than 3 participants (p-value > 0.05 for all tumor and target volume comparisons, Mann-Whitney U Test).

Several studies have reported on delineation variability for head and neck GTV and CTVs 12,15,30-35. Anderson et al 31 investigated head and neck GTV delineation variability across multiple imaging modalities (CE-CT, FDG-PET/CT, and T1w+C MRI). MRI-based GTVs resulted, on average, in the largest delineated volumes, with an average intersection over union of 36% across three observers. In a similar study, Ng et al 33 found that delineations based on T1w+C MR imaging alone resulted in less GTV inter-observer delineation variability (median DSC of 0.58 in a pair-wise analysis) when compared to dual-energy CT. Our study reports median DSC values of 0.73 and 0.75 for GTVp and GTVn, respectively, suggesting better agreement in delineations. Gudi et al 34 reported moderate GTV delineation variability on both CE-CT and CE-CT + FDG-PET/CT (mean DSC values of 0.57 and 0.69, respectively) measured on 10 cases with pharyngolaryngeal cancer; in the current study, we observe mean DSC values of 0.68 and 0.65 for GTVp and GTVn, respectively, across 4 cases. Similarly, Thiagarajan et al 12 showed significant variability in GTV delineation when evaluating the individual contributions of MRI, PET, and physical evaluation in the delineation process. Concerning CTVs, Hong et al previously showed large heterogeneity in target design between different observers 15. In their study, the authors provided participants (n=20) with CT scans and GTV contours of an oropharyngeal cancer patient (T2 N1 M0 squamous cell carcinoma of the tonsil) and asked the participants to delineate CTVs. When considering high-risk CTVs, the coefficient of variation in their study was 191% (w/ μ = 43 cm3 and σ = 82 cm3) showing larger variation than the current study (CoV = 41%).

To address this reported variability in CTV delineation, some groups have proposed the use of uniform margin expansions 36 or computational methods for the automatic delineation of CTVs 37-39. Hansen et al 36 showed in a multi-center study that using geometric margins from GTV-to-CTV for high-risk CTVs resulted in higher agreement in manual delineations than when using anatomical margins. Cardenas et al 37 proposed the use of artificial intelligence to automatically delineate oropharyngeal cancer high-risk CTVs. Their results showed high agreement between the clinically used and automatically delineated target volumes (mean DSC = 0.81 vs mean DSC = 0.64 in the current pair-wise analysis) when GTVs are already provided.

In the current study, a slight improvement was noticed in terms of consistency for primary GTV and CTV contours when compared to those from base of tongue cases. This may be because lymphoid tissue is abundant in this region and might be misinterpreted as tumor on MRI, but additional analyses are needed to confirm this hypothesis. Consequently, future oropharyngeal cancer delineation studies should consider techniques to integrate microscopic disease evaluation to provide margin guidelines to the community. For example, Ligtenberg et al 40 proposed imaging modality-specific margins for laryngeal and hypopharyngeal cancer after co-registering pre-surgical CT, MRI, and FDG-PET imaging with histological images collected after laryngectomy. Nevertheless, similar studies for oropharyngeal cancers would bring additional challenges.

The use of CT scans for radiotherapy treatment planning has been required because these scans provide the electron density information needed by previously clinically available dose calculation algorithms. With the introduction of MRgRT and advances in MR-based dose calculation algorithm development, our field is fast approaching the possibility of MR-based radiotherapy. There are many advantages to MR-based radiotherapy. For example, the assessment of head and neck cancers, particularly those located in the oropharynx, can be hindered by several factors, including the presence of CT dental artifacts and lack of contrast between tumor and surrounding tissues 41. Several studies have shown MRI to provide superior soft tissue contrast, allowing for better definition of tumor extent and adjacent organs at risk 42,43. Furthermore, the use of single modality scans for treatment planning removes the need to co-register images, eliminating any potential treatment uncertainties derived from image co-registration. The introduction of MR-based radiotherapy also presents unique challenges. One potential challenge is the interpretability of the information provided by MRI, as there is insufficient imaging-pathology correspondence in oropharyngeal cancer to guide decisions such as whether to include edema seen on T2w within the target volumes. The results of the current study suggest that there is an urgent need for MR-based delineation guidelines and training to reduce delineation variability of tumor and target volumes. It is important to highlight the potential role of conducting contouring peer-review sessions prior to treatment commencement during initial adoption of MRgRT in a clinical setting. Some studies have suggested that establishing institutional contouring peer-review sessions leads to more consistent target design within institutional clinical practices 44,45.

Variability in contouring can be the result of many confounding factors. Image interpretability plays a role in increasing variability in contouring, as different observers may interpret the same image very differently. Adding to this, there is often uncertainty involved in defining edges of anatomical structures or tumors, which can be due to the limited spatial resolution of the images or lack of contrast to tissues surrounding a volume of interest. Here, interpretability presents its own challenges when it comes to defining tumors, as different regions within a tumor may enhance differently on available imaging. Another factor affecting contouring variability is difference in contouring practices. Even when images are interpreted equally between observers, decisions as to what to include as part of a tumor or target volume may differ. These differences may be attributed to established institutional guidelines, prior training, experience, or new findings that may suggest changes in clinical practices. To address these sources of variation, several efforts are underway to establish guidelines and recommendations for MR-based radiotherapy to reduce variability in tumor and target volume contouring for head and neck cancers.

The presented study is subject to some limitations. First, only images from T1w, T1w+C, and T2w scans were provided to participants for contouring. Some studies have indicated that the addition of fluorodeoxyglucose positron emission tomography (FDG-PET/CT) and/or contrast-enhanced CT (CE-CT) to MR scans could produce more consistent tumor delineations. Furthermore, the study does not consider the role of multi-parametric MR imaging as an avenue to provide additional information about tumor disease and extent. Future studies will be needed to determine the role and potential benefit of including additional imaging modalities for delineation purposes. A follow-up study is underway to investigate the addition of FDG-PET imaging and the use of recommended guidelines for GTV definition using MRI 22, which may lead to better delineation conformity between observers.

Importantly, this study represents, to our knowledge, not only the first formal prospective technical assessment of MR-only radiotherapy segmentation required to benchmark performance for MR-guided adaptive radiotherapy, but also the largest single multi-site target delineation cohort for oropharyngeal cancers. This robust sample size provides a statistically reliable estimator of segmentation agreement. Using standard measures of delineation variability performance and comparing these metrics to previously published studies, our data suggest that MR-only performance appears to generally meet or exceed agreement and consistency metrics reported in prior head and neck target delineation studies evaluating variability in tumor and target definition. The role of 1) consensus MR-only contouring guidelines for OPC tumor and target volume delineation and 2) MR-only tumor and target volume contour peer-review as practices to increase segmentation agreement amongst different practitioners remains to be evaluated.

Conclusion

Tumor and target volume manual delineation remains a challenging task in head and neck cancer radiotherapy. The data from this study suggest that appropriate guidelines, contouring quality assurance sessions, and training are still needed for the adoption of MR-based treatment planning for head and neck cancers. Such efforts should play a critical role in reducing delineation variation and ensuring standardization of target design across clinical practices.

Supplementary Material

Appendix: Supplementary Materials

Acknowledgements:

This research was supported by the Andrew Sabin Family Foundation; Dr. Fuller was a 2017-2019 Sabin Family Foundation Fellow. Dr. Fuller received funding and salary support during the study execution interval from the National Institutes of Health (NIH), including: the National Institute for Dental and Craniofacial Research Award (1R01DE025248/R56DE025248); a National Science Foundation (NSF), Division of Mathematical Sciences, Joint NIH/NSF Initiative on Quantitative Approaches to Biomedical Big Data (QuBBD) Grant (NSF 1557679); a National Institute of Biomedical Imaging and Bioengineering (NIBIB) Research Education Programs for Residents and Clinical Fellows Grant (R25EB025787-01); the NIH Big Data to Knowledge (BD2K) Program of the National Cancer Institute (NCI) Early Stage Development of Technologies in Biomedical Computing, Informatics, and Big Data Science Award (1R01CA214825); NCI Early Phase Clinical Trials in Imaging and Image-Guided Interventions Program (1R01CA218148); an NIH/NCI Cancer Center Support Grant (CCSG) Pilot Research Program Award from the UT MD Anderson CCSG Radiation Oncology and Cancer Imaging Program (P30CA016672); and an NIH/NCI Head and Neck Specialized Programs of Research Excellence (SPORE) Developmental Research Program Award (P50 CA097007). Dr. Fuller has received direct industry grant support and travel funding from Elekta AB. B.A. McDonald reports grants from the National Institutes of Health (NIH)/National Institute of Dental and Craniofacial Research (NIDCR) (1F31DE029093-01A1) and research support from a Dr. John J. Kopchick Fellowship via The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences. This work was supported by infrastructure support from the MR-Linac Consortium.

Funding sources:

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Footnotes

Conflicts of interest: This work was supported by infrastructure support from the MR-Linac Consortium.

Research data are stored in an institutional repository and will be shared upon request to the corresponding author.

References

  • 1.Beadle BM, Liao KP, Elting LS, et al. Improved survival using intensity-modulated radiation therapy in head and neck cancers: A SEER-Medicare analysis. Cancer. 2014;120(5):702–710. doi: 10.1002/cncr.28372
  • 2.Gupta T, Agarwal J, Jain S, et al. Three-dimensional conformal radiotherapy (3D-CRT) versus intensity modulated radiation therapy (IMRT) in squamous cell carcinoma of the head and neck: A randomized controlled trial. Radiotherapy and Oncology. 2012;104(3):343–348. doi: 10.1016/j.radonc.2012.07.001
  • 3.Nutting CM, Morden JP, Harrington KJ, et al. Parotid-sparing intensity modulated versus conventional radiotherapy in head and neck cancer (PARSPORT): A phase 3 multicentre randomised controlled trial. The Lancet Oncology. 2011;12(2):127–136. doi: 10.1016/S1470-2045(10)70290-4
  • 4.Cheung KY. Intensity modulated radiotherapy: Advantages, limitations and future developments. Biomedical Imaging and Intervention Journal. 2006;2(1):1–19. doi: 10.2349/biij.2.1.e19
  • 5.Lagendijk JJW, Raaymakers BW, Raaijmakers AJE, et al. MRI/linac integration. Radiotherapy and Oncology. 2008;86(1):25–29. doi: 10.1016/j.radonc.2007.10.034
  • 6.Raaymakers BW, Lagendijk JJW, Overweg J, et al. Integrating a 1.5 T MRI scanner with a 6 MV accelerator: proof of concept. Physics in Medicine and Biology. 2009;54(12):N229–N237. doi: 10.1088/0031-9155/54/12/N01
  • 7.Mutic S, Dempsey JF. The ViewRay System: Magnetic Resonance–Guided and Controlled Radiotherapy. Seminars in Radiation Oncology. 2014;24(3):196–199. doi: 10.1016/j.semradonc.2014.02.008
  • 8.Schmidt MA, Payne GS. Radiotherapy planning using MRI. Physics in Medicine and Biology. 2015;60(22):R323–R361. doi: 10.1088/0031-9155/60/22/R323
  • 9.Kontaxis C, Bol GH, Stemkens B, et al. Towards fast online intrafraction replanning for free-breathing stereotactic body radiation therapy with the MR-linac. Physics in Medicine & Biology. 2017;62(18):7233–7248. doi: 10.1088/1361-6560/aa82ae
  • 10.Van Herk M. Errors and Margins in Radiotherapy. Seminars in Radiation Oncology. 2004;14(1):52–64. doi: 10.1053/j.semradonc.2003.10.003
  • 11.Segedin B, Petric P. Uncertainties in target volume delineation in radiotherapy - Are they relevant and what can we do about them? Radiology and Oncology. 2016;50(3):254–262. doi: 10.1515/raon-2016-0023
  • 12.Thiagarajan A, Caria N, Schöder H, et al. Target volume delineation in oropharyngeal cancer: Impact of PET, MRI, and physical examination. International Journal of Radiation Oncology Biology Physics. 2012;83(1):220–227. doi: 10.1016/j.ijrobp.2011.05.060
  • 13.Bird D, Scarsbrook AF, Sykes J, et al. Multimodality imaging with CT, MR and FDG-PET for radiotherapy target volume delineation in oropharyngeal squamous cell carcinoma. BMC Cancer. 2015;15(1):1–10. doi: 10.1186/s12885-015-1867-8
  • 14.Rasch CRN, Steenbakkers RJHM, Fitton I, et al. Decreased 3D observer variation with matched CT-MRI, for target delineation in Nasopharynx cancer. Radiation Oncology. 2010;5(1):4–9. doi: 10.1186/1748-717X-5-21
  • 15.Hong TS, Tome WA, Harari PM. Heterogeneity in head and neck IMRT target design and clinical practice. Radiotherapy and Oncology. 2012;103(1):92–98. doi: 10.1016/j.radonc.2012.02.010
  • 16.Grégoire V, Levendag P, Ang KK, et al. CT-based delineation of lymph node levels and related CTVs in the node-negative neck: DAHANCA, EORTC, GORTEC, NCIC, RTOG consensus guidelines. Radiotherapy and Oncology. 2003;69(3):227–236. doi: 10.1016/j.radonc.2003.09.011
  • 17.Michalski JM, Lawton C, El Naqa I, et al. Development of RTOG Consensus Guidelines for the Definition of the Clinical Target Volume for Postoperative Conformal Radiation Therapy for Prostate Cancer. International Journal of Radiation Oncology*Biology*Physics. 2011;76(2):361–368. doi: 10.1016/j.ijrobp.2009.02.006
  • 18.Lee AW, Ng WT, Pan JJ, et al. International guideline for the delineation of the clinical target volumes (CTV) for nasopharyngeal carcinoma. Radiotherapy and Oncology. 2017;126(1):25–36. doi: 10.1016/j.radonc.2017.10.032
  • 19.Grégoire V, Ang K, Budach W, et al. Delineation of the neck node levels for head and neck tumors: A 2013 update. DAHANCA, EORTC, HKNPCSG, NCIC CTG, NCRI, RTOG, TROG consensus guidelines. Radiotherapy and Oncology. 2014;110(1):172–181. doi: 10.1016/j.radonc.2013.10.010
  • 20.Grégoire V, Evans M, Le QT, et al. Delineation of the primary tumour Clinical Target Volumes (CTV-P) in laryngeal, hypopharyngeal, oropharyngeal and oral cavity squamous cell carcinoma: AIRO, CACA, DAHANCA, EORTC, GEORCC, GORTEC, HKNPCSG, HNCIG, IAG-KHT, LPRHHT, NCIC CTG, NCRI, NRG Oncolog. Radiotherapy and Oncology. 2017;126:3–24. doi: 10.1016/j.radonc.2017.10.016
  • 21.Grégoire V, Eisbruch A, Hamoir M, Levendag P. Proposal for the delineation of the nodal CTV in the node-positive and the post-operative neck. Radiotherapy and Oncology. 2006;79(1):15–20. doi: 10.1016/j.radonc.2006.03.009
  • 22.Jager EA, Ligtenberg H, Caldas-Magalhaes J, et al. Validated guidelines for tumor delineation on magnetic resonance imaging for laryngeal and hypopharyngeal cancer. Acta Oncologica. 2016;55(11):1305–1312. doi: 10.1080/0284186X.2016.1219048
  • 23.de Mol van Otterloo SR, Christodouleas JP, Blezer ELA, et al. The MOMENTUM Study: An International Registry for the Evidence-Based Introduction of MR-Guided Adaptive Therapy. Frontiers in Oncology. 2020;10. doi: 10.3389/fonc.2020.01328
  • 24.Verkooijen HM, Kerkmeijer LGW, Fuller CD, et al. R-IDEAL: A Framework for Systematic Clinical Evaluation of Technical Innovations in Radiation Oncology. Frontiers in Oncology. 2017;7. doi: 10.3389/fonc.2017.00059
  • 25.Bahig H, Yuan Y, Mohamed ASR, et al. Magnetic Resonance-based Response Assessment and Dose Adaptation in Human Papilloma Virus Positive Tumors of the Oropharynx treated with Radiotherapy (MR-ADAPTOR): An R-IDEAL stage 2a-2b/Bayesian phase II trial. Clinical and Translational Radiation Oncology. 2018;13:19–23. doi: 10.1016/j.ctro.2018.08.003
  • 26.McDonald BA, Vedam S, Yang J, et al. Initial Feasibility and Clinical Implementation of Daily MR-Guided Adaptive Head and Neck Cancer Radiation Therapy on a 1.5T MR-Linac System: Prospective R-IDEAL 2a/2b Systematic Clinical Evaluation of Technical Innovation. International Journal of Radiation Oncology*Biology*Physics. Published online December 2020. doi: 10.1016/j.ijrobp.2020.12.015
  • 27.Dice LR. Measures of the Amount of Ecologic Association Between Species. Ecology. 1945;26(3):297–302. doi: 10.2307/1932409
  • 28.Yang J, Haas B, Fang R, et al. Atlas ranking and selection for automatic segmentation of the esophagus from CT scans. Physics in Medicine and Biology. 2017;62(23):9140–9158. doi: 10.1088/1361-6560/aa94ba
  • 29.Warfield SK, Zou KH, Wells WM. Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation. 2004;23(7):903–921.
  • 30.Riegel AC, Berson AM, Destian S, et al. Variability of gross tumor volume delineation in head-and-neck cancer using CT and PET/CT fusion. International Journal of Radiation Oncology*Biology*Physics. 2006;65(3):726–732. doi: 10.1016/j.ijrobp.2006.01.014
  • 31.Anderson CM, Sun W, Buatti JM, et al. Interobserver and intermodality variability in GTV delineation on simulation CT, FDG-PET, and MR Images of Head and Neck Cancer. Jacobs J Radiat Oncol. 2014;1(1):006.
  • 32.Ng SP, Dyer BA, Kalpathy-Cramer J, et al. A prospective in silico analysis of interdisciplinary and interobserver spatial variability in post-operative target delineation of high-risk oral cavity cancers: Does physician specialty matter? Clinical and Translational Radiation Oncology. 2018;12:40–46. doi: 10.1016/j.ctro.2018.07.006
  • 33.Ng SP, Cardenas CE, Elhalawani H, et al. Comparison of tumor delineation using dual energy computed tomography versus magnetic resonance imaging in head and neck cancer re-irradiation cases. Physics and Imaging in Radiation Oncology. 2020;14(March):1–5. doi: 10.1016/j.phro.2020.04.001
  • 34.Gudi S, Ghosh-Laskar S, Agarwal JP, et al. Interobserver Variability in the Delineation of Gross Tumour Volume and Specified Organs-at-risk During IMRT for Head and Neck Cancers and the Impact of FDG-PET/CT on Such Variability at the Primary Site. Journal of Medical Imaging and Radiation Sciences. 2017;48(2):184–192. doi: 10.1016/j.jmir.2016.11.003
  • 35.van der Veen J, Gulyban A, Nuyts S. Interobserver variability in delineation of target volumes in head and neck cancer. Radiotherapy and Oncology. 2019;137:9–15. doi: 10.1016/j.radonc.2019.04.006
  • 36.Hansen CR, Johansen J, Samsøe E, et al. Consequences of introducing geometric GTV to CTV margin expansion in DAHANCA contouring guidelines for head and neck radiotherapy. Radiotherapy and Oncology. 2018;126:43–47. doi: 10.1016/j.radonc.2017.09.019
  • 37.Cardenas CE, McCarroll RE, Court LE, et al. Deep Learning Algorithm for Auto-Delineation of High-Risk Oropharyngeal Clinical Target Volumes With Built-In Dice Similarity Coefficient Parameter Optimization Function. International Journal of Radiation Oncology Biology Physics. 2018;101(2):468–478. doi: 10.1016/j.ijrobp.2018.01.114
  • 38.Cardenas CE, Anderson BM, Aristophanous M, et al. Auto-delineation of oropharyngeal clinical target volumes using 3D convolutional neural networks. Physics in Medicine and Biology. 2018;63(21). doi: 10.1088/1361-6560/aae8a9
  • 39.Unkelbach J, Bortfeld T, Cardenas CE, et al. The role of computational methods for automating and improving clinical target volume definition. Radiotherapy and Oncology. Published online 2020. doi: 10.1016/j.radonc.2020.10.002
  • 40.Ligtenberg H, Jager EA, Caldas-Magalhaes J, et al. Modality-specific target definition for laryngeal and hypopharyngeal cancer on FDG-PET, CT and MRI. Radiotherapy and Oncology. 2017;123(1):63–70. doi: 10.1016/j.radonc.2017.02.005
  • 41.Hernandez S, Sjogreen C, Gay SS, et al. Development and dosimetric assessment of an automatic dental artifact classification tool to guide artifact management techniques in a fully automated treatment planning workflow. Computerized Medical Imaging and Graphics. 2021;90:101907. doi: 10.1016/j.compmedimag.2021.101907
  • 42.Chung NN, Ting LL, Hsu WC, Lui LT, Wang PM. Impact of magnetic resonance imaging versus CT on nasopharyngeal carcinoma: Primary tumor target delineation for radiotherapy. Head and Neck. 2004;26(3):241–246. doi: 10.1002/hed.10378
  • 43.Adams S, Baum RP, Stuckensen T, Bitter K, Hör G. Prospective comparison of 18F-FDG PET with conventional imaging modalities (CT, MRI, US) in lymph node staging of head and neck cancer. European Journal of Nuclear Medicine. 1998;25(9):1255–1260. doi: 10.1007/s002590050293
  • 44.Cardenas CE, Mohamed ASR, Tao R, et al. Prospective Qualitative and Quantitative Analysis of Real-Time Peer Review Quality Assurance Rounds Incorporating Direct Physical Examination for Head and Neck Cancer Radiation Therapy. International Journal of Radiation Oncology Biology Physics. 2017;98(3):532–540. doi: 10.1016/j.ijrobp.2016.11.019
  • 45.Ballo MT, Chronowski GM, Schlembach PJ, Bloom ES, Arzu IY, Kuban DA. Prospective peer review quality assurance for outpatient radiation therapy. Practical Radiation Oncology. 2014;4(5):279–284. doi: 10.1016/j.prro.2013.11.004
