Clinical adequacy assessment of autocontours for prostate IMRT with meaningful endpoints

Hamidreza Nourzadeh; William T Watkins; Mahmoud Ahmed; Cheukkai Hui; David Schlesinger; Jeffrey V Siebers

doi:10.1002/mp.12158

2017 Apr 12;44(4):1525–1537. doi: 10.1002/mp.12158


PMCID: PMC10659108 PMID: 28196288

Abstract

Purpose

To determine if radiation treatment plans created based on autosegmented (AS) regions‐of‐interest (ROI)s are clinically equivalent to plans created based on manually segmented ROIs, where equivalence is evaluated using probabilistic dosimetric metrics and probabilistic biological endpoints for prostate IMRT.

Method and materials

Manually drawn contours and autosegmented ROIs were created for 167 CT image sets acquired from 19 prostate patients. Autosegmentation was performed utilizing Pinnacle's Smart Probabilistic Image Contouring Engine. For each CT set, 78 Gy/39 fraction 7‐beam IMRT treatment plans with 1 cm CTV‐to‐PTV margins were created for each of the three contour scenarios; P _MD using manually delineated (MD) ROIs, P _AS using autosegmented ROIs, and P _AM using autosegmented organ‐at‐risks (OAR)s and the manually drawn target. For each plan, 1000 virtual treatment simulations with different systematic errors for each simulation and a different random error for each fraction were performed. The statistical probability of achieving dose–volume metrics (coverage probability (CP)), expectation values for normal tissue complication probability (NTCP), and tumor control probability (TCP) metrics for all possible cross‐evaluation pairs of ROI types and planning scenarios were reported. In evaluation scenarios, the root mean square loss (RMSL) and maximum absolute loss (MAL) of coverage probability of dose–volume objectives, E[TCP], and E[NTCP] were compared with respect to the base plan created and evaluated with manually drawn contours.

Results

Femoral head dose objectives were satisfied in all situations, as were the maximum dose objectives for all ROIs. Bladder metrics were within the clinical coverage tolerances except D _35Gy for the autosegmented plan evaluated with the manual contours. Dosimetric indices for CTV and rectum could be highly compromised when the definition of the ROIs switched from manually delineated to autosegmented. Seventy‐two percent of CT image sets satisfied the worst‐case CP thresholds for all dosimetric objectives in all scenarios; the percentage dropped to 50% if biological indices were taken into account. Among evaluation scenarios, (MD,P _AM) bore the highest resemblance to (MD,P _MD), where 99% and 88% of cases met all CP thresholds for bladder and rectum, respectively.

Conclusions

When including daily setup variations in prostate IMRT, the dose–volume metric CP and biological indices of ROIs were approximately equivalent for the plans created based on manually drawn targets and autosegmented OARs in 88% of cases. The accuracy of autosegmented prostates and rectums is an impediment to attaining plans that are statistically equivalent to those created based on manually drawn ROIs.

Keywords: autocontour, prostate cancer, treatment planning, uncertainties in radiation therapy

1. Introduction

Region of interest (ROI) delineation is a crucial step in external beam radiation therapy planning. Errant ROIs can yield misguided deliveries with undesirable clinical consequences. Currently, ROIs are mostly segmented manually by physicians or dosimetrists prior to their use in treatment planning. While manual contouring is the clinical standard for structure delineation during treatment planning, it is both time consuming and prone to inter‐ and intraobserver contour variability.¹,²,³,⁴,⁵

To alleviate this bottleneck and to move toward fully automated treatment planning procedures, autocontouring techniques have been developed.⁶,⁷ In both manual and automated contouring processes, delineation uncertainties are inherent to the resulting contours. Intrinsic and extrinsic factors can limit the ability to delineate anatomical regions for different imaging modalities¹,⁴,⁸ due to poor soft tissue contrast, noise, and image artifacts. In addition, manual contours have interobserver¹ delineation variability (due to the different perceptions of observers) and intraobserver²,⁴,⁹ delineation variability (rooted in different interpretations/perceptions of the same observer in different trials). Observer variability is one of the dominant geometric uncertainties in radiation therapy;¹⁰ nonetheless, manual contours are considered the gold standard. While similarity metrics between manual physician‐contoured ROIs and autocontoured ROIs are progressively improving,⁶ the similarity metric threshold or other contour adequacy measure required to safely employ autocontouring in radiation therapy procedures is unclear.

The required spatial accuracy of ROI delineation depends on several factors including treatment site, prescribed dose to target(s), relative distance to other ROIs, potential dosimetric and biological consequences, delivery technique, dose conformity, as well as the relative size of the contour uncertainty compared with other uncertainties such as underlying geometric setup uncertainties. With margins added to provide a buffer for some components of the uncertainties, it is not clear if an additional margin is needed to accommodate contouring uncertainties. If, or when, current margins or planning techniques sufficiently account for contour variability between manual and autocontouring algorithms, then the transition from the current gold standard (manually drawn ROIs) to autocontoured ROIs can occur without sacrificing plan quality.

This work is based on the premise that autocontoured ROIs are adequate when they result in treatment plans that are equivalent to those based on manually drawn ROIs. We consider equivalent radiation therapy plans as plans which should result in an equivalent outcome. The inference in current clinical planning is that plans with matching dose–volume histograms or equal critical dose–volume indices are equivalent. In planning studies, equivalence of outcome surrogates, such as normal tissue complication probability (NTCP) and tumor control probability (TCP), can be quantified to infer equivalence. However, these evaluations often neglect inherent variances and uncertainties between the planned treatment and that achieved for a patient. In particular, typical evaluations assume that the patient pose and position for each treatment is the same as the pose and position at the time of treatment planning simulation.

Consider various methods to evaluate plan equivalence to judge the clinical adequacy of autocontours for treatment planning. A simple method would be to compare static plan metrics;¹¹,¹²,¹³ for example, for a fixed plan (dose distribution), compare the DVH of the PTV created by expanding an autocontoured clinical target volume (CTV) with the DVH of the PTV generated from the manually contoured CTV. This approach naïvely ignores the fact that the purpose of a PTV/PRV is to ensure that the underlying CTV/OAR receives/is spared from the critical dose after uncertainties are considered. An alternative is to utilize treatment delivery simulations to estimate the coverage probability (CP) of the dose–volume metrics of interest and compare them for the different plans. In this context, equivalent plans with respect to dosimetric objectives and biological endpoints are plans with equivalent CP for the dose–volume objectives and equivalent TCP and NTCP distributions.

In this study, we evaluated the equivalency/similarity of treatment plans created using alternative contour sets by comparing the probability of a plan achieving meaningful dosimetric and biological endpoints. Treatment delivery simulations, which incorporate the dosimetric consequences inherent to daily setup errors, were used to evaluate the probabilities. For a patient population, we examined if plans created with autosegmented ROIs (P _AS) would likely achieve the same endpoints as if manually drawn ROIs were utilized for planning (P _MD). Due to the importance of accurately defining the target, in a third scenario, we also evaluated plans with autosegmented normal tissues and a manually drawn target (P _AM). The proposed method has the potential to identify ROI deviations in terms of variations in dosimetric and biological indices. Ultimately, this work can provide a method for intercomparing alternative ROI delineations and therefore to examine the clinical adequacy of autocontouring algorithms in light of other inherent variations which affect the achieved patient dose.

2. Methods

2.A. Treatment plan creation and coverage probability evaluation

Figure 1 shows the workflow for this study. The left side of the flowchart describes the data preparation stage. The process starts with 229 manually contoured fan‐beam CT image sets acquired from 19 different prostate patients. Patient CT images were obtained from the Netherlands Cancer Institute. The CT image resolution was 0.94 × 0.94 × 3 mm³. Readers are referred to earlier published work¹⁴,¹⁵,¹⁶ for further information on this patient cohort. All manually contoured ROIs were drawn and reviewed by a single physician. The Smart Probabilistic Image Contouring Engine (SPICE)⁶ module of the Pinnacle³ treatment planning system (Philips Medical Systems, Fitchburg, WI, USA) was utilized to obtain autosegmented contours for the prostate, bladder, rectum, and femoral heads. To assess the sufficiency of autosegmented contours in the full treatment planning context, we followed a commonly used procedure to create and evaluate IMRT prostate treatment plans. The right side of Figure 1 illustrates the evaluation process for each image set. The process starts with three sets of ROIs for autoplanning, namely set MD with manually drawn ROIs, set AS with autosegmented ROIs, and set AM with autosegmented normal tissues and a manually drawn target. A 1 cm fixed margin (isotropic expansion) was used to create the planning target volume (PTV) from the clinical target volume to account for simulated geometric uncertainties. The margin roughly approximates the van Herk formula result with the uncertainties from Table 1.

Data preparation (left) consists of using SPICE to autocontour the 229 previously manually contoured male‐pelvis fan‐beam CT image sets. The treatment plan creation and evaluation workflow (right) consists of plan creation, DVCM calculation, and probabilistic evaluation of the plan quality indices. Three plans, P _MD, P _AS, and P _AM, are created on each CT dataset. [Color figure can be viewed at wileyonlinelibrary.com]

Table 1.

Prostate daily setup uncertainties assumed in this study. A 1 cm margin was utilized in planning to accommodate these uncertainties. When used in the treatment delivery simulations, each parameter follows a zero‐mean normal distribution with the listed standard deviation for translation and rotation. The parameters are adopted from Ref. 17.

Setup error	Execution (random) error			Preparation (systematic) error
Setup error	LR	SI	AP	LR	SI	AP
Translation [mm]	2	1.8	1.7	2.6	2.4	2.6
Rotation [deg]	1.1	0.6	0.5	1.1	0.6	0.5
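
As a rough check on the statement that the 1 cm margin approximates the van Herk result, the widely used recipe M = 2.5Σ + 0.7σ can be applied to the translational values in Table 1. The sketch below is illustrative only: it ignores the rotational components, and the function and variable names are assumptions rather than part of the study's software.

```python
# Illustrative check of the CTV-to-PTV margin using the van Herk recipe
# M = 2.5 * Sigma + 0.7 * sigma (translations only; rotations are ignored).

# Standard deviations from Table 1, in mm.
RANDOM_SD = {"LR": 2.0, "SI": 1.8, "AP": 1.7}      # execution (random) error
SYSTEMATIC_SD = {"LR": 2.6, "SI": 2.4, "AP": 2.6}  # preparation (systematic) error


def van_herk_margin(systematic_sd_mm, random_sd_mm):
    """Return the van Herk margin in mm for one axis."""
    return 2.5 * systematic_sd_mm + 0.7 * random_sd_mm


margins_mm = {
    axis: van_herk_margin(SYSTEMATIC_SD[axis], RANDOM_SD[axis])
    for axis in ("LR", "SI", "AP")
}

for axis, margin in margins_mm.items():
    print(f"{axis}: {margin:.1f} mm")
```

Under these assumptions the per‐axis margins come out at roughly 7 to 8 mm, somewhat below the 1 cm used for planning; the rotational uncertainties are not included in this estimate.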


Seven‐beam IMRT treatment plans were generated for each ROI scenario (MD, AS, and AM), resulting in plans P _MD, P _AS, and P _AM. A 78 Gy/39 fraction prescription was used. The beams were transversally distributed at gantry angles of 30°, 80°, 130°, 180°, 230°, 280°, and 330°. The plans were created based on the objectives in Table 2 for all patients. Direct machine parameter optimization (DMPO) from Pinnacle³ (Research Version 9.710) was employed to optimize all plans, with a 3 × 3 × 3 mm³ dose grid resolution, a maximum of 50 iterations, and a maximum of 50 segments. Since SPICE does not autocontour the sigmoid, the manually drawn sigmoid was used in the optimization of all plans. A script was used to automate the generation of P _MD, P _AS, and P _AM to ensure uniform study execution across all plans, image sets, and patients.

Table 2.

Objectives used for the IMRT optimization, assuming localized prostate cancer with a 78 Gy Rx dose in 39 fractions. In addition, an extremely low weight (10⁻⁹) gEUD = 0 objective with a = 4 has been applied to the OARs as in¹⁵ to further reduce each OAR dose

Objective	PTV				Rectum				Bladder				Femur (Left/Right)			Sigmoid
Objective	D ₉₅	D _min	D _max	D ₂	D ₁₅	D ₂₀	D ₃₀	D ₅₀	D ₁₅	D ₂₅	D ₃₅	D ₅₀	D _max	D ₂₅	D ₄₀	D _max	D ₁
Value	78	78	85.8	81.9	75	70	65	50	80	75	70	65	50	45	40	65	45
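
The caption of Table 2 refers to a gEUD objective with a = 4. For readers unfamiliar with the quantity, the following is a minimal sketch of the generalized equivalent uniform dose for equal‐volume voxels. The function name and the voxel doses are hypothetical, and this is not the Pinnacle³ implementation.

```python
def geud(voxel_doses_gy, a):
    """Generalized equivalent uniform dose for equal-volume voxels.

    gEUD = (mean(d_i ** a)) ** (1 / a)
    """
    if not voxel_doses_gy:
        raise ValueError("at least one voxel dose is required")
    mean_power = sum(d ** a for d in voxel_doses_gy) / len(voxel_doses_gy)
    return mean_power ** (1.0 / a)


# Hypothetical OAR voxel doses in Gy (illustration only).
oar_doses = [10.0, 20.0, 40.0, 60.0]
print(geud(oar_doses, a=4))  # a > 1 weights the hotter voxels more strongly
print(geud(oar_doses, a=1))  # a = 1 reduces to the mean dose
```

Because a = 4 emphasizes the higher doses within the organ, driving this quantity toward zero with a very small weight nudges OAR doses down without competing with the primary objectives.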


As setup uncertainties are inherent to the treatment delivery process, plans were analyzed including the effect of setup uncertainties via treatment delivery simulations rather than as static plans. Robustness analysis of the generated plans was performed for a rigid body geometric uncertainty model with the parameters from Table 1 sampled as zero‐mean Gaussian probability distribution functions (PDFs).¹⁷ A similar type of plan robustness analysis has been carried out to compare static plans to those created using probabilistic treatment planning techniques.¹³,¹⁸ In this study, robustness analysis consisted of performing 1000 virtual treatment courses with a different sampled systematic error for each treatment course and a different sampled random error for each fraction. Proof that the 1000 simulations are adequate to achieve statistical significance is given in the supplemental material. We leveraged our in‐house GPU‐accelerated radiation therapy robustness analyzer to perform the simulations. The robustness analyzer directly samples random translational and rotational setup uncertainties inherent to multifractional external beam treatment delivery,¹⁷ assuming the dose distribution is shift‐invariant,¹⁵ and outputs DVHs and other treatment metrics for each virtual treatment course simulated. To estimate the CP for a specific DVH metric, a dose–volume coverage map (DVCM)¹⁹ was employed (i.e., a map that expresses CP as a function of dose–volume level). Details of the coverage evaluation computation can be found in Refs.¹⁹,²⁰,²¹,²² and in the supplemental information. The robustness analyzer reports the CP at each specified dose–volume objective, as well as the expectation values and distributions of biological indices. The reader is referred to the supplemental material for information about the TCP and NTCP models.
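
The sampling scheme described above (one systematic error per virtual course, a fresh random error per fraction) and the notion of coverage probability can be sketched as follows. This is a simplified illustration, not the in‐house robustness analyzer: only translations are sampled, no dose is recomputed, and the function names and the per‐course D ₉₅ values are hypothetical.

```python
import random

# Translational standard deviations from Table 1 (mm), ordered LR, SI, AP.
SYSTEMATIC_SD_MM = (2.6, 2.4, 2.6)
RANDOM_SD_MM = (2.0, 1.8, 1.7)


def sample_course_shifts(n_fractions, rng):
    """Return one virtual course as a list of per-fraction (LR, SI, AP) shifts in mm.

    One systematic offset is drawn per course; a new random offset is drawn
    for every fraction and added to it.
    """
    systematic = [rng.gauss(0.0, sd) for sd in SYSTEMATIC_SD_MM]
    return [
        tuple(s + rng.gauss(0.0, sd) for s, sd in zip(systematic, RANDOM_SD_MM))
        for _ in range(n_fractions)
    ]


def coverage_probability(metric_values, objective, at_least=True):
    """Fraction of simulated courses whose metric meets the objective."""
    if at_least:
        hits = sum(1 for m in metric_values if m >= objective)
    else:
        hits = sum(1 for m in metric_values if m <= objective)
    return hits / len(metric_values)


rng = random.Random(0)
courses = [sample_course_shifts(39, rng) for _ in range(1000)]
print(len(courses), len(courses[0]))  # 1000 courses, 39 fractions each

# Hypothetical per-course CTV D95 values (Gy), standing in for the output
# of a dose recalculation that this sketch does not perform.
d95_values = [79.1, 78.4, 77.6, 78.9, 78.0]
print(coverage_probability(d95_values, objective=78.0))  # 4 of 5 courses meet it
```

In the study itself, each sampled course would be passed through the dose accumulation step to obtain DVH metrics; the CP for an objective is then the fraction of courses that satisfy it.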

For each contoured CT image set, for each of the three plans (P _MD, P _AS, P _AM), treatment delivery simulations were performed with respect to both the manual (MD) and the autosegmented (AS) contours, resulting in six evaluations of each objective shown in Table 2. The syntax used for these evaluations is (evaluation contour, planning contour); for example, for (AS, P _MD), the plan was developed with the manually drawn contours and the evaluation was with respect to the autosegmented contours. For each objective, arrays of each evaluation were constructed by concatenating the results for all the CT image sets. In comparing the arrays, (MD, P _MD) was taken as the gold standard. Figure 2 summarizes the similarity assessment process in a flowchart.
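
The six (evaluation contour, planning contour) pairs can be enumerated mechanically. The short sketch below uses hypothetical string labels for the contour sets and plans; it only illustrates the bookkeeping, with (MD, P _MD) singled out as the baseline.

```python
from itertools import product

EVALUATION_CONTOURS = ("MD", "AS")
PLANS = ("P_MD", "P_AS", "P_AM")
BASELINE = ("MD", "P_MD")

# Every evaluation contour is paired with every plan: 2 x 3 = 6 scenarios.
scenarios = list(product(EVALUATION_CONTOURS, PLANS))

# The five scenarios that are compared against the baseline.
test_scenarios = [s for s in scenarios if s != BASELINE]

for number, (contour, plan) in enumerate(scenarios, start=1):
    print(number, f"({contour}, {plan})")
```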

After planning and evaluating the CP of the desired objectives for all CT image sets, six different QM arrays are formed, each corresponding to one scenario. The similarity assessment process is therefore, in essence, a matter of comparing the resultant probability arrays with the baseline scenario (MD, P _MD). [Color figure can be viewed at wileyonlinelibrary.com]

To quantify agreement between a scenario and the baseline gold standard, root mean square loss (RMSL), and maximum absolute loss (MAL) (infinity norm of losses) were utilized to compare arrays of the plan quality metrics (QM) of interest. RMSL serves as a similarity measure in which only losses in the QM are penalized [Eq. (1)], while MAL determines the worst‐case losses in the QM [Eq. (2)].

$$\mathrm{RMSL}(v, u) = \sqrt{\frac{\sum_{i = 1}^{N} L(u(i), v(i)) \, (v(i) - u(i))^{2}}{N}} \tag{1}$$

$$\mathrm{MAL}(v, u) = \max_{i} \left( L(u(i), v(i)) \, |v(i) - u(i)| \right) \tag{2}$$

$$L(u(i), v(i)) = \begin{cases} H(u(i) - v(i)) & \text{CP, TCP, or minimum dosimetric objective} \\ H(v(i) - u(i)) & \text{otherwise} \end{cases} \tag{3}$$

where $v (i)$ and $u (i)$ are the QM evaluated on the ith CT image set for the test (v) and reference standard (u) scenarios, and N is the number of CT image sets. In this paper, in the absence of uncertainty (initial/unperturbed plan), the QM could be a dose–volume objective, NTCP, or TCP. In the presence of uncertainties, the QM could be the CP of a dose–volume objective, E[TCP], or E[NTCP]. By definition, a decrease in the CP of dose–volume objectives, TCP, E[TCP], or minimum dosimetric objectives is considered a loss, whereas for the remaining indices degradation occurs when they increase. To incorporate this, the loss function L(.) is defined as in Eq. (3), where H(.) is the Heaviside function. Throughout the paper, the Δ symbol indicates the difference between the v and u arrays for the QMs.
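
The definitions in Eqs. (1) to (3) translate directly into code. The following is a minimal sketch under the assumption that a single flag, here called `higher_is_better`, selects between the two branches of the loss function; the function names and the example arrays are hypothetical.

```python
import math


def heaviside(x):
    """Heaviside step: 1 for x > 0, else 0.

    The value at x == 0 is irrelevant here because it multiplies a zero difference.
    """
    return 1.0 if x > 0 else 0.0


def loss_indicator(u_i, v_i, higher_is_better):
    """Eq. (3): 1 when the test value v_i is worse than the reference u_i."""
    return heaviside(u_i - v_i) if higher_is_better else heaviside(v_i - u_i)


def rmsl(v, u, higher_is_better=True):
    """Eq. (1): root mean square loss of test array v relative to reference u."""
    if len(v) != len(u) or not u:
        raise ValueError("v and u must be non-empty and of equal length")
    total = sum(
        loss_indicator(ui, vi, higher_is_better) * (vi - ui) ** 2
        for vi, ui in zip(v, u)
    )
    return math.sqrt(total / len(u))


def mal(v, u, higher_is_better=True):
    """Eq. (2): maximum absolute loss of test array v relative to reference u."""
    if len(v) != len(u) or not u:
        raise ValueError("v and u must be non-empty and of equal length")
    return max(
        loss_indicator(ui, vi, higher_is_better) * abs(vi - ui)
        for vi, ui in zip(v, u)
    )


# Hypothetical coverage probabilities for four CT image sets.
u_cp = [0.95, 0.90, 0.99, 0.80]  # reference scenario
v_cp = [0.90, 0.95, 0.99, 0.70]  # test scenario: two losses, one gain, one tie
print(rmsl(v_cp, u_cp), mal(v_cp, u_cp))
```

Only the two image sets where the test CP is lower contribute; the gain in the second image set is ignored, which is what distinguishes RMSL from an ordinary root mean square difference. For indices such as NTCP, where an increase is the degradation, `higher_is_better=False` would be passed.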

2.B. Clinical significance assessment

To determine clinically relevant ranges for the RMSL and MAL similarity metrics, a dose perturbation analysis was performed. The analysis investigated the effect of daily linac output variations (OVs) on the CPs, E[NTCP], and E[TCP] in the presence of daily setup motion. The RMSL and MAL for equivalent plans should be less than or equal to those inherent to acceptable day‐to‐day linac OVs.²³ For the baseline (MD, P _MD) scenario, the probabilistic gain and loss of each index was calculated as a function of OV for all 229 CT image sets in this study. Figure 3 depicts the procedure for a given CT image set, ROI, selected dosimetric objective, and linac OV factor. Robustness analysis was performed for the perturbed dose (the dose due to P _MD multiplied by the linac OV) to compute the perturbed CP values associated with the objectives as well as the perturbed E[TCP] and E[NTCP] metrics. The perturbed indices were compared against their baseline counterparts to determine the worst‐case gain/loss and the similarity assessment metrics.
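
Because a uniform output change scales every dose value by the same factor, a dose‐at‐volume metric from each simulated course scales by that factor too. The sketch below uses this to illustrate how a CP shifts under a systematic OV. The per‐course values are hypothetical, and the function names are assumptions made for illustration.

```python
def coverage_probability(values, objective, at_least):
    """Fraction of simulated courses whose metric meets the objective."""
    if at_least:
        hits = sum(1 for v in values if v >= objective)
    else:
        hits = sum(1 for v in values if v <= objective)
    return hits / len(values)


def cp_change_with_output(values_gy, objective_gy, ov_fraction, at_least):
    """Change in CP when every course's dose is scaled by (1 + ov_fraction)."""
    baseline = coverage_probability(values_gy, objective_gy, at_least)
    scaled = [(1.0 + ov_fraction) * d for d in values_gy]
    return coverage_probability(scaled, objective_gy, at_least) - baseline


# Hypothetical per-course metrics in Gy (illustration only).
ctv_d95 = [79.0, 78.5, 78.2, 79.5, 77.9]     # objective: D95 >= 78 Gy
rectum_d15 = [70.0, 73.0, 74.5, 76.0, 72.0]  # objective: D15 <= 75 Gy

for ov in (-0.02, 0.02):
    d_ctv = cp_change_with_output(ctv_d95, 78.0, ov, at_least=True)
    d_rectum = cp_change_with_output(rectum_d15, 75.0, ov, at_least=False)
    print(f"OV {ov:+.0%}: dCP(CTV D95) = {d_ctv:+.2f}, dCP(rectum D15) = {d_rectum:+.2f}")
```

In this toy example a lower output costs target coverage while a higher output costs OAR sparing, which is why both signs of the OV are examined.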

The analysis to study the effect of output variation on the CP of dosimetric objectives, E[TCP] and E[NTCP]. [Color figure can be viewed at wileyonlinelibrary.com]

3. Results

The SPICE autosegmentation generated all required ROIs (prostate, bladder, rectum, femoral heads) for 167 CT image sets from 18 patients. For the remaining image sets, SPICE failed to generate at least one requisite ROI. SPICE failed on all image sets for one patient (Patient 10). We did not observe any obvious distinction in these image sets that would result in the failure. These failed image sets were excluded from the remaining analysis. Table 3 summarizes the failure pattern in the cohort. While autosegmentation of the CTV and rectum never failed, the frequency of failure was highest for the bladder, followed by the right and left femurs. The reader is referred to the supplemental materials for more details on the segmentation quality.

Table 3.

The SPICE failure pattern in autosegmentation of ROIs for the cohort. From 229 CT image sets of 19 different patients, SPICE failed to create at least one ROI in 62 CT image sets with no failure in segmentation of prostate and rectum

Patient ID	1	2	3	7	8	9	10	11	12	13	16	18	19	Total
Bladder	3	4	0	1	2	1	11	7	6	0	0	8	0	43
Left femur	0	0	0	0	1	0	0	0	0	1	6	3	0	11
Right femur	0	0	1	0	0	1	0	0	0	0	12	2	1	17
# of failed CT Image Sets	3	4	1	1	3	2	11	7	6	1	12	10	1	62


3.A. Similarity assessment of dosimetric and biological indices in the absence of uncertainties

Before probabilistic assessment of the desired dosimetric and biological indices, the similarity of the static QM evaluations was investigated for the patient cohort in the absence of setup uncertainties. For static evaluations, PTV coverage is compared since it corresponds with current clinical practice, in which the PTV is intended to accommodate variations. Figure 4 illustrates the variations of the dosimetric and biological QMs for the unperturbed plans in the different plan evaluation scenarios. Plan QM variability is the lowest in scenarios (MD, P _MD), (AS, P _AS), and (MD, P _AM), that is, evaluation scenarios in which the ROI definitions are not switched from manually drawn to autosegmented, and vice versa, particularly for the PTV. Variations in the other scenarios indicate the extent to which coverage could be affected by the alternative segmentations. Altering the PTV definition results in large variations in the PTV D ₉₅ and TCP. This is expected, as dose distributions are designed to tightly conform to the PTV used in planning. For OARs, the similarity between distributions is evident for many QMs in scenarios (MD, P _MD) and (AS, P _AS). Except for four CT image sets, the bladder's dose–volume metrics are satisfied, and there are wide margins between the sample medians and the optimization objectives in all scenarios. This is due to the inclusion of a low weight (10⁻¹⁰) generalized equivalent uniform dose objective (gEUD = 0) in the optimization to lower OAR doses below the other specified objectives. The dosimetric QMs for autosegmented rectums are more dispersed than their manually drawn counterparts. Note that the increased variation in the autosegmented rectum was not solely attributable to the difference in rectum contours, but to the collective deviations in all autosegmented ROIs. To clarify this point, consider the variations in D ₅₀ for (AS, P _AM), which are closer to those of the (MD, P _MD) evaluations; this indicates that using the MD CTV could partially alleviate the impact of the alternative autosegmented OARs.

Box and whisker plots of dosimetric and biological indices for both autosegmented (AS) and manually drawn (MD) ROIs evaluated on all three planning scenarios for the 167 CT image sets. Evaluation scenarios correspond with 1 → (MD,P _MD), 2 → (MD,P _AS), 3 → (MD,P _AM), 4 → (AS,P _MD), 5 → (AS,P _AS), and 6 → (AS,P _AM). The edges of each box represent the 25th and 75th percentiles, and the median is indicated by a line inside each box. The whiskers contain approximately 99.3% of the data, assuming the data are normally distributed. The samples plotted with the + symbol represent data points falling outside the whiskers, that is, outliers. Dotted lines signify the optimization objectives for each metric. [Color figure can be viewed at wileyonlinelibrary.com]
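
The 99.3% figure quoted in the caption is consistent with the common convention of drawing whiskers at 1.5 times the interquartile range beyond the box edges. The caption does not state the whisker rule, so that convention is an assumption here; the short check below evaluates it for a standard normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Box edges: 25th and 75th percentiles of a standard normal distribution.
q1 = standard_normal.inv_cdf(0.25)
q3 = standard_normal.inv_cdf(0.75)
iqr = q3 - q1

# Assumed whisker rule: 1.5 * IQR beyond the box edges.
lower_whisker = q1 - 1.5 * iqr
upper_whisker = q3 + 1.5 * iqr

coverage = standard_normal.cdf(upper_whisker) - standard_normal.cdf(lower_whisker)
print(f"whiskers at +/-{upper_whisker:.3f} sigma cover {coverage:.1%} of a normal sample")
```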

The similarity between dosimetric and biological indices of individual structure set in the different scenarios with respect to initial/unperturbed plans are shown in Figs. 5, 6, and 7 for the PTV, bladder, and rectum, respectively. In these comparisons, deviations are given with respect to the baseline evaluation scenario (MD, P _MD). Dotted lines are used to subdivide the patients.

PTV. The differences in the TCP and the dosimetric objectives with respect to (MD, P _MD) for each scenario. The degradation (loss) in D ₉₅, D _min, and TCP values is given by negative values.

Differences in the dose–volume metrics (ΔD_XX) and NTCPs (ΔNTCP) of the bladder for alternative planning and evaluation scenarios from the baseline (MD, P _MD) scenario in all 167 structure sets. Mean, MAL, and RMSL metrics are shown in each graph. Image sets associated with each patient are divided by dotted lines.

Rectum. The differences in NTCPs and the dosimetric objectives with respect to (MD, P _MD) for each scenario.

For the PTV, D _max does not considerably change across all evaluation scenarios. The dose–volume metrics and TCPs in (AS, P _AS) and (MD, P _AM) are close to their corresponding values in (MD, P _MD), whereas for (MD, P _AS), (AS, P _MD), and (AS, P _AM), they are compromised for some CT image sets. This emphasizes the importance of the definition of the target.

For OARs, positive values imply degradation in indices. In Fig. 6, dosimetric indices have RMSL> 4.51 and MAL> 27.73 for bladder in all the scenarios except (MD, P _AM). However, as shown in Fig. 4, these deviations do not result in objective violation except for a small number of CT image sets due to the wide gap between the dose metrics and the objective thresholds. Similarly, rectum's dose–volume indices have large MALs and RMSLs in all the evaluation scenarios except for (MD, P _AM) (Fig. 7). The dissimilarities between metrics are more evident for D ₅₀ . For the bladder and rectum, RMSLs and MALs of NTCPs varied in [0.008, 0.032] and [0.095, 0.22] ranges, respectively.

3.B. Similarity assessment of dosimetric and biological indices in the presence of uncertainties

The 167 image sets successfully autocontoured were subject to planning with the drawn, autosegmented, and mixed contour sets, then had coverage probabilities assessed for each dose–volume objective as well as expectation values of the TCP and NTCP based on manual contours. In at least 96% of CT image sets, the probability of achieving all the dose–volume objectives for a given plan was greater than 0.9, independent of the contours used for evaluation.

Figure 8 illustrates the difference between the expectation value of the NTCP and the CPs for the bladder subject to the different evaluation scenarios in comparison with the baseline scenario (MD, P _MD). Note that only negative ΔCP values indicate degraded plan quality. In most cases (83%), |ΔCP| < 0.02. For 69% of cases, ΔCP ≥ 0. The remaining 31% have ΔCP < 0 and are poorer plans, which contribute to RMSL and MAL. RMSL values indicate strong resemblance between probability arrays (RMSL ≤ 0.037).

Differences in the expectation of the bladder NTCP (ΔE[NTCP]) and coverage probabilities (ΔCPs) of the dose–volume objectives for alternative planning and evaluation scenarios with respect to the baseline (MD, P _MD) scenario. Mean, MAL, and RMSL metrics are shown in each graph. We plot −ΔE[NTCP] so that degradation is given by negative values, consistent with ΔCP values. All ROIs satisfy D _max objective with probability 1 in all the scenarios (not shown).

For three of the four bladder metrics analyzed, the MAL and RMSL values are lower for (MD, P _AM) than for (MD, P _AS), indicating the importance of, and dependence on, manual target contouring. Interestingly, while E[NTCP] is poorer for (AS, P _AM) than for (MD, P _AM), the MAL and RMSL for all of the dose–volume objectives are lower for (AS, P _AM). In scenario (MD, P _AM), the worst‐case ΔCP for D ₂₅ had MAL equal to 0.263. Note that E[NTCP] has the least variation (RMSL = 0.008, MAL = 0.042) in scenario (MD,P _AM). The maximum E[NTCP] loss occurred in evaluation scenario (AS,P _AS).

Figure 9 shows the differences between the E[NTCP] and dose–volume objective CPs for the rectum for each scenario compared with the baseline. In contrast to the bladder results, the distributions have larger standard deviations in all scenarios. All scenarios have large MAL values except for (MD, P _AM). In only 15% of cases, |ΔCP| < 0.02. For 25% of cases, ΔCP ≥ 0. The remaining 75% have ΔCP < 0. In (MD, P _AM), note that both RMSL and MAL values increased from D ₅₀ to D ₁₅, which implies that achieving D ₁₅ was in general less probable than meeting D ₅₀ when manually drawn OARs were replaced with their autosegmented counterparts. Similar to the bladder, E[NTCP] had the least variation in (MD, P _AM) (RMSL < 0.012, MAL < 0.089). When evaluating the autosegmented rectum on (AS, P _AS), E[NTCP] increased by at most 0.154.

Rectum. The differences with respect to (MD, P _MD) of the expected value of NTCPs and the coverage probability for the dosimetric objectives for each scenario.

Figure 10 illustrates the ΔCP for the CTV's D _min and D ₉₅ objectives. Visually, there appear to be two groupings: {(MD,P _AM), (AS,P _AS)} and {(AS,P _MD), (MD,P _AS), (AS,P _AM)}, that is, those in which the assessment CTV and plan CTV are the same, and those in which they differ. Comparing the D ₉₅ ΔCP for (MD,P _AM) and (AS,P _AS) to (MD,P _MD), RMSLs of 0.008 and 0.009, respectively, are found, indicating strong similarity between the arrays. Moreover, the CP for D ₉₅ is greater than 0.9 for 99.28% of the cases. On the other hand, the RMSL values for (AS,P _MD), (MD,P _AS), and (AS,P _AM) are 0.584, 0.19, and 0.451, respectively, indicating much less similarity between the CPs. The RMSL values for D _min also differ between the {(MD,P _AM), (AS,P _AS)} and {(AS,P _MD), (MD,P _AS), (AS,P _AM)} scenarios. RMSLs for (MD,P _AM) and (AS,P _AS) are 0.131 and 0.157, which again implies a high degree of similarity with the (MD,P _MD) scenario. RMSL values for scenarios (AS,P _MD), (MD,P _AS), and (AS,P _AM) are 0.73, 0.55, and 0.70, respectively, which indicates that the CPs are far from the base scenario. The ΔE[TCP] values for the different scenarios follow the same pattern: while MAL < 0.021 for {(MD,P _AM), (AS,P _AS)}, MAL is at least 0.076 for the {(AS,P _MD), (MD,P _AS), (AS,P _AM)} scenarios.

The comparison of E[TCP] and coverage probability arrays of DVH objectives for the CTV for different contour type‐plan scenarios. For the D _max objective, the distributions associated with the different scenarios are identical, indicating that the D _max objective is achieved despite the setup uncertainties. On the other hand, for the D _min and D ₉₅ goals, there are evident differences between the scenarios in which a contour is evaluated on the plan created based on the same contour, {(MD,P _AM), (AS,P _AS)}, and the remaining ones, {(AS,P _MD), (MD,P _AS), (AS,P _AM)}.

3.B.1. Clinical significance assessment

To estimate the de facto currently accepted clinical variations in CPs, the CP arrays were calculated for (MD, P _MD) for all 229 CT image sets as a function of permissible linac day‐to‐day OVs. The worst‐case clinical scenario was used for the simulations: a systematic offset of the output for all treatment fractions. Figure 11 illustrates the gain and loss, in both maximum absolute and root mean square terms, for all the dosimetric and biological indices as the output varied from −10% to 10%. Note that some indices were more sensitive to changes in output. For instance, the CPs of CTV D _min and D ₉₅ change abruptly with a small OV compared to the other metrics. As AAPM TG142²³ recommends that linac OVs be less than ± 2% daily for treatment machines delivering IMRT, the CP variation inherent to this level of variation is taken as the clinically relevant tolerance.

Root mean square and maximum absolute gain and loss of dosimetric objectives and biological indices expectation values as a function of the linear accelerator OV. The metrics were compared with respect to their values when no intensity variation exists for 229 prostate patient instances in this study. Note, as output increases, the probability of achieving the OAR objective decreases as shown by the decreasing CP values. [Color figure can be viewed at wileyonlinelibrary.com]

Table 4 compares the RMSL values of dosimetric indices from the different contour plan/evaluation scenarios to their corresponding thresholds obtained from dose perturbation simulations.

Table 4.

Comparison of similarity metric RMSL with respect to (MD,P _MD) for dosimetric and biological indices when compared with the CP variations due to a ± 2% change in linac OV. The metrics that exceed the thresholds are shaded. Values inside the parentheses are the corresponding OV (interpolated) for each metric in percent

Objective	CTV				Rectum					Bladder
Objective	D ₉₅	D _min	D _max	E[TCP]	D ₁₅	D ₂₀	D ₃₀	D ₅₀	E[NTCP]	D ₁₅	D ₂₅	D ₃₅	D ₅₀	E[NTCP]
Value	78	78	85.8	—	75	70	65	50	—	80	75	70	65	—
±2 OV	0.282	0.722	0	0.026	0.102	0.081	0.052	0.045	0.024	0.135	0.058	0.028	0.019	0.038
(MD, P _AS)	0.190 (−1.5)	0.550 (−1.6)	0 (0.0)	0.009 (−0.7)	0.115 (2.3)	0.119 (3.0)	0.070 (2.7)	0.090 (4.0)	0.018 (1.5)	0.014 (0.2)	0.037 (1.2)	0.030 (2.1)	0.019 (2.0)	0.020 (1.1)
(MD,P _AM)	0.009 (−0.1)	0.131 (−0.4)	0 (0.0)	0.003 (−0.2)	0.069 (1.3)	0.058 (1.4)	0.036 (1.3)	0.010 (0.42)	0.012 (1.0)	0.025 (0.4)	0.022 (0.7)	0.018 (1.1)	0.013 (1.5)	0.008 (0.4)
(AS,P _MD)	0.584 (−2.5)	0.737 (−2.2)	0 (0.0)	0.028 (−2.2)	0.168 (3.3)	0.179 (4.5)	0.184 (6.7)	0.224 (> 10)	0.029 (2.4)	0.002 (0.03)	0.037 (1.2)	0.023 (1.5)	0.002 (0.2)	0.031 (1.7)
(AS,P _AS)	0.008 (−0.1)	0.157 (−0.5)	0 (0.0)	0.004 (−0.3)	0.157 (3.1)	0.198 (5.0)	0.180 (6.5)	0.322 (> 10)	0.027 (2.3)	0.001 (0.02)	0.009 (0.3)	0.008 (0.5)	0.001 (0.1)	0.022 (1.2)
(AS,P _AM)	0.708 (−2.8)	0.451 (−1.3)	0 (0.0)	0.024 (−1.8)	0.165 (3.2)	0.175 (4.4)	0.171 (6.2)	0.199 (8.9)	0.028 (2.3)	0.012 (0.2)	0.017 (0.5)	0.012 (0.7)	0.001 (0.1)	0.029 (1.6)


Metrics for the bladder were within tolerance for all scenarios except D ₃₅ for (MD, P _AS), which corresponds to a 2.1% change in linac output. For each dosimetric index, the scenarios that did not meet the corresponding threshold are highlighted. The numbers in parentheses are the linac output values, in percent, corresponding to the observed RMSL. All CTV and all rectal metrics (except CTV D _max) for (AS,P _MD) − (MD,P _MD) exceeded the calculated clinically relevant thresholds, indicating the inadequacy of autotarget contouring.

For planning scenario P _AM, the metrics for bladder and CTV were less than the thresholds regardless of ROI type. However, the rectum metrics failed to meet the CP thresholds except for (MD,P _AM).
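The threshold comparison behind these statements can be reproduced directly from Table 4. The short sketch below uses the rectum columns for the two P _AM evaluation rows and the ± 2% OV row as the threshold; the variable and function names are illustrative only.

```python
# Rectum columns of Table 4 (RMSL values) and the +/- 2% OV threshold row.
RECTUM_METRICS = ["D15", "D20", "D30", "D50", "E[NTCP]"]
THRESHOLD = [0.102, 0.081, 0.052, 0.045, 0.024]
RMSL = {
    "(MD,P_AM)": [0.069, 0.058, 0.036, 0.010, 0.012],
    "(AS,P_AM)": [0.165, 0.175, 0.171, 0.199, 0.028],
}

def exceeding(values, thresholds, names):
    """Return the names of the metrics whose RMSL exceeds the OV-based threshold."""
    return [n for n, v, t in zip(names, values, thresholds) if v > t]

for scenario, values in RMSL.items():
    print(scenario, exceeding(values, THRESHOLD, RECTUM_METRICS))
```

With these values, no rectum metric exceeds its threshold for (MD,P _AM), whereas every rectum metric does for (AS,P _AM).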

Table 5 reports the fraction of the CT image sets that fall below the OV‐based thresholds for MAL. At least 98% of the cases met the MAL threshold for the dose–volume objectives of the bladder. The lowest fraction occurred for D ₂₀ of the rectum in (MD,P _AS), for which 83% of the cases met the MAL threshold. The fractions of the cases meeting the MAL threshold of E[NTCP] for bladder and rectum were within ± 0.02 across all the evaluation scenarios. For the bladder, at least 99% of the cases fell below the MAL thresholds in the (MD, P _AM) scenario. All the cases satisfied the MAL threshold for the CTV, which implies that, for the CTV indices, the effect of OV on MAL was dominant compared to the effect of the delineation uncertainty due to autosegmentation.

Table 5.

The fraction of CT image sets that meet MAL threshold for dosimetric and biological indices in different scenarios. The threshold was calculated by ± 2% change in OV

Objective	CTV				Rectum					Bladder
Objective	D ₉₅	D _min	D _max	E[TCP]	D ₁₅	D ₂₀	D ₃₀	D ₅₀	E[NTCP]	D ₁₅	D ₂₅	D ₃₅	D ₅₀	E[NTCP]
Value	78	78	85.8	—	75	70	65	50	—	80	75	70	65	—
± 2% OV	0.95	0.999	0.01	0.029	0.234	0.19	0.159	0.178	0.038	0.746	0.300	0.147	0.143	0.062
(MD,P _AS)	1	1	1	1	0.92	0.83	0.88	0.96	0.95	1	0.99	0.99	0.99	0.95
(MD,P _AM)	1	1	1	1	0.98	0.92	0.95	0.97	0.98	1	1	0.99	1	1
(AS,P _MD)	1	1	1	1	0.95	0.88	0.92	0.94	0.90	1	0.98	0.98	0.98	0.92
(AS,P _AS)	1	1	1	1	0.94	0.88	0.92	0.95	0.91	1	0.98	0.98	0.98	0.98
(AS,P _AM)	1	1	1	1	0.92	0.87	0.91	0.94	0.92	1	0.99	0.98	0.99	0.92


Considering all the metrics and evaluation scenarios, 72% of CT image sets satisfied the clinically relevant MAL threshold for the CP of all dosimetric objectives in all scenarios. Only 50% of the cases satisfied the threshold for all the biological and dosimetric indices. If the thresholds are set to CP values corresponding to ± 1% output variation, then 64% of cases satisfied all the dosimetric worst‐case scenario thresholds, and 25% of cases met both the biological and dosimetric index thresholds. When the CP threshold level is increased to ± 10% OV, around 97.2% of CT image sets satisfy the CP thresholds in all scenarios. This means that in 2.8% of the cases, the CP loss due to the DU of autosegmentation remains dominant compared to the CP losses caused by such a large OV. The results imply that manual and autosegmented ROIs cannot be used interchangeably in all the scenarios.
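The joint pass rate quoted here is stricter than any single entry of Table 5, because a case must meet every threshold at once. A minimal sketch of that bookkeeping follows, assuming a per‐case, per‐metric array of CP losses; the array values and the function name are hypothetical and are not data from the study.

```python
import numpy as np

def fraction_passing_all(loss, thresholds):
    """Fraction of cases whose loss is within threshold for every metric.

    loss: (n_cases, n_metrics) array of per-case CP losses (hypothetical input).
    thresholds: (n_metrics,) array of tolerances, e.g. derived from an output offset.
    """
    loss = np.asarray(loss, dtype=float)
    passed = loss <= np.asarray(thresholds, dtype=float)  # broadcast over cases
    return float(np.mean(passed.all(axis=1)))

# Four illustrative cases, two metrics.
loss = [[0.01, 0.02],
        [0.30, 0.01],
        [0.05, 0.20],
        [0.00, 0.00]]
thresholds = [0.10, 0.10]

per_metric = (np.asarray(loss) <= np.asarray(thresholds)).mean(axis=0)
joint = fraction_passing_all(loss, thresholds)
print(per_metric, joint)
```

In this toy example each metric is met by 75% of the cases, yet only 50% of the cases meet both, which mirrors how the all‐metric pass rate falls below the individual fractions.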

Considering only the (MD,P _AM) scenario, 99.41% and 88.02% of cases met the bladder and rectum CP thresholds, respectively. The worst‐case threshold was limited by the rectum, since all of the 88% of cases that passed also met the CTV and bladder metrics.

4. Discussion

Delineation uncertainty is considered a major contributor to geometric uncertainties in radiotherapy. Nonetheless, radiotherapy has enjoyed current success even though explicit margins are not added to accommodate inter‐ or intraobserver contour variations. Due to the variability in manually drawn contours and the effort required to create them, computer‐generated autocontours with sufficient accuracy for clinical use have been a long‐sought‐after goal.⁶,⁷,²⁴,²⁵

One barrier to reaching this goal is a lack of clarity as to what qualifies as sufficient accuracy or similarity between two alternative contours. The fact that intra‐ and interobserver variations are inherent to all manual segmentations, and that such segmentations are considered acceptable, implies that some level of variation is acceptable. Therefore, some deviation between autocontours and manual contours must also be acceptable. Numerous authors have investigated differences between autosegmented and manually drawn contours,⁶,⁷,²⁴,²⁵ mainly investigating the accuracy through comparison of contour‐based similarity coefficients. These similarity analyses are intended to reveal how the autosegmented ROIs compare with the manual gold standard; however, they do not assess the adequacy of the autosegmented ROIs for the radiation therapy treatment. The Dice similarity coefficient (DSC) is one of the most commonly used metrics to assess the quality of segmented ROIs. However, DSC lacks spatial information, and therefore infinitely many configurations with different dosimetric outcomes could result in the same DSC. In this paper, the authors' intention was to analyze the autosegmentation algorithm based on dosimetric and biological metrics without considering any other shape similarity metrics. For reference, the DSC analysis is in the supplemental material.
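The point that DSC carries no spatial information can be illustrated with a toy example. In the sketch below, two candidate contours are displaced from a reference by the same distance in opposite directions (for example, one toward an adjacent organ and one away from it); they yield the same DSC even though their dosimetric consequences could differ. The masks and names are illustrative assumptions, not data from the study.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

# Reference "organ": a 10 x 10 square inside a 20 x 20 grid.
ref = np.zeros((20, 20), dtype=bool)
ref[5:15, 5:15] = True

# Two candidates shifted by two pixels in opposite directions (no wrap-around).
shift_left = np.roll(ref, -2, axis=1)
shift_right = np.roll(ref, 2, axis=1)

print(dice(ref, shift_left), dice(ref, shift_right))
```

Both candidates overlap the reference in 80 of 100 pixels, so the DSC is identical although the two contours occupy different locations.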

A confounding aspect in evaluating ROI and plan equivalence is that clinical treatment plans are created based on expanded versions of the contoured ROIs; that is, a CTV is expanded to a PTV and, ideally, OARs to planning organ at risk volumes (PRVs). As such, one might consider the overlap of the margin‐expanded manual and autosegmented ROIs in an equivalence assessment, but this too ignores clinical adequacy.

With the combined knowledge that (a) the purpose of safety margins is to ensure coverage/sparing of the underlying CTV/OAR when uncertainties are considered and (b) that the actual coverage/sparing is dictated by the conformity of the dose distribution,¹⁸,²⁶,²⁷ we directly assess the CP via treatment delivery simulations. There is a strong precedent for use of simulations to assess CP,¹⁶,¹⁸,²⁰,²¹,²⁸,²⁹ and the often cited margin formulas by van Herk¹⁷ and Stroom³⁰,³¹ are based on treatment delivery simulations and population‐based CP estimates. To account for the patient‐/contour‐specific plan and resulting dose distribution, we evaluate CP on a per‐patient/contour set basis. While our CP estimates account for rigid body motions,¹⁹ our method can be extended to include the effects of organ deformations,¹⁶ delineation uncertainties,¹⁰ and other inherent uncertainties. Inclusion of further uncertainties will likely reduce the CP deviations between the planning/evaluation scenarios.
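The idea of estimating CP by treatment delivery simulation can be sketched with a one‐dimensional toy model. It follows the structure described for this study (1000 virtual treatments, one systematic error per treatment, one random error per fraction, 39 fractions, 1 cm margin), but everything else is an assumption made for illustration: the box‐shaped dose profile with a linear 0.5 cm penumbra, the Gaussian error magnitudes, the 95% dose criterion, and the function name. It is not the simulation engine used by the authors.

```python
import numpy as np

def coverage_probability(n_sims=1000, n_fx=39, sigma_sys=0.3, sigma_rand=0.3,
                         ctv_half_width=2.0, margin=1.0, seed=0):
    """Toy 1D estimate of P(minimum CTV dose >= 95% of prescription).

    Assumptions (illustrative only): relative dose is 1 inside the PTV
    (CTV half-width + margin, in cm) and falls linearly to 0 over 0.5 cm
    outside it; setup errors are Gaussian rigid shifts in cm.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-ctv_half_width, ctv_half_width, 81)  # points inside the CTV
    edge = ctv_half_width + margin

    def dose(pos):
        # Relative dose at position pos for the static dose distribution.
        return np.clip((edge + 0.5 - np.abs(pos)) / 0.5, 0.0, 1.0)

    hits = 0
    for _ in range(n_sims):
        sys_shift = rng.normal(0.0, sigma_sys)                # one per treatment course
        rand_shift = rng.normal(0.0, sigma_rand, size=n_fx)   # one per fraction
        shifts = sys_shift + rand_shift
        # Average the per-fraction dose at every CTV point over the course.
        course_dose = dose(x[None, :] + shifts[:, None]).mean(axis=0)
        hits += int(course_dose.min() >= 0.95)
    return hits / n_sims

cp_default = coverage_probability()
cp_large_error = coverage_probability(sigma_sys=3.0)
print(cp_default, cp_large_error)
```

Evaluating the same simulated dose against a different contour (here, a different `ctv_half_width`) is the toy analogue of cross‐evaluating a plan with manual and autosegmented ROIs.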

In terms of our specific evaluation of prostate autocontouring, (MD, P _AS) revealed that plans developed with SPICE autocontours, but evaluated with manually drawn contours, yielded coverage estimates that differed from the manual‐plan standard by a clinically significant amount. TCP distributions had lower mean values and higher dispersion when the P _AS plan was used, in line with the statistical similarity assessment of dose–volume objectives. With the CTV being the prostate, this result is not surprising, as the CTV definition is critical to successful radiation therapy.⁸,³²,³³

Similarity assessment of dose–volume objectives, CPs, and analyses in clinical significance assessment emphasize the need for improved autosegmentation of the rectum. This is due to the proximity of the rectum to the CTV and overlap with the PTV. The use of injected hydrogel buffer zones³⁴ between the prostate and rectum could mitigate this need.

Concentrating only on the P _AM plans, with manual target contours but autocontoured OARs, the bladder ΔCPs were within the output variation‐derived CP tolerance. Specifically, for scenario (MD, P _AM), 99% of cases met the bladder CP tolerance, and 95% of cases had an expectation value of NTCP less than the MAL tolerance. Femur probabilistic objectives were met regardless of the scenario.

The advancement of autocontouring techniques in RT has the potential to result in more reliable contour sets with fewer delineation uncertainties. The probabilistic biological and dosimetric consequences can be evaluated in the proposed framework. It is likely that manual‐to‐autocontour similarity and coverage similarity will continue to improve as autosegmentation algorithms evolve. Nonetheless, ROI quality assurance, particularly to prevent the use of obtuse contours, remains necessary. A combination of knowledge‐based and shape analysis techniques³⁵ may play an important role in this regard in the future.

A potential bias of this study is that the analysis was limited to cases in which SPICE successfully autocontoured all of the required ROIs. Inclusion of partial SPICE successes (some, but not all, required contours autosegmented) would have overly complicated the study. However, as SPICE contours each ROI independently, substitution of manual ROIs for failed autosegmented ROIs is not expected to alter the suitability of the successfully generated auto‐OAR ROIs.

One should be cautioned that our specific study results are dependent on our planning method and the prostate treatment site. Alternative planning methods with altered conformity could impact the findings. However, alternative planning methods are currently clinically used even though inherent inter‐ and intraobserver variability exists, without apparent coverage concerns. Nonetheless, the proposed approach is general and can be applied to any combinations of segmentations (manual or auto), treatment site, planning technique, and arbitrary probabilistic uncertainty model. This opens up possibilities to evaluate the effect of ROI delineation errors in combination with other aspects involved in treatment plan creation.

5. Conclusion

In this work, we measured autocontouring adequacy in terms of clinical equivalence, as measured by coverage probability assessed through treatment delivery simulations. By assessing the CP of objectives with alternative underlying ROIs, we evaluated the equivalence of the plan evaluation metrics for the alternative ROIs. Our method provides a systematic way to assess the similarity between two contour sets with respect to probabilistic dosimetric indices and biological endpoints. The dosimetric similarity with respect to a metric was measured by estimating the similarity arrays between probabilities of meeting the desired metric. The method was used to assess the adequacy of SPICE‐generated contours by comparing the CP for prostate IMRT plans based on manual and autosegmented ROIs. The analyses revealed that, in order to statistically achieve clinically relevant tolerances for dosimetric and biological indices, SPICE's current accuracy in delineating the prostate and rectum needs to be improved.

Conflict of interest

The authors have no relevant conflicts of interest to disclose.

Supporting information

Appendix S1. Dice coefficient calculation.

Appendix S2. Robustness analysis.

Appendix S3. Biological responses.


Acknowledgments

The authors thank the Netherlands Cancer Institute for providing the image sets, the VCU P01CA116602 team for preprocessing the image sets, Dr. Elizabeth Weiss from Virginia Commonwealth University for contouring the image sets, and NVIDIA Corporation for supporting this research by providing the computation hardware.

References

1. Bhardwaj AK, Kehwar TS, Chakarvarti SK, et al. Variations in inter‐observer contouring and its impact on dosimetric and radiobiological parameters for intensity‐modulated radiotherapy planning in treatment of localised prostate cancer. J Radiother Pract. 2008;7:77–88.
2. Petric P, Dimopoulos J, Kirisits C, Berger D, Hudej R, Pötter R. Inter‐ and intraobserver variation in HR‐CTV contouring: intercomparison of transverse and paratransverse image orientation in 3D‐MRI assisted cervix cancer brachytherapy. Radiother Oncol J Eur Soc Ther Radiol Oncol. 2008;89:164–171.
3. Brouwer CL, Steenbakkers RJHM, van den Heuvel E, et al. 3D Variation in delineation of head and neck organs at risk. Radiat Oncol. 2012;7:32.
4. Fiorino C, Reni M, Bolognesi A, Cattaneo GM, Calandrino R. Intra‐ and inter‐observer variability in contouring prostate and seminal vesicles: implications for conformal treatment planning. Radiother Oncol J Eur Soc Ther Radiol Oncol. 1998;47:285–292.
5. Nakamura K, Shioyama Y, Tokumaru S, et al. Variation of clinical target volume definition among japanese radiation oncologists in external beam radiotherapy for prostate cancer. Jpn J Clin Oncol. 2008;38:275–280.
6. Bzdusek KB, Pekar V, Peters J, et al. Smart Probabilistic Image Contouring Engine (SPICE). Fitchburg, WI: Philips Healthcare; 2012.
7. Caria N, Engels B, Bral S, et al. Varian Smartsegmentation® Knowledge‐Based Contouring. Clinical Evaluation of an Automated Segmentation Module. Varian Medical Systems, Palo Alto, CA USA; 1–8. Available at: https://www.varian.com/sites/default/files/resource_attachments/SmartSegClinicalPerspectives_0.pdf (accessed 10 May 2016).
8. Weiss E, Hess CF. The impact of gross tumor volume (GTV) and clinical target volume (CTV) definition on the total accuracy in radiotherapy theoretical aspects and practical experiences. Strahlentherapie und Onkol. 2003;179:21–30.
9. Li XA, Liu F, Tai A, et al. Development of an online adaptive solution to account for inter‐ and intra‐fractional variations. Radiother Oncol. 2011;100:370–374.
10. Xu H, Gordon JJ, Siebers JV. Coverage‐based treatment planning to accommodate delineation uncertainties in prostate cancer treatment. Med Phys. 2015;42:5435–5443.
11. Tsuji SY, Hwang A, Weinberg V, Yom SS, Quivey JM, Xia P. Dosimetric Evaluation of Automatic Segmentation for Adaptive IMRT for Head‐and‐Neck Cancer. Int J Radiat Oncol Biol Phys. 2010;77:707–714.
12. Beasley WJ, McWilliam A, Aitkenhead A, Mackay RI, Rowbottom CG. The suitability of common metrics for assessing parotid and larynx autosegmentation accuracy. J Appl Clin Med Phys. 2016;17:41–49.
13. Eiland RB, Maare C, Sjöström D, Samsøe E, Behrens CF. Dosimetric and geometric evaluation of the use of deformable image registration in adaptive intensity‐modulated radiotherapy for head‐and‐neck cancer. J Radiat Res. 2014;55:1002–1008.
14. Deurloo KEI, Steenbakkers RJHM, Zijp LJ, et al. Quantification of shape variation of prostate and seminal vesicles during external beam radiotherapy. Int J Radiat Oncol Biol Phys. 2005;61:228–238.
15. Sharma M, Weiss E, Siebers JV. Dose deformation‐invariance in adaptive prostate radiation therapy: implication for treatment simulations. Radiother Oncol. 2012;105:207–213.
16. Xu H, Vile DJ, Sharma M, Gordon JJ, Siebers JV. Coverage‐based treatment planning to accommodate deformable organ variations in prostate cancer treatment. Med Phys. 2014;41:101705.
17. van Herk M. Errors and margins in radiotherapy. Semin Radiat Oncol. 2004;14:52–64.
18. Xu H, Gordon JJ, Siebers JV. Sensitivity of postplanning target and OAR coverage estimates to dosimetric margin distribution sampling parameters. Med Phys. 2011;38:1018.
19. Gordon JJ, Sayah N, Weiss E, Siebers JV. Coverage optimized planning: probabilistic treatment planning based on dose coverage histogram criteria. Med Phys. 2010;37:550–563.
20. Stroom JC, De Boer HCJ, Huizenga H, Visser AG. Inclusion of geometrical uncertainties in radiotherapy treatment planning by means of coverage probability. Int J Radiat Oncol Biol Phys. 1999;43:905–919.
21. Gordon JJ, Siebers JV. Coverage‐based treatment planning: optimizing the IMRT PTV to meet a CTV coverage criterion. Med Phys. 2009;36:961–973.
22. Xu H. A Study of Coverage Optimized Planning Incorporating Models of Geometric Uncertainties for Prostate Cancer. PhD dissertation, Virginia Commonwealth University; 2013.
23. Klein EE, Hanley J, Bayouth J, et al. Task Group 142 report: quality assurance of medical accelerators. Med Phys. 2009;36:4197–4212.
24. Huyskens DP, Maingon P, Vanuytsel L, et al. A qualitative and a quantitative analysis of an auto‐segmentation module for prostate cancer. Radiother Oncol J Eur Soc Ther Radiol Oncol. 2009;90:337–345.
25. Simmat I, Georg P, Georg D, Birkfellner W, Goldner G, Stock M. Assessment of accuracy and efficiency of atlas‐based autosegmentation for prostate radiotherapy in a variety of clinical conditions. Strahlentherapie und Onkol. 2012;188:807–813.
26. Gordon JJ, Crimaldi AJ, Hagan M, Moore J, Siebers JV. Evaluation of clinical margins via simulation of patient setup errors in prostate IMRT treatment plans. Med Phys. 2007;34:202–214.
27. Gordon JJ, Siebers JV. Evaluation of dosimetric margins in prostate IMRT treatment plans. Med Phys. 2008;35:569–575.
28. Tilly D, Ahnesjö A. Fast dose algorithm for generation of dose coverage probability for robustness analysis of fractionated radiotherapy. Phys Med Biol. 2015;60:5439–5454.
29. Bohoslavsky R, Witte MG, Janssen TM, van Herk M. Probabilistic objective functions for margin‐less IMRT planning. Phys Med Biol. 2013;58:3563.
30. Stroom JC, Storchi PR. Automatic calculation of three‐dimensional margins around treatment volumes in radiotherapy planning. Phys Med Biol. 1997;42:745–755.
31. Stroom JC, Koper PC, Korevaar GA, et al. Internal organ motion in prostate cancer patients treated in prone and supine treatment position. Radiother Oncol J Eur Soc Ther Radiol Oncol. 1999;51:237–248.
32. Poortmans P, Bossi A, Vandeputte K, et al. Guidelines for target volume definition in post‐operative radiotherapy for prostate cancer, on behalf of the EORTC Radiation Oncology Group. Radiother Oncol. 2007;84:121–127.
33. Rasch C, Steenbakkers R, Van Herk M. Target definition in prostate, head, and neck. Semin Radiat Oncol. 2005;15:136–145.
34. Pinkawa M, Berneking V, König L, Frank D, Bretgeld M, Eble MJ. Hydrogel injection reduces rectal toxicity after radiotherapy for localized prostate cancer. Strahlentherapie und Onkol. 2017;193:22–28.
35. Altman MB, Kavanaugh JA, Wooten HO, et al. A framework for automated contour quality assurance in radiation therapy including adaptive techniques. Phys Med Biol. 2015;60:5199–5209.
