The British Journal of Radiology. 2017 Oct 30;90(1080):20170633. doi: 10.1259/bjr.20170633

Dependence of volume calculation and margin growth accuracy on treatment planning systems for stereotactic radiosurgery

David J Eaton 1, Kevin Alty 2
PMCID: PMC6047653  PMID: 29022748

Abstract

Objective:

Uncertainties in radiotherapy target structures are partly dependent on differences between volume calculation and margin growing methods in treatment planning systems (TPS). These uncertainties are exacerbated with very small structures such as those common in stereotactic radiosurgery.

Methods:

Data from a national commissioning programme for SRS were used to assess variation in reported volumes for six benchmark cases, including malignant and benign indications. Reported volumes were compared both with and without any margins added according to local practice.

Results:

137 plans were submitted, with a total of 311 structures and covering seven TPS. For volumes < 1 cm3 agreement was within 0.05 cm3, and for volumes > 1 cm3 agreement was within 5%. Systematic differences were seen between TPS, partly because of different methods for calculating the end slice volume. About 40% of structures had a margin added, of 1–2 mm. Most TPS over-grew the volumes, compared to the approximation of a perfect sphere, especially Pinnacle and Eclipse.

Conclusion:

Differences between volume calculation methods may lead to 5–10% variation in reported volumes from different TPS. This should be taken into account when comparing multicentre studies, and it is recommended that a minimum volume of 0.05 cm3 be used for any near-point doses to allow more consistent comparisons. When margins are added to small structures, there may be up to 40% difference to nominal margin size. Such differences are still small compared to interobserver variation in delineation.

Advances in knowledge:

This study quantifies the potential uncertainties in clinical volume calculation and margin growth with small radiosurgical targets.

INTRODUCTION

Delineation of the target volume is a key step in the radiotherapy planning process, and has recently been suggested as one of the largest sources of uncertainty in radiotherapy delivery.1 One factor affecting this is the user’s ability to accurately and precisely delineate structures using limited image information; another is the accurate calculation of volumes in 3D based on series of points defined on pixelated 2D image slices. Methods for such calculations differ between treatment planning systems (TPS), with variation of 5–10% reported for both regular geometric shapes2–4 and actual clinical targets.4–6

For the majority of external beam treatments, uniform margins are added to the delineated (gross) target volume (GTV) to account for sub-clinical spread and for uncertainties in delivery, leading to the formation of the planning target volume (PTV). Methods for applying these margins also differ between TPS, with variations of about 1 mm compared to expected growth being reported.3,6–8

Accuracy of volume calculation can in turn affect plan parameters such as dose-volume histograms.9 For clinical trials’ quality assurance, benchmark cases are routinely used to assess variation from protocol in recruiting centres, and ensure consistency of contouring, planning and delivery.10 However, assessment of planning benchmarks can be undermined by differences in the "standard" volumes that are provided. For stereotactic radiosurgery (SRS), submillimeter accuracy is routinely achieved in treatment delivery and therefore even small variations in volumes might have significant impact on these plans.

Previous studies into volume and margin consistency have typically investigated what are now old or obsolete planning systems, and only one considered the very small structures typically encountered in radiosurgery.4 Therefore, the aim of this study was to use data collected as part of a quality assurance programme for the national commissioning of SRS, to determine variations in calculated volumes for a range of clinical treatment sites and planning systems.

METHODS AND MATERIALS

The National Radiotherapy Trials Quality Assurance (RTTQA) group performed a quality assurance programme for all clinical providers that were commissioned to deliver SRS in England. This was used to facilitate sharing of best practice and to progress standardization and improvement of service quality. All clinical centres undertook planning benchmark cases, providing a unique dataset of current practice across a large number of centres and a wide range of equipment. Six planning cases were distributed in DICOM-RT standard format, including CT and MR images and pre-delineated structure sets, as shown in Table 1. Structures in DICOM format consist of a series of 2D contours, composed of point co-ordinates delineated in the plane of the image slices. For these cases, the contours were discretized to the resolution of the CT images.

Table 1.

Details of the planning benchmark cases

Case description | Indicative volume (cm3) | In-plane CT resolution (mm) | CT slice thickness (mm)
Three metastases | 0.1–0.6 | 0.7 × 0.7 | 1.0
Seven metastases | 0.1–7.2 | 1.0 × 1.0 | 1.0
Intracanalicular vestibular schwannoma | <0.1 | 0.5 × 0.5 | 1.5
Large vestibular schwannoma | 1.9 | 1.0 × 1.0 | 1.0
Skull base meningioma | 1.6 | 0.5 × 0.5 | 1.5
Pituitary | 1.1 | 0.6 × 0.6 | 0.5

Centres reported the volumes as calculated on their local TPS for the GTV and PTV regions. Plans were created for some or all cases, depending on intended clinical practice. Volumes were also calculated in VODCA (Medical Software Solutions GmbH, Hagendorn, Switzerland), which is the independent software used for trials quality assurance plan review by the RTTQA group. This has two options for volume calculation, either extending the structure beyond the end slice co-ordinate by a rectangle of width equal to half the slice thickness (default) or stopping the structure on the end slice co-ordinate (half first and last slice option).
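The two end-slice options described above can be illustrated with a minimal sketch. It assumes each contour is a closed polygon of (x, y) points in mm, one contour per slice, on equally spaced slices. The names polygon_area, contour_volume and extend_end_slices are hypothetical and are not taken from VODCA or any TPS, whose internal methods are not described here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Area (mm^2) of a closed polygon using the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def contour_volume(
    contours: List[Sequence[Point]],
    slice_thickness_mm: float,
    extend_end_slices: bool = True,
) -> float:
    """Volume (cm^3) of a stack of contours, one contour per slice.

    extend_end_slices=True : every slice, including the first and last,
        contributes a slab one full slice thickness deep, so the structure
        extends half a slice beyond each end slice co-ordinate.
    extend_end_slices=False: the structure stops on the end slice
        co-ordinates, so the first and last slices contribute half a slab.
    """
    areas = [polygon_area(c) for c in contours]
    if not areas:
        return 0.0
    volume_mm3 = sum(areas) * slice_thickness_mm
    if not extend_end_slices:
        volume_mm3 -= 0.5 * slice_thickness_mm * (areas[0] + areas[-1])
    return volume_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3
```

For a 10 mm × 10 mm square drawn on three 1 mm slices, this sketch gives 0.3 cm3 with the end slices extended and 0.2 cm3 with them truncated, which shows how strongly the choice affects structures that span only a few slices.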

Independent calculation of margin size was performed by assuming structures were perfect spheres, and calculating the effective radius of GTV and PTV, independent of the pixel size in different directions. The difference was then compared to the stated margin applied. This is similar to the method used by Smith et al3 and Wang et al8 for spherical phantom structures.
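The spherical approximation can be written as a short sketch. It assumes only that each volume is treated as a sphere of equal volume; the function names are illustrative and do not come from the study's own tools.

```python
import math


def effective_radius_mm(volume_cm3: float) -> float:
    """Radius (mm) of the sphere whose volume equals volume_cm3."""
    volume_mm3 = volume_cm3 * 1000.0
    return (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def effective_margin_mm(gtv_cm3: float, ptv_cm3: float) -> float:
    """Margin (mm) implied by the GTV and PTV under the spherical approximation."""
    return effective_radius_mm(ptv_cm3) - effective_radius_mm(gtv_cm3)


def margin_difference_mm(gtv_cm3: float, ptv_cm3: float, nominal_mm: float) -> float:
    """Effective margin minus the nominal margin; positive means over-growth."""
    return effective_margin_mm(gtv_cm3, ptv_cm3) - nominal_mm
```

A positive value from margin_difference_mm corresponds to a TPS over-growing the structure relative to a perfect sphere, and a negative value to under-growing it.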

RESULTS

Volume calculation

137 plan submissions were received for 25 different platforms, including 13 plan revisions with modified margins or dose grid size, giving a total of 311 structures. They covered seven TPS, as shown in Table 2.

Table 2.

Planning systems used for benchmark case submissions, and hence volume calculation

TPS Manufacturer Number of centres Version(s) Margin growth software (if different)
Eclipse | Varian | 3 | 11.0 (2), 13.7 |
GammaPlan | Elekta | 7 | 10.1 (5), 11.0 (2) |
iPlan | BrainLAB | 5 | 4.51–4.54 |
Multiplan | Accuray | 3 | 5.21 |
Monaco | Elekta (CMS) | 1 | 5.2 | Prosoma (MedCom)
Pinnacle | Philips | 4 | 9.6, 9.8 (2), 14.0 | Prosoma (1)
Tomotherapy | Accuray | 2 | 5.0, 5.1 | Prosoma (1), Focal (CMS, 1)
VODCA | Medical Software Solutions | n/a | 4.5 |

TPS, treatment planning systems.

If margins were grown first in another software, this is indicated.

Variation in mean volumes calculated for each TPS is shown in Figure 1 and Table 3. For volumes < 1 cm3 agreement was within 0.04 cm3, and for volumes > 1 cm3 agreement was within 5%. Average differences were most positive for Pinnacle and VODCA and most negative for GammaPlan and Eclipse, compared to the mean of all TPS. Recalculating VODCA volumes using the half first and last slice option gave an average difference of −1.9% compared to the mean of all TPS. Variation in volumes calculated between different centres with the same TPS is shown in Figure 2. For volumes < 1 cm3 agreement was within 0.05 cm3, and for volumes > 1 cm3 agreement was within 5%.
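The comparison against the mean of all TPS, and the agreement levels quoted above, can be sketched as follows. The thresholds (0.05 cm3 below 1 cm3, 5% above) are taken from the text; the function names, and the choice to treat a reference of exactly 1 cm3 with the percentage criterion, are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Tuple


def differences_from_mean(volumes_by_tps: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Return {tps: (difference in cm^3, difference in %)} relative to the mean of all TPS."""
    reference = mean(volumes_by_tps.values())
    return {
        tps: (v - reference, 100.0 * (v - reference) / reference)
        for tps, v in volumes_by_tps.items()
    }


def agrees(volume_cm3: float, reference_cm3: float,
           abs_tol_cm3: float = 0.05, rel_tol: float = 0.05) -> bool:
    """Absolute tolerance for references below 1 cm^3, relative tolerance otherwise."""
    diff = abs(volume_cm3 - reference_cm3)
    if reference_cm3 < 1.0:
        return diff <= abs_tol_cm3
    return diff <= rel_tol * reference_cm3
```

Using an absolute tolerance for the smallest structures avoids reporting very large percentage differences that correspond to only a few voxels.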

Figure 1.


Variation in calculated mean GTV for each TPS against logarithm of mean volume across all TPS. Two off-scale values for the 7.2 cm3 lesion are not shown: VODCA +0.14 cm3 (+1.9%) and Tomotherapy −0.17 cm3 (−2.4%). GTV, gross target volume; TPS, treatment planning systems.

Table 3.

Average percentage difference for each TPS from mean of all TPS, for volumes greater than 0.1 cm3

Eclipse | GammaPlan | iPlan | Multiplan | Tomotherapy | Monaco | Pinnacle | VODCA
−6.3% | −4.0% | +0.3% | +1.1% | +1.1% | +3.1% | +5.2% | +5.2%

TPS, treatment planning systems.

Figure 2.


Range of variation in calculated GTV for centres with the same TPS, against mean volume across all TPS. Values for the 7.2 cm3 lesion are not shown, but were all within 0.11 cm3 (1.5%). GTV, gross target volume; TPS, treatment planning systems.

Margin growth

183 (59%) target structures had zero PTV margin, while 128 (41%) used a nominal growth of 1–2 mm, depending on local practice. All GammaPlan cases used zero margin, but other TPS varied from 0 to 2 mm. Variation between the margin calculated using the spherical volume approximation and the nominal margin applied is shown in Figure 3. Most TPS over-grow the volumes, compared to the spherical approximation, especially Pinnacle, Eclipse and iPlan. However, behaviour is not consistent with increasing volume. Variation in final volume between centres with the same nominal margin applied is shown in Table 4, for three example lesions of different sizes.

Figure 3.


Variation in margin calculated assuming each volume was spherical, compared to the nominal margin applied. Values for the 7.2 cm3 lesion are not shown, but were all within +0.2 to +0.4 mm. No GammaPlan values are shown as all centres applied zero margin. Two Monaco (Prosoma) points were excluded as the centre had grown by 1 mm, then edited the volume to exclude OARs, so the margin was no longer uniform. GTV, gross target volume; OAR, organ-at-risk; TPS, treatment planning systems.

Table 4.

Range of inter-TPS variation in volumes, for typical small, medium and large lesions, with impact of different margins applied

Lesion | Variation in GTV (zero margin, cm3) | Ratio max/min | Variation in PTV (1 mm margin, cm3) | Ratio max/min | Variation in PTV (2 mm margin, cm3) | Ratio max/min
Seven metastases case (PTV5) | Mean 0.21 (0.18–0.23) | 1.28 | 0.44–0.61 | 1.39 | 0.70–1.00 | 1.43
Meningioma case | Mean 1.58 (1.50–1.64) | 1.09 | 2.64–3.00 | 1.14 | 4.01–4.63 | 1.15
Seven metastases case (PTV1) | Mean 7.21 (7.04–7.36) | 1.05 | 9.72–10.6 | 1.09 | 12.0–12.8 | 1.07

GTV, gross target volume; PTV, planning target volume; TPS, treatment planning systems.
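The max/min ratios in Table 4 follow directly from the reported ranges, as the short check below illustrates. The ranges are copied from the table; the dictionary layout and function name are illustrative only.

```python
# (minimum, maximum) volumes in cm^3, copied from Table 4
TABLE_4_RANGES = {
    ("Seven metastases (PTV5)", "GTV"): (0.18, 0.23),
    ("Seven metastases (PTV5)", "PTV 1 mm"): (0.44, 0.61),
    ("Seven metastases (PTV5)", "PTV 2 mm"): (0.70, 1.00),
    ("Meningioma", "GTV"): (1.50, 1.64),
    ("Meningioma", "PTV 1 mm"): (2.64, 3.00),
    ("Meningioma", "PTV 2 mm"): (4.01, 4.63),
    ("Seven metastases (PTV1)", "GTV"): (7.04, 7.36),
    ("Seven metastases (PTV1)", "PTV 1 mm"): (9.72, 10.6),
    ("Seven metastases (PTV1)", "PTV 2 mm"): (12.0, 12.8),
}


def max_min_ratio(smallest_cm3: float, largest_cm3: float) -> float:
    """Ratio of the largest to the smallest reported volume."""
    return largest_cm3 / smallest_cm3


ratios = {key: round(max_min_ratio(lo, hi), 2) for key, (lo, hi) in TABLE_4_RANGES.items()}
```

The ratio is largest for the smallest lesion and grows when a margin is added, which is the pattern the table is intended to show.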

DISCUSSION

Volume calculation

There exists no standard approach to the calculation of 3D volumes from point co-ordinates in DICOM, with methods varying between planning systems. These differences are particularly apparent in the small structures common with SRS, partly because of the limiting uncertainty of the image resolution. While no gold standard exists in this series, GammaPlan and Eclipse consistently gave the lowest volumes, with Pinnacle and VODCA (default settings) the highest. There was also unexpected variation between centres with the same TPS, systematically highest for Pinnacle and iPlan, although individual outliers were seen for Eclipse and Multiplan. However, all results between centres showed agreement to within 0.05 cm3 (<1 cm3) or 5% (>1 cm3). This should have marginal impact on most treatments but could be significant for small structures.

Other studies have typically considered larger structures for fractionated extracranial treatments, but report similar behaviour for some TPS depending on their handling of the end slices. The impact of this effect will also depend on the shape of the structure, i.e. the relative cross-section in the longitudinal direction. The extreme approaches are to either truncate the structure on the final slice co-ordinate (giving a half slice thickness at each end) or to extend the structure on every slice by a rectangle of equal size to the on-slice contour. Ackerly et al5 found that the CadPlan TPS (now obsolete) underestimated lung target volumes by 6–12% since the end slice was truncated, and suggested manual correction to match other TPS within 1.5%. GammaPlan also does not extend the structure beyond the centre of the image plane (Björn Somell, Elekta Instrument AB, private communication), so gives values less than other TPS. However, when VODCA settings were changed to attempt to match this behaviour, higher values were generated (by 2% on average), so it appears that the approach taken by GammaPlan (and Eclipse) is even more conservative.

Conversely, Pinnacle (and VODCA with standard settings) extends the end slices by a rectangle, so would be expected to give larger volumes, as was observed. Smith et al3 used a phantom with spheres and rods of known volumes and found about 5% variation between TPS, including Pinnacle and Eclipse. However, these structures were typically much larger than those found in SRS planning. Prabhakar et al6 calculated volumes for 60 patients of various anatomical sites with seven TPS, finding that Pinnacle (and PLATO, also obsolete) gave the highest values, up to 10% higher than Eclipse (v. 6.5) for approximately 10 cm3 volumes and ±3% for brain cases.

Ma et al4 performed the only known study using SRS structures and TPS, using four phantom spheres of known volume (0.1–29 cm3) and one clinical case with 12 metastases. They reported volume differences of −1 to +22% (spheres), with all but one TPS overestimating the known sphere volumes. For the smallest sphere, however, absolute differences were within 0.03 cm3. For the clinical case, 3–10% variation from the mean of all TPS was found. Although this study included GammaPlan, Multiplan, Pinnacle, Eclipse and Tomotherapy, results are presented anonymously, so comparison to the present study is difficult. They also emphasize the importance of handling the end slices in a structure.
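The size of the end-slice effect can be estimated with a simple worked relation. This is a sketch under stated assumptions rather than the method of any particular TPS: the structure is drawn on N equally spaced slices of thickness t, A_i is the contour area on slice i, V_ext is the volume when every slice (including the end slices) contributes a full slab, and V_trunc is the volume when the structure stops on the end slice co-ordinates.

```latex
\[
V_{\mathrm{ext}} = t \sum_{i=1}^{N} A_i ,
\qquad
V_{\mathrm{trunc}} = V_{\mathrm{ext}} - \frac{t}{2}\left(A_1 + A_N\right),
\qquad
\frac{V_{\mathrm{ext}} - V_{\mathrm{trunc}}}{V_{\mathrm{ext}}}
  = \frac{A_1 + A_N}{2 \sum_{i=1}^{N} A_i}
\]
```

For a structure of uniform cross-section the relative difference reduces to 1/N, so a structure drawn on 10 slices differs by 10% between the two conventions. For a rounded lesion whose end contours are small, the difference is smaller, which is consistent with the dependence on structure shape noted above.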

Without a definitive volume it is not possible to state which systems are more accurate, but the differences are readily apparent, and could impact on intercentre studies. This study also considered typical clinical cases, rather than a systematic investigation of image resolution and other parameters that might impact individual systems. Details of calculation methods in commercial software are often proprietary and not completely described in user manuals; therefore, intercentre comparison studies such as this one may be the best way to highlight differences for realistic clinical cases. Differences between centres using the same TPS were unexpected and unresolved, but may reflect differences in software version, (dose) grid size, import method and internal settings for volume reporting. It is recommended that manufacturers provide estimates of uncertainty alongside volume calculations, and a detailed description of any settings affecting calculations.

Margin growth

The spherical approximation does not show a substantial variation across the range of volumes, with few exceptions. Values for the meningioma case (1.6 cm3) appear systematically higher than other structures, which may reflect the irregular shape of this lesion compared to metastases which are often close to a spherical shape. The large vestibular schwannoma case (1.9 cm3) is also non-spherical, but appears consistent with the other (spherical) metastases lesions, suggesting that the approximation still holds for structures such as these, with partial elongation in just one direction.

Only 41% of the provided target structures had any margin applied. Within these, most TPS over-grow compared to this approximation, with Pinnacle typically giving the largest actual growth, followed by Eclipse and iPlan. Conversely, Multiplan and Monaco under-grow the volume compared to a perfect sphere, which should be the lower limit for an irregular shape, and may reflect the limited image resolution. Values were mostly consistent between centres with the same TPS, although for the seven metastases case, the two Pinnacle centres differed by 0.2–0.4 mm per mm of growth. Reasons for these differences are not known, but they demonstrate the importance of intercentre comparisons such as these. Overall, differences may reflect methods for discretizing the grown volume back to the image grid resolution, and local parameters such as TPS software version and (dose) grid size.

Smith et al3 grew spheres with known volumes by 10–30 mm margins, and used a similar method to this study to compare effective radii, including for a “small” sphere (83 cm3). Differences to the nominal margin were +0.8 to +1.0 mm (Pinnacle) and +0.3 to +0.4 mm (Eclipse), which was considered acceptable since the in-plane resolution of the images was 1 mm. Wang et al8 performed an extensive study of margin growth from 1 to 30 mm for spherical phantoms with volumes of 0.02–1151 cm3, but in Eclipse only. For volumes less than 1 cm3 they found large errors in underlying volume calculation, although only +10% for a 0.1 cm3 sphere using 1 mm slices and 0.7 mm in-plane resolution, and lower errors for volumes greater than 1 cm3.

Pooler et al7 grew clinical prostate target structures by 5–10 mm margins, calculating all volumes in Pinnacle to avoid 11% differences in underlying calculation. Standard deviation of PTVs between seven TPS was 4–6% on 2.5 mm slices, but PTVs grown in Pinnacle (v. 8.0) were 7–10% bigger than those grown in Eclipse (v. 7.5), which was within 2% of the mean of all seven TPS. This led to the suggestion that Pinnacle structures be truncated by half a slice longitudinally to match other TPS (the opposite of the suggestion by Ackerly et al5 to manually increase the smallest volumes). Prabhakar et al6 grew target structures by 5–10 mm margins for 60 patients of various anatomical sites with seven TPS, also finding that Pinnacle (and PLATO, also obsolete) gave the highest values.

These studies are generally in agreement with the findings of this work, except that differences are magnified for small structures, as shown in Table 4. Therefore, caution is advised when growing small structures by small margins, as there may be up to 40% difference to nominal margin size, with substantially different total volumes as a result. These differences will completely dominate any variations (of 0.05 cm3 or 5%) in underlying GTV calculation, and may be the reason for some centres applying no margin even when the system uncertainty is non-zero, such as Gamma Knife (GammaPlan) and CyberKnife (Multiplan).
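The effect of a margin error on total volume can be illustrated for an idealized sphere. This is a sketch only: it assumes a perfectly spherical GTV and uniform growth, and the function name is hypothetical. The 0.21 cm3 volume is the mean GTV of the small lesion in Table 4, and 1.4 mm represents a 1 mm nominal margin over-grown by 40%.

```python
import math


def grown_sphere_volume_cm3(gtv_cm3: float, margin_mm: float) -> float:
    """Volume (cm^3) of a spherical GTV after uniform growth by margin_mm."""
    radius_mm = (3.0 * gtv_cm3 * 1000.0 / (4.0 * math.pi)) ** (1.0 / 3.0)
    grown_radius_mm = radius_mm + margin_mm
    return 4.0 / 3.0 * math.pi * grown_radius_mm ** 3 / 1000.0


nominal_ptv = grown_sphere_volume_cm3(0.21, 1.0)     # nominal 1 mm margin
overgrown_ptv = grown_sphere_volume_cm3(0.21, 1.4)   # 40% larger margin
volume_ratio = overgrown_ptv / nominal_ptv
```

Under these assumptions the nominal PTV is about 0.43 cm3 and the over-grown PTV about 0.55 cm3, so a 0.4 mm difference in margin changes the total volume by more than a quarter for a lesion of this size.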

Comparison to other uncertainties

In this study, we have not quantified the effect that differences in volume will have on dose-volume histogram (DVH) calculation, or the clinical significance of these variations. Kirisits et al2 found that 2–5% variation in total volume corresponded with 1–5% variation in dose to near-point-maximum (2 cm3) calculated by various brachytherapy TPS. Ebert et al9 assessed data from 33 centres for a pelvic phantom with various volume sizes. For volumes less than 100 cm3, agreement was within 2–4% between five TPS and the internal review software SWAN. Maximum dose point agreement was −0.1 ± 0.3% and minimum point dose was −0.9 ± 1.5% compared to SWAN. This was considered acceptable compared to desired accuracy of 5% for volumes and 2% for dose calculation. The impact of volume variations for SRS plans on DVH calculation can be investigated in future work; however, a similar level of variation in dose-volume values is possible (i.e. 5–10%).

Differences should also be considered relative to the uncertainty in delineation by human operators, which includes the sensitivity and specificity of the images themselves, and hence the ability to determine the underlying structure of clinical interest. Kirisits et al2 used phantoms with known volumes to assess variation across brachytherapy planning systems and found agreement within 5% using 2 mm slices, which was comparable to interobserver variation. Differences in total volumes have been reported of 20% (clinical lung targets)5 and 13% (small cylinder).2

For SRS, Stanley et al11 compared volumes of 14 metastases (0.1–2.5 cm3) across eight physicians, and found a median ratio of largest to smallest structures of 1.7 (range 1.3–4.7). They also found little, if any, correlation between the physicians’ confidence and the agreement in size of the contoured volumes. Yamazaki et al12 assessed target delineation for a pituitary and meningioma case, across 11 clinicians in seven centres. Median volumes were 9.2 cm3 (range 7.2–14.3) and 6.9 cm3 (range 6.1–14.6), respectively, giving ratios of largest to smallest of 2.0 and 2.4. Sandstrom et al13 compared contouring of a meningioma and astrocytoma case across 20 institutions. Median volumes were 6.7 cm3 (range 5.0–7.7) and 2.9 cm3 (range 1.7–21.5), respectively, giving ratios of largest to smallest of 1.5 and 12.6. However, these are uncommon cases, so greater variation would be expected. Sandstrom et al14 develop the argument further using target and organ-at-risk contouring on three cases (vestibular schwannoma, meningioma, pituitary) by demonstrating the adverse effect interoperator variability has on SRS treatments and reporting statistics.

All of these studies used the same software for all observers, but it is clear that interobserver variation currently exceeds the variation in volume calculation and margin growth caused by the software itself. However, this is not a reason to ignore these effects, and one conclusion relates to the near-point maximum doses used for reporting. Volumes below 0.05 cm3 are not consistent between TPS, so very small volumes should be avoided for dose reporting. Recently, the International Commission on Radiation Units and Measurements (ICRU) report 91 (ref. 15) recommended that near-maximum and near-minimum doses for SRS should be reported using an absolute volume of 0.035 cm3 rather than 2% of the volume, for targets less than 2 cm3. Based on our findings, we recommend that the minimum volume for near-point doses should be at least 0.05 cm3. If the total volume is comparable to these limits, such as for the cochlea, the mean dose to the whole organ may be used instead.
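The relationship between these reporting volumes can be checked with simple arithmetic. The sketch below assumes only the figures quoted above (2% of the target volume, the ICRU absolute volume of 0.035 cm3, and the 0.05 cm3 consistency limit suggested by this study); the function names are illustrative.

```python
def two_percent_volume_cm3(target_cm3: float) -> float:
    """Near-point reporting volume defined as 2% of the target volume."""
    return 0.02 * target_cm3


def below_consistency_limit(volume_cm3: float, limit_cm3: float = 0.05) -> bool:
    """True if a reporting volume is smaller than the limit suggested in this study."""
    return volume_cm3 < limit_cm3


# For a 2 cm^3 target, 2% of the volume is 0.04 cm^3.
relative_volume = two_percent_volume_cm3(2.0)
relative_is_small = below_consistency_limit(relative_volume)
icru_is_small = below_consistency_limit(0.035)
```

For targets up to 2 cm3, both the 2% definition and the 0.035 cm3 absolute volume fall below the 0.05 cm3 level at which volumes were found to agree between TPS.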

CONCLUSION

Differences between volume calculation methods may lead to 5–10% variation in reported volumes from different TPS. This should be taken into account when comparing multicentre studies, and it is recommended that a minimum volume of 0.05 cm3 be used for any near-point doses to allow more consistent comparisons. When margins are added to small structures, there may be up to 40% difference to nominal margin size, with substantially different total volumes as a result. Such differences are still small compared to interobserver variation in delineation, and may not be clinically significant, but quantification of uncertainties allows meaningful conclusions to be made on the robustness of clinical data. Users should be aware of the variation between volume calculation and margin growth because of the software itself, and perform validation studies and comparison with other users before clinical use.

Funding

The SRS commissioning programme was delivered and funded by NHS England, and the RTTQA group is funded by the National Institute for Health Research.

Contributor Information

David J. Eaton, Email: davideaton@nhs.net.

Kevin Alty, Email: kevin.alty@nhs.net.

References

  • 1. Roques TW. Patient selection and radiotherapy volume definition – can we improve the weakest links in the treatment chain? Clin Oncol 2014; 26: 353–5.
  • 2. Kirisits C, Siebert FA, Baltas D, De Brabandere M, Hellebust TP, Berger D, et al. Accuracy of volume and DVH parameters determined with different brachytherapy treatment planning systems. Radiother Oncol 2007; 84: 290–7.
  • 3. Smith DW, Morgan AM, Pooler AM, Thwaites DI. Comparison between margin-growing algorithms in radiotherapy software environments. Br J Radiol 2008; 81: 406–12.
  • 4. Ma L, Sahgal A, Nie K, Hwang A, Karotki A, Wang B, et al. Reliability of contour-based volume calculation for radiosurgery. J Neurosurg 2012; 117(Suppl): 203–10.
  • 5. Ackerly T, Andrews J, Ball D, Guerrieri M, Healy B, Williams I. Discrepancies in volume calculations between different radiotherapy treatment planning systems. Australas Phys Eng Sci Med 2003; 26: 90–2.
  • 6. Prabhakar R, Rath GK, Haresh KP, Manoharan N, Laviraj MA, Rajendran M, et al. A study on the tumor volume computation between different 3D treatment planning systems in radiotherapy. J Cancer Res Ther 2011; 7: 168–73.
  • 7. Pooler AM, Mayles HM, Naismith OF, Sage JP, Dearnaley DP. Evaluation of margining algorithms in commercial treatment planning systems. Radiother Oncol 2008; 86: 43–7.
  • 8. Wang Y, Jin F, Zhou J, Luo H. A simple method of evaluating margin-growing accuracy in image-guided radiation therapy. Br J Radiol 2016; 89: 20140636.
  • 9. Ebert MA, Haworth A, Kearvell R, Hooton B, Hug B, Spry NA, et al. Comparison of DVH data from multiple radiotherapy treatment planning systems. Phys Med Biol 2010; 55: N337–N346.
  • 10. Melidis C, Bosch WR, Izewska J, Fidarova E, Zubizarreta E, Ulin K, et al. Global harmonization of quality assurance naming conventions in radiation therapy clinical trials. Int J Radiat Oncol Biol Phys 2014; 90: 1242–9.
  • 11. Stanley J, Dunscombe P, Lau H, Burns P, Lim G, Liu HW, et al. The effect of contouring variability on dosimetric parameters for brain metastases treated with stereotactic radiosurgery. Int J Radiat Oncol Biol Phys 2013; 87: 924–31.
  • 12. Yamazaki H, Shiomi H, Tsubokura T, Kodani N, Nishimura T, Aibe N, et al. Quantitative assessment of inter-observer variability in target volume delineation on stereotactic radiotherapy treatment for pituitary adenoma and meningioma near optic tract. Radiat Oncol 2011; 6: 10.
  • 13. Sandström H, Nordström H, Johansson J, Kjäll P, Jokura H, Toma-Dasu I. Variability in target delineation for cavernous sinus meningioma and anaplastic astrocytoma in stereotactic radiosurgery with Leksell Gamma Knife Perfexion. Acta Neurochir 2014; 156: 2303–13.
  • 14. Sandström H, Chung C, Jokura H, Torrens M, Jaffray D, Toma-Dasu I. Assessment of organs-at-risk contouring practices in radiosurgery institutions around the world – The first initiative of the OAR Standardization Working Group. Radiother Oncol 2016; 121: 180–6.
  • 15. International Commission on Radiation Units and Measurements (ICRU). ICRU report 91: prescribing, recording and reporting of stereotactic treatments with small photon beams. J ICRU 2017; 14: 2.

Articles from The British Journal of Radiology are provided here courtesy of Oxford University Press
