Author manuscript; available in PMC: 2025 Oct 1.
Published in final edited form as: Med Phys. 2024 Aug 1;51(10):7453–7463. doi: 10.1002/mp.17326

Improving 3D Dose Prediction for Breast Radiotherapy Using Novel Glowing Masks and Gradient-Weighted Loss Functions

Lance C Moore 1, Fatemeh Nematollahi 1, Lingyi Li 1, Sandra M Meyers 1, Kelly Kisling 1
PMCID: PMC11479821  NIHMSID: NIHMS2010796  PMID: 39088756

Abstract

Background:

The quality of treatment plans for breast cancer can vary greatly. This variation could be reduced by using dose prediction to automate treatment planning. Our work investigates novel methods for training deep learning models that are capable of producing high quality dose predictions for breast cancer treatment planning.

Purpose:

The goal of this work was to compare the performance impact of two novel techniques for deep learning dose prediction models for tangent field treatments for breast cancer. The first technique, a “glowing” mask algorithm, encodes the distance from a contour into each voxel in a mask. The second, a gradient-weighted mean squared error loss function, emphasizes the error in high dose gradient regions in the predicted image.

Methods:

Four 3D U-Net deep learning models were trained using the planning CT and contours of the heart, lung, and tumor bed as inputs. The dataset consisted of 305 treatment plans split into 213/46/46 training/validation/test sets using a 70/15/15% split. We compared the impact of novel “glowing” anatomical mask inputs and a novel gradient weighted mean squared error loss function to their standard counterparts, binary anatomical masks and mean squared error loss, using an ablation study methodology. To assess performance, we examined the mean error and mean absolute error (ME/MAE) in dose across all within-body voxels, the error in mean dose to heart, ipsilateral lung, and tumor bed, Dice similarity coefficient (DSC) across isodose volumes defined by 0–100% prescribed dose thresholds, and gamma analysis (3%/3mm).

Results:

The combination of novel glowing masks and gradient weighted loss function yielded the best performing model in this study. This model resulted in a mean ME of 0.40%, MAE of 2.70%, error in mean dose to heart and lung of −0.10 and 0.01 Gy, and error in mean dose to the tumor bed of −0.01%. The median DSC values at the 50/95/100% isodose levels were 0.91/0.87/0.82. The mean 3D gamma pass rate (3%/3mm) was 93%.

Conclusions:

This study found the combination of novel anatomical mask inputs and loss function for dose prediction resulted in superior performance to their standard counterparts. These results have important implications for the field of radiotherapy dose prediction, as the methods used here can be easily incorporated into many other dose prediction models for other treatment sites. Additionally, this dose prediction model for breast radiotherapy has sufficient performance to be used in an automated planning pipeline for tangent field radiotherapy and has the major benefit of not requiring a PTV for accurate dose prediction.

1. Introduction

Breast cancer is the most common cancer in women worldwide1, and 60–85% of patients need radiotherapy to reduce the risk of local recurrence and to improve survival2,3. The success of radiation treatments is innately tied to the quality of the underlying treatment plan, which is tailored for each patient’s anatomy. A major goal of treatment planning is to minimize dose to adjacent healthy tissues (e.g. heart, lung). For each incremental dose increase to these organs, there is an increased risk of toxicity, such as heart disease and radiation-induced cancers4–7. Thus, having high quality treatment plans is critical. However, the manual treatment planning process results in widely variable plan quality across treated disease sites8–10. One study involving 19 institutions planning the same four breast patients found large differences in plan quality resulting from manual planning, in which the highest mean heart dose was three times the lowest for the same patient11. Furthermore, manual treatment planning is an inefficient process and can involve multiple plan iterations to achieve acceptable plan quality.

Automated treatment planning has been shown to improve both plan quality and planning efficiency across multiple cancers treated with radiation12–16. One approach to automated planning that shows promise is using deep learning to first predict the 3D dose distribution and then extracting beam parameters from that distribution. Several studies have explored dose prediction for breast cancer radiotherapy17–20. However, these all required additional inputs beyond those used for planning tangent field treatments in many clinics: namely, a CT scan and contours of the tumor bed and the organs-at-risk. One prior study required the dose distribution from the manually planned treatment in order to generate an isodose-based PTV input to the dose prediction model, which precludes its use for auto-planning18. Others required added manual contouring, such as a breast tissue contour and a planning target volume (PTV), for the automation to run19,21,22. While including a PTV contour is likely beneficial to guide the dose prediction as to where the high dose region should be, this is not a standard contour to create during planning. Ideally, automated planning approaches would not require more work from manual planners to facilitate the automation, such as additional contouring. Therefore, automated planning for tangent field treatments needs other approaches, besides including a PTV contour input, to determine where the high dose region should be.

In this work we developed the first 3D dose prediction models for breast cancer that use only a CT image and three contours (tumor bed, heart, and ipsilateral lung) as inputs to a U-Net convolutional neural network (CNN). We compare the performance of models trained with either a mean squared error (MSE) loss function or a novel gradient-weighted MSE loss function (GW-MSE). The intent of this novel loss function was to preferentially weight voxels in the high dose gradient regions so that the model can better predict the location of the field edge, the boundary of the high dose region. We also compared models trained using standard binary masks or ‘glowing’ masks generated by a novel inverse square algorithm. The purpose of the glowing masks was to encode the distance from each contour into each voxel of the mask, thereby enabling the model to learn these locations earlier in the network, without the use of a PTV. We assessed the impact of these methods using an ablation study methodology, resulting in four total models: (1) MSE loss + binary mask inputs (MSE), (2) MSE loss + glowing mask inputs (Glowing MSE), (3) GW-MSE loss + binary mask inputs (GW-MSE), and (4) GW-MSE loss + glowing mask inputs (Glowing GW-MSE). The accuracy of all four models was assessed by comparing the predicted dose distributions to the dose distributions from the clinically treated plans.

2. Methods

2.1. Patient data

To develop the models, we retrospectively used a total of 305 treatment plans for patients treated for left-sided, intact breast cancer in our clinic from 2009–2018 (see Table 1). Each treatment plan corresponded to one unique patient treated in the supine position using a tangent field technique planned in the Eclipse Treatment Planning System (Varian Medical Systems). Treatments were planned using a CT scan acquired in the treatment position on either a GE LightSpeed RT CT simulator or a GE Discovery RT CT Simulator (GE HealthCare). Prescription doses ranged from 40.05 Gy to 50.40 Gy. Plans were created using 6 MV beams, 15 MV beams, or a combination of both. They were planned using enhanced dynamic wedges, forward planned field-in-field, or an electronic compensation technique. This study was approved by our institutional review board (UCSD IRB Project #200065).

Table 1.

Summary of dataset characteristics.

| Parameter | Train | Validation | Test | All |
|---|---|---|---|---|
| Number of Plans | 213 | 46 | 46 | 305 |
| Prescribed Dose [Gy], Median (Range) | 50.00 (40.05–50.40) | 44.36 (40.05–50.40) | 42.72 (40.05–50.40) | 50.00 (40.05–50.40) |
| Treatment Date, Median (Range) | 2014 (2009–2018) | 2014 (2009–2017) | 2015 (2010–2018) | 2014 (2009–2018) |
| Heart Volume [cc], Mean (Range) | 70.06 (25.61–168.87) | 75.20 (42.27–113.23) | 74.55 (47.82–138.33) | 71.47 (25.61–168.87) |
| Lung Volume [cc], Mean (Range) | 187.90 (88.93–404.62) | 195.65 (88.51–347.80) | 181.92 (82.45–352.78) | 188.16 (82.45–404.62) |
| Tumor Bed Volume [cc], Mean (Range) | 2.76 (0.30–25.12) | 2.30 (0.22–19.85) | 4.01 (0.22–26.69) | 2.87 (0.22–26.69) |

2.2. Data Preprocessing

The treatment planning data, including CT images, dose distribution, and contours of the tumor bed and two organs-at-risk, the heart and left lung, were obtained from Digital Imaging and Communication in Medicine (DICOM) files. We translated all data into the NIfTI23 data format using an open source dataset generation tool24. The data were then resampled to 2 mm isotropic spacing and cropped to 128 × 128 × 128 voxel images maximizing the amount of the left-anterior portion of the patient body within the field of view. The cropping was performed by identifying the index of the right-most outward-most non-zero voxels in the patient images and then padding by an additional five voxels. The intensity values for the CT data were first clamped between −1000 and 1000 Hounsfield Units, and then normalized between zero and one. The dose distribution images were normalized by the prescribed dose, such that 1.0 represented the prescribed dose within all images.
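The intensity scaling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: the function names `normalize_ct` and `normalize_dose` are hypothetical, and resampling and cropping are omitted.

```python
import numpy as np

def normalize_ct(ct_hu, lo=-1000.0, hi=1000.0):
    """Clamp CT intensities to [lo, hi] HU, then rescale linearly to [0, 1]."""
    clipped = np.clip(np.asarray(ct_hu, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

def normalize_dose(dose_gy, prescription_gy):
    """Express dose as a fraction of the prescription (1.0 == prescribed dose)."""
    return np.asarray(dose_gy, dtype=np.float32) / float(prescription_gy)

# Values outside the clamp window saturate at 0 or 1.
print(normalize_ct([-2000.0, -1000.0, 0.0, 1000.0, 3000.0]))
# A 50 Gy prescription maps 25 Gy to 0.5 and 50 Gy to 1.0.
print(normalize_dose([0.0, 25.0, 50.0], 50.0))
```

Normalizing dose by the prescription lets plans with different prescriptions (40.05 to 50.40 Gy in this dataset) share one output scale.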

The three contours (tumor bed, heart, and left lung) were converted to two types of masks. For the first type, standard binary masks, any voxel within the contoured structure was set to one while all other voxels were set to zero. To compute the second type, novel ‘glowing’ contour masks (Figure 1), all voxels within the contoured structure are first set to one. Then, for each voxel in the image, the distance (di) from that originating voxel a to each voxel b in the contour is calculated using a modified Euclidean distance (Equation 1). The summed inverse of these distances is then used to calculate the glowing voxel value Dj for each voxel in the image (Equation 2). Finally, the within-contour voxels are reset with a value of 1 and the non-contour voxels are normalized by the maximum value outside the contour.

$$d_{a,b} = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2} + 1$$ Equation 1
$$D_j = \sum_{i=1}^{N} \frac{1}{d_i^2}, \quad i \in \{\text{Contoured Voxels}\}$$ Equation 2
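A brute-force sketch of the glowing mask computation may make the steps concrete. It is written under stated assumptions rather than taken from the authors' repository: distances are measured in voxel units, the +1 offset is added to the Euclidean distance so that the distance is never zero, and the function name `glowing_mask` is hypothetical. The loop costs one pass over the image per contoured voxel, so it is only practical for small arrays.

```python
import numpy as np

def glowing_mask(binary_mask):
    """Brute-force glowing mask for a small 3D binary array.

    Assumes d = Euclidean distance (in voxels) + 1, so d >= 1 everywhere.
    """
    mask = np.asarray(binary_mask).astype(bool)
    if not mask.any():
        return np.zeros(mask.shape, dtype=np.float64)

    contour = np.argwhere(mask).astype(np.float64)             # (N, 3) contoured voxels
    grid = np.stack(np.indices(mask.shape), axis=-1)           # (X, Y, Z, 3) voxel coordinates
    grid = grid.reshape(-1, 3).astype(np.float64)

    glow = np.zeros(len(grid))
    for b in contour:                                          # one contoured voxel at a time
        d = np.sqrt(((grid - b) ** 2).sum(axis=1)) + 1.0       # modified Euclidean distance
        glow += 1.0 / d ** 2                                   # summed inverse squared distance
    glow = glow.reshape(mask.shape)

    outside = ~mask
    if outside.any():
        glow[outside] /= glow[outside].max()                   # normalize non-contour voxels
    glow[mask] = 1.0                                           # reset within-contour voxels
    return glow

# Single contoured voxel at the centre of a 5 x 5 x 5 grid.
mask = np.zeros((5, 5, 5), dtype=np.uint8)
mask[2, 2, 2] = 1
glow = glowing_mask(mask)
print(glow[2, 2, :].round(3))
```

With this construction every value lies in (0, 1], and values decay smoothly with distance from the contour instead of dropping to zero at its boundary.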

Figure 1.

A single slice from a randomly selected lung contour in the dataset. 1.A shows the standard binary mask of the lung contour. 1.B shows the distance measurement (di) between a single originating voxel (pink) and a voxel within the binary mask. 1.C illustrates the combined total value (Dj) for the originating voxel by depicting the distance to all contoured voxels in the slice. 1.D shows the intermediate glowing mask (top), where the summed total inverse squared distance is encoded into each voxel and the intensity profile (bottom) for the centerline values (red band). 1.E depicts the glowing mask and intensity profile after the original binary contour voxels are again set to one. 1.F shows the final glowing mask and intensity profile, where the non-contour voxels are normalized by the maximum value outside the contour.

2.2. Model architecture

The model architecture selected for this project was a 6-level U-Net25. The U-Net model has seen widespread implementation within the medical field, and for dose prediction tasks specifically. The main driver for the selection of a U-Net architecture for our work is the model’s ability to propagate global context through the cross-concatenation connections. Intuitively, we know that the location of structures such as the heart and lung should affect the location of the tangent field edge and dose distribution and so we selected an architecture which could capture this long-distance relationship. Additionally, we selected a 3D variant in order to include both superior and inferior contextual information into the prediction of each voxel.

A U-Net deep learning model generally consists of a down-sampling (encoding) path and up-sampling (decoding) path (Figure 2). For each level in the encoding path, we used two in-series convolution blocks, followed by a dropout and max pooling operation. The convolution blocks consisted of a 3D convolution layer with a 3 × 3 × 3 kernel followed by a batch normalization layer, and a leaky rectified linear unit (Leaky ReLU) activation function.

Figure 2.

The 3D U-Net model architecture used for all models in this work26. The encoding path consisted of six layers, each containing two in-series convolution blocks (3D convolution, batch normalization, and Leaky ReLU activation function) followed by a dropout layer and a max pooling operation. The decoding path also consisted of six layers, each containing an up-sampling convolution block (transposed 3D convolution, batch normalization, and Leaky ReLU activation function) followed by two in-series convolution blocks, just as in the encoding path. The final layer consisted of two in-series convolution blocks which each reduced the number of feature maps by half. Finally, a pointwise convolution was used to generate the predicted dose image.

The first convolution block in each level doubles the number of feature channels and the max pooling operation halves the size of the features, resulting in 768 features with size 4 × 4 × 4 in the bottleneck level. In the decoding path, each level consists of an initial 3D transpose convolution layer, which halves the number of feature channels and doubles the dimension size, reversing the effects of the encoding levels. After the transpose convolution, each decoding level contains the same two in-series convolution blocks as the encoding levels. Additionally, the intermediate features from the corresponding sized encoding level are concatenated across the model to mitigate vanishing gradients and propagate low level features deeper into the network. At the final layer, two in-series convolution blocks, each reducing the number of feature channels by half, are used to gradually combine the features before a 1 × 1 × 1 convolution is used to generate the predicted dose distribution. The AdamW27 optimizer with an initial learning rate of 1e-3 and a weight decay of 1e-4 was implemented using PyTorch.
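The bottleneck size quoted above can be checked with a short shape trace. The sketch assumes a 128-voxel cubic input and a first encoding level with 24 feature channels; the value 24 is not stated in the text and is inferred here from 768 / 2^5. It also assumes pooling is applied between levels but not after the bottleneck.

```python
def trace_encoder(input_size=128, base_channels=24, levels=6):
    """Return (channels, spatial size) at each encoder level.

    base_channels=24 is an assumption inferred from the 768 bottleneck features.
    """
    shapes = []
    channels, size = base_channels, input_size
    for level in range(levels):
        shapes.append((channels, size))
        if level < levels - 1:
            channels *= 2   # next level's first convolution block doubles the channels
            size //= 2      # max pooling halves each spatial dimension
    return shapes

for level, (c, s) in enumerate(trace_encoder(), start=1):
    print(f"level {level}: {c} channels, {s} x {s} x {s}")
```

Five halvings take 128 down to 4 and five doublings take 24 up to 768, which matches the 768 features of size 4 × 4 × 4 reported for the bottleneck.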

All models in this paper used the CT images and masks of the contours (tumor bed, heart, and lung) as inputs for a total of four channels. The contour masks were represented either as binary masks or glowing masks, as described above. We also assessed model performance using two loss functions, MSE and GW-MSE. Mean squared error is defined as $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$, where $y_i$ is the clinical dose for voxel $i$, $\hat{y}_i$ is the predicted dose, and $N$ is the number of voxels. The goal of using GW-MSE is to enhance the fidelity of the prediction around the field edges by preferentially weighting voxels in high gradient regions. The GW-MSE loss is shown in Equation 3.

$$\mathrm{Loss} = \frac{1}{N}\sum_{i=1}^{N}\left((y_i - \hat{y}_i)^2 + \tanh\left((G - \hat{G})^2\right)\cdot(y_i - \hat{y}_i)^2\right)$$ Equation 3

Here, $G$ is the 3D gradient28 of the clinical dose distribution and $\hat{G}$ is the 3D gradient of the prediction. Each gradient is computed from its subcomponents as $\sqrt{G_x^2 + G_y^2 + G_z^2}$, where $G_x$, $G_y$, and $G_z$ are the gradients in the x, y, and z directions, respectively, and are approximated using a convolution with three-dimensional Sobel operators (Equation 4). Both loss functions were restricted to voxels exclusively within the patient body.

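$$
\begin{aligned}
G_x(0,:,:) &= \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}, &
G_x(1,:,:) &= \begin{bmatrix} -2 & -4 & -2 \\ 0 & 0 & 0 \\ 2 & 4 & 2 \end{bmatrix}, &
G_x(2,:,:) &= \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \\
G_y(:,0,:) &= \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, &
G_y(:,1,:) &= \begin{bmatrix} -2 & 0 & 2 \\ -4 & 0 & 4 \\ -2 & 0 & 2 \end{bmatrix}, &
G_y(:,2,:) &= \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \\
G_z(:,:,0) &= \begin{bmatrix} -1 & -2 & -1 \\ -2 & -4 & -2 \\ -1 & -2 & -1 \end{bmatrix}, &
G_z(:,:,1) &= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, &
G_z(:,:,2) &= \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}
\end{aligned}
$$ Equation 4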
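The gradient-weighted loss can be sketched in NumPy as follows. This is an illustrative reimplementation under assumptions, not the authors' PyTorch code: the Sobel kernels are built in the standard separable form (a [−1, 0, 1] derivative along one axis and [1, 2, 1] smoothing along the other two), the gradient is taken as the magnitude of its three components, volumes are zero padded at the borders, and the helper names `sobel_kernels`, `correlate3d`, `gradient_magnitude`, and `gw_mse` are hypothetical.

```python
import numpy as np

SMOOTH = np.array([1.0, 2.0, 1.0])
DERIV = np.array([-1.0, 0.0, 1.0])

def sobel_kernels():
    """3x3x3 Sobel kernels: derivative along one axis, smoothing along the other two."""
    kx = np.einsum("i,j,k->ijk", DERIV, SMOOTH, SMOOTH)
    ky = np.einsum("i,j,k->ijk", SMOOTH, DERIV, SMOOTH)
    kz = np.einsum("i,j,k->ijk", SMOOTH, SMOOTH, DERIV)
    return kx, ky, kz

def correlate3d(volume, kernel):
    """Same-size 3D cross-correlation with zero padding."""
    padded = np.pad(volume, 1)
    out = np.zeros(volume.shape, dtype=np.float64)
    nx, ny, nz = volume.shape
    for i in range(3):
        for j in range(3):
            for k in range(3):
                out += kernel[i, j, k] * padded[i:i + nx, j:j + ny, k:k + nz]
    return out

def gradient_magnitude(volume):
    """Magnitude of the Sobel gradient at every voxel."""
    gx, gy, gz = (correlate3d(volume, k) for k in sobel_kernels())
    return np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)

def gw_mse(clinical, predicted, body_mask):
    """Gradient-weighted MSE over voxels inside the body mask."""
    sq_err = (clinical - predicted) ** 2
    weight = np.tanh((gradient_magnitude(clinical) - gradient_magnitude(predicted)) ** 2)
    return (sq_err + weight * sq_err)[body_mask].mean()

rng = np.random.default_rng(0)
clinical = rng.random((6, 6, 6))
predicted = rng.random((6, 6, 6))
body = np.ones((6, 6, 6), dtype=bool)
print(gw_mse(clinical, predicted, body))
```

Because tanh of a non-negative argument lies in [0, 1), the per-voxel penalty is between one and two times the plain squared error, so the loss reduces to MSE wherever the predicted and clinical gradients agree.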

In total, we trained and evaluated four different models: (1) MSE loss + binary mask inputs (MSE), (2) MSE loss + glowing mask inputs (Glowing MSE), (3) GW-MSE loss + binary mask inputs (GW-MSE), and (4) GW-MSE loss + glowing mask inputs (Glowing GW-MSE). All model source code, the glowing mask algorithm, and the data pipeline are available open source at the UCSD QUIVER GitHub page at https://github.com/UCSD-Health-QUIVER/BreastNet.

2.3. Training and evaluation

The dataset was divided into train, validation, and test folds following a 70/15/15% split with 213/46/46 plans respectively. Training was conducted on a single node from the Expanse system at the San Diego Supercomputer Center through the ACCESS program. The single node provided four NVIDIA V100 GPUs with 32 GB of memory for training. The data parallel method was used to split each batch across the four GPUs. Our global batch size was 24, which resulted in a within-GPU batch size of 6. In order to avoid overfitting, the validation performance was measured via the loss function at the end of each epoch and the best performing epoch was selected as the final model version. To improve model generalizability, data augmentation was performed on the training dataset for each batch. To perform data augmentation, we used the TorchIO30 package to apply a random flip along each axis and a random affine transformation, which randomly rotated the images by an angle drawn uniformly from U[−60°, 60°] and randomly translated the inputs by a distance drawn uniformly from U[0, 10 mm].

Each epoch for all models was completed in approximately four minutes, including all data loading and data augmentation. Training was terminated once the validation performance failed to improve over either 24 hours or 300 epochs, whichever occurred first. Training for each model completed in less than 36 hours.
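The stopping rule and best-epoch selection described above can be written as a small sketch. The function names and the bookkeeping (epochs and seconds elapsed since the best validation loss) are illustrative assumptions, not the authors' training loop.

```python
def should_stop(epochs_since_best, seconds_since_best,
                patience_epochs=300, patience_seconds=24 * 3600):
    """Stop once validation loss has not improved for 300 epochs or 24 hours."""
    return (epochs_since_best >= patience_epochs
            or seconds_since_best >= patience_seconds)

def select_best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss (kept as the final model)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

print(should_stop(epochs_since_best=120, seconds_since_best=8 * 3600))
print(select_best_epoch([0.031, 0.024, 0.027]))
# At roughly four minutes per epoch, 300 epochs take about this many hours:
print(300 * 4 / 60)
```

At roughly four minutes per epoch, 300 epochs correspond to about 20 hours, so under that timing the epoch limit would usually be reached before the 24-hour limit.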

To validate dose prediction accuracy, we compared the predicted dose to the clinical dose for each patient using mean absolute error (MAE), defined as $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|$, and mean error (ME), defined as $\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)$. Here, $y$ is the clinical dose distribution, $\hat{y}$ is the predicted dose distribution, and the index $i$ runs over voxels within the body for each patient, so as to not skew results by the many zero voxels outside the body. We also evaluated the difference in the mean dose to the heart, lung, and tumor bed. The differences in volume receiving 20 Gy (V20Gy) and maximum dose (D0.1cc) were assessed for the cropped lung volumes, and the difference in maximum dose (D0.1cc) was assessed for the cropped heart volumes. As with the ME and MAE, the differences for these metrics were computed using clinical dose minus predicted dose.
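A minimal sketch of the body-restricted ME and MAE follows. It assumes doses are already normalized so that 1.0 equals the prescribed dose, and the function name `dose_errors` is hypothetical.

```python
import numpy as np

def dose_errors(clinical, predicted, body_mask):
    """Return (ME, MAE) in % of prescribed dose over voxels inside the body."""
    diff = (clinical - predicted)[body_mask]   # clinical minus predicted
    return 100.0 * diff.mean(), 100.0 * np.abs(diff).mean()

clinical = np.array([1.00, 0.50, 0.00, 0.00])
predicted = np.array([0.98, 0.54, 0.00, 0.30])
body = np.array([True, True, True, False])     # last voxel lies outside the body
me, mae = dose_errors(clinical, predicted, body)
print(f"ME = {me:.2f}%, MAE = {mae:.2f}%")
```

The out-of-body voxel is excluded even though its prediction is wrong, and the opposite-signed errors partly cancel in the ME but not in the MAE, which is why both are reported.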

Additionally, we compared the overlap of isodose volumes using the Dice Similarity Coefficient (DSC) from 0 to 100% prescribed dose. We use the DSC to assess agreement between clinical and predicted dose in 3D at incremental dose thresholds by setting any voxel in the dose distribution above the threshold to 1 and any value below the threshold to 0. The equation for the DSC is given by $\mathrm{DSC} = \frac{2|A \cap B|}{|A| + |B|}$, where $A$ and $B$ are two segmentations. The DSC of the isodose volumes is a surrogate for the similarity of coverage of the treated volume. Finally, we used a global 3D gamma analysis31 with 3% dose difference and 3 mm distance-to-agreement criteria to assess similarity between the predicted and clinical dose distributions.
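The isodose DSC can be sketched as below. Two choices here are assumptions rather than details from the text: voxels exactly at the threshold are counted as inside the isodose volume, and two empty volumes are scored as perfect agreement. The function name `isodose_dsc` is hypothetical.

```python
import numpy as np

def isodose_dsc(clinical, predicted, threshold):
    """Dice similarity of the volumes receiving at least `threshold` (fraction of prescription)."""
    a = clinical >= threshold
    b = predicted >= threshold
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0   # both volumes empty: treated as agreement (a convention, not from the paper)
    return 2.0 * np.logical_and(a, b).sum() / denom

clinical = np.array([1.00, 0.96, 0.50, 0.10])
predicted = np.array([0.97, 0.90, 0.60, 0.00])
for level in (0.50, 0.95):
    print(level, isodose_dsc(clinical, predicted, level))
```

In this toy case the 50% isodose volumes coincide while the 95% volumes differ by one voxel, mirroring how agreement tends to be harder to achieve at high dose levels.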

3. Results

All results presented in this section refer to the test set performance. Figure 3 presents the overall MAE and mean error between actual and predicted dose for voxels within the patient body. From the MAE plot, we see that the Glowing GW-MSE model demonstrates the best performance with the lowest absolute error of 2.70% ± 1.28% (mean ± standard deviation). The mean error plot shows that all four models have a very small bias (mean value within ±1.00%); however, the Glowing GW-MSE model shows the smallest interquartile range (1.95%). We see that the Glowing GW-MSE model leads to the best performance for these metrics, with the GW-MSE model without glowing masks generally performing the worst with an MAE of 3.44% ± 1.06%.

Figure 3.

Box plots of the mean absolute error (MAE) and the mean error (ME) as a percentage (%) of prescribed dose for all models and all voxels within the body. The box represents the lower Q1 and upper Q3 quartiles and the whiskers represent the minimum and maximum values, excluding outliers. The median is represented by the horizontal line within the box and the mean by the white dot.

Figure 4 depicts the difference in mean dose (in Gy) for the heart and lung for each model. Here we again see the Glowing GW-MSE model demonstrating the best or equivalent performance compared to the other three models with a mean difference of −0.10 ± 0.31 Gy for the heart and 0.01 ± 1.07 Gy for the lung.

Figure 4.

Box plots of the difference in mean dose (DMEAN) for the heart and lung for all four models. The box represents the lower Q1 and upper Q3 quartiles and the whiskers represent the minimum and maximum values, excluding outliers. The median is represented by the horizontal line within the box and the mean by the white dot.

Figure 5 highlights the largest differences in model performance. Here we see that the Glowing GW-MSE model outperforms the other three models with a mean error in mean dose for the tumor bed contour of −0.01% ± 1.60%. This plot indicates that the Glowing GW-MSE model is much more accurate than the other models in the high dose regions where the tumor bed contour will inevitably lie.

Figure 5.

Box plots of the difference in mean dose (DMEAN) for tumor bed contours for all models, normalized to prescription (% Prescribed Dose). The box represents the lower Q1 and upper Q3 quartiles and the whiskers represent the minimum and maximum values, excluding outliers. The median is represented by the horizontal line within the box and the mean by the white dot.

Figure 6 shows the Dice similarity coefficient (DSC) for isodose volumes ranging from 0 to 100% prescribed dose. A DSC score of 1 indicates perfect agreement between clinical and predicted dose at any given isodose threshold, with lower scores indicating decreasing agreement. We can see from Figure 6 that all models perform equivalently for dose values between 30% and 70% prescribed dose. Above 70% prescribed dose, the model performance diverges with the Glowing GW-MSE model showing the best performance in these high dose regions with a median DSC of 0.87 for the 95% isodose volume.

Figure 6.

DSC for all models as a function of isodose threshold. Here, we have plotted the median DSC score over all patient plans.

A global 3D gamma analysis was performed to highlight differences in the dose distributions. The Glowing GW-MSE model performed the best of all four models. Overall, the mean (standard deviation) gamma pass rate was 93.0% (3.0%), 88.3% (4.2%), 83.6% (4.3%), and 91.3% (2.9%) for the Glowing GW-MSE, Glowing MSE, GW-MSE, and MSE models, respectively.

Figure 7 shows the clinical dose, predicted dose, error in dose (clinical-predicted), and the results of the gamma analysis for a single axial slice from a randomly selected patient. We can see that the Glowing GW-MSE model displays the lowest error within the high dose regions inside the breast and roughly equivalent performance at the tangent field edge. The gamma analysis also shows superior performance for the Glowing GW-MSE model with far fewer failing voxels within the high dose regions inside the breast.

Figure 7.

An axial slice from a randomly selected patient from the test dataset. Each row corresponds to one of the four models in this work. The first and second columns show the clinical dose and predicted dose, respectively. The third column shows the clinical dose minus predicted dose. The final column shows the gamma pass rate for the predicted dose distribution with passing voxels shown in blue and failing in red.

Table 2 reports the difference in V20Gy for the cropped lung volumes and in D0.1cc for the cropped heart and lung volumes for all four models. We note here that the OAR volumes were resampled and frequently cropped during data preprocessing, and thus the metrics reported in Table 2 should be regarded only as approximations of the true metrics.

Table 2.

Summary of error (clinical – predicted) in dose volume metrics for the cropped OAR volumes across the four models assessed in this work.

| Model | Lung V20Gy (cc), ME (STD) | Lung D0.1cc (Gy), ME (STD) | Heart D0.1cc (Gy), ME (STD) |
|---|---|---|---|
| Glowing GW-MSE | 0.26 (39.98) | 0.71 (1.37) | −1.56 (9.94) |
| Glowing MSE | −1.28 (41.15) | 2.72 (1.22) | −0.48 (10.00) |
| GW-MSE | −8.96 (40.27) | 3.01 (1.03) | −1.27 (9.35) |
| MSE | 10.99 (39.00) | 2.11 (0.85) | 2.15 (9.81) |

4. Discussion

We have developed a 3D dose prediction deep learning model which uses CT images and planning contours as inputs in order to support automated planning processes for breast cancer treatment. We compared four models which were trained using either binary mask inputs or glowing mask inputs and using either a standard MSE loss or a GW-MSE loss. All four models perform well, with the Glowing GW-MSE model outperforming the other three models. The performance out-of-field is high for all models, as shown by both the low heart and lung error in DMEAN as well as the high DSC values in isodose volumes less than approximately 80% prescribed dose. However, in the higher dose regions, the Glowing GW-MSE model demonstrates substantially higher performance as can be seen by the small error in DMEAN for the tumor bed and the higher median DSC values for dose levels greater than 90% prescribed dose. The Glowing GW-MSE model also had the highest average gamma pass rate of all four models.

We hypothesize that the reason for this improved performance is twofold. First, the GW-MSE loss function emphasizes model accuracy in the high gradient regions of dose, which mainly occur in the tangent field and the boundary of the body. This increased emphasis may allow a model to preferentially learn features which inform the location of these high gradient voxels. This is particularly important with the tangent field edge, as any deviation in the prediction of the location of this region would alter the beam parameters (e.g. gantry angle) generated during auto-planning and may reduce the quality of the plan. Second, the glowing masks for all contours allow the model to learn the joint proximity of the contours earlier in the network by encoding this information into each voxel in the image. Due to the impact of the effective receptive field32 for convolution-based networks, information cannot travel large distances between voxels without either numerous convolution layers or some form of dimensionality reduction. One of the primary functions of the U-Net architecture is to overcome this limitation by repeatedly reducing the dimensionality of the feature maps through pooling operations or strided convolutions. However, this is not a perfect solution as two spatially distant voxels, say a voxel in a tangent field and a voxel in the contour of the heart, may not “discover” their proximity to each other until quite deep into the network. We believe that the glowing masks allow the model to learn these proximities as early as the first layer and therefore allow the model to learn more complex features relating to the positions of these structures. This is of critical importance for breast dose prediction because the location of these structures largely determines where the field edge should occur and therefore the beam parameters.

It appears that these two components combined enabled the Glowing GW-MSE model to achieve the highest performance. We note here that the ablation study indicates that these components in isolation appear to either hinder performance or have little benefit, as indicated by the relatively lower performance of the GW-MSE and Glowing MSE models. In the GW-MSE model case, this could be due to an over-emphasis of high gradient regions without additional context. This may have resulted in a model which performs well in the high gradient regions but poorly in the high dose regions, as seen in Figure 7. In the Glowing MSE model case, the opposite may be true: the model was presented with more contextual information, but with less emphasis on what to do with this information. That said, without further testing, the rationale offered here remains conjecture. In future work, we will more closely investigate the causes of the improved performance found here.

Other publications have reported using deep learning to predict dose for breast cancer radiotherapy. One group used the 95% dose area from the manually planned treatment as an input to the dose prediction, which precludes its use for automated planning33. Five other works reported needing contours of a PTV of the breast as an input to the prediction model17,19,20,34,35. Including a PTV contour as input to our model would have likely improved our dose prediction accuracy since it would draw the model’s attention to where the high dose region should be. However, like many clinics, it is not our practice to manually create PTVs for the whole breast for tangent field treatments. Adding this as a step in our clinical workflow solely to facilitate automated planning would detract from the efficiency gains of such a system. As such, we only include inputs that are readily available in our clinical workflow. Another advantage of our work was that the size of the training dataset, with 213 patient plans, was much greater than any other work predicting dose for breast radiation, where the training dataset sizes range from 35 to 120 patient plans.

To compare to other groups’ work, we re-ran our gamma analysis in 2D using 3%/3mm criteria for our best model, the Glowing GW-MSE model, which resulted in a mean pass rate of 92.3% (compared to 93.0% for this model with 3D gamma). These results represent an improvement over published works that report pass rates for 2D gamma (3%/3mm) between 85% and 89%.17,19,35 Bai et al17 also reported a mean DSC of 0.94 at the 80% isodose threshold compared to a mean DSC of 0.90 in this work. These results show that our model performs equivalently to these models despite the lack of PTV as input.

Other works22 in the field have also attempted to model distance from contours for all voxels in a mask. In comparison to our glowing method, these authors modeled distance from the contour using a negative Euclidean distance equation. This results in increasingly large negative values for voxels at large distances from the contour. We believe that our method, which encodes these distant voxels with very small magnitude values, is more representative of the implied importance we are attempting to model. Additionally, the linear distance method results in a non-normalized distribution of values between plans. This could adversely affect a model’s ability to incorporate this mask into the dose prediction task, whereas our glowing method ensures that values are consistently between zero and one. In future work, we will examine this method in a direct comparison with our glowing mask method.

One limitation of our study is incidental cropping of parts of the heart and lung volumes during the data preprocessing. As stated above, efforts were made to maximize the amount of patient body within the cropped images after preprocessing. However, due to variations in patient shape and size, occasionally portions of heart and lung volume were cropped out of the images. As a result, these fractions of the volume did not factor into the Dmean calculations for the OAR metrics. However, we note that these cropped fractions are always the furthest away from the target breast volume and should not have much, if any, impact on automated planning considerations. Additionally, the four models in this work were trained and evaluated on the same dataset, so any difference in OAR metrics due to cropping is the same between all models.

Another limitation of this study is that it was conducted on a single-institution dataset. This may limit the generalizability of the models due to variations in clinical practices and patient populations. Future work within our group will examine generalizability in a multi-institutional study. Additionally, this work only examined the effectiveness of these models on one patient cohort, the left-sided supine tangent cohort. In future work, we will extend the dose prediction model to other cohorts of breast patients, such as right-sided breast cancers, patients in the prone position, and the treatment of more advanced disease, such as regional nodal irradiation.

In the future, these models will be incorporated into the development of an automated treatment planning system for breast cancer, which aims to greatly reduce the variability and inefficiency in the radiation treatment of breast cancer. Additionally, we will further investigate the effects of the glowing mask algorithm on deep learning networks, both in future breast cancer dose prediction models and on alternative datasets, such as the OpenKBP36 head and neck cancer dose prediction dataset.

5. Conclusion

In this work, we introduced and assessed the performance of a novel method of representing structures in 3D space, which we dubbed a “glowing” mask. We also analyzed the effects of adding a gradient weighting factor to the standard MSE in the loss function. Overall, we found that the combination of the gradient weighting and glowing masks resulted in the best performing model with little to no added computational cost. The best trained model in this work demonstrates state-of-the-art performance for dose prediction for breast radiotherapy, without the need for additional manual inputs, such as a breast PTV.

6. Acknowledgements

This work was partially supported by the National Institutes of Health, Grant UL1TR001442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We would also like to acknowledge the San Diego Supercomputer Center and the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

Footnotes

7. Conflict of Interest Statement

Sandra Meyers has received a K08 award from the National Cancer Institute of the National Institutes of Health, which supported this work, as well as an honorarium and research grant from Varian Medical Systems. Kelly Kisling has received honoraria from Varian Medical Systems.

8. References

1. Sung H, Ferlay J, Siegel RL, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–249. doi: 10.3322/caac.21660
2. Borras JM, Barton M, Grau C, et al. The impact of cancer incidence and stage on optimal utilization of radiotherapy: Methodology of a population based analysis by the ESTRO-HERO project. Radiother Oncol. 2015;116(1):45–50. doi: 10.1016/j.radonc.2015.04.021
3. Tyldesley S, Delaney G, Foroudi F, Barbera L, Kerba M, Mackillop W. Estimating the need for radiotherapy for patients with prostate, breast, and lung cancers: verification of model estimates of need with radiotherapy utilization data from British Columbia. Int J Radiat Oncol Biol Phys. 2011;79(5):1507–1515. doi: 10.1016/j.ijrobp.2009.12.070
4. Overgaard M, Jensen MB, Overgaard J, et al. Postoperative radiotherapy in high-risk postmenopausal breast-cancer patients given adjuvant tamoxifen: Danish Breast Cancer Cooperative Group DBCG 82c randomised trial. Lancet. 1999;353(9165):1641–1648. doi: 10.1016/S0140-6736(98)09201-0
5. England TN. Journal Medicine ©. Published online 1997:949–955.
6. Ragaz J, Olivotto IA, Spinelli JJ, et al. Locoregional radiation therapy in patients with high-risk breast cancer receiving adjuvant chemotherapy: 20-year results of the British Columbia randomized trial. Journal of the National Cancer Institute. 2005;97(2):116–126. doi: 10.1093/jnci/djh297
7. Abe O, Abe R, Enomoto K, et al. Effects of radiotherapy and of differences in the extent of surgery for early breast cancer on local recurrence and 15-year survival: An overview of the randomised trials. The Lancet. 2005;366(9503):2087–2106. doi: 10.1016/S0140-6736(05)67887-7
8. Nelms BE, Robinson G, Markham J, et al. Variation in external beam treatment plan quality: An inter-institutional study of planners and planning systems. Practical Radiation Oncology. 2012;2(4). doi: 10.1016/j.prro.2011.11.012
9. Berry SL, Boczkowski A, Ma R, Mechalakos J, Hunt M. Interobserver variability in radiation therapy plan output: Results of a single-institution study. Practical Radiation Oncology. 2016;6(6). doi: 10.1016/j.prro.2016.04.005
10. Batumalai V, Jameson MG, Forstner DF, Vial P, Holloway LC. How important is dosimetrist experience for intensity modulated radiation therapy? A comparative analysis of a head and neck case. Practical Radiation Oncology. 2013;3(3). doi: 10.1016/j.prro.2012.06.009
11. Hurkmans C, Duisters C, Peters-Verhoeven M, et al. Harmonization of breast cancer radiotherapy treatment planning in the Netherlands. Technical Innovations and Patient Support in Radiation Oncology. 2021;19(June):26–32. doi: 10.1016/j.tipsro.2021.06.004
12. Fried DV, Das SK, Marks LB, Chera BS. Clinical Use of A Priori Knowledge of Organ-At-Risk Sparing During Radiation Therapy Treatment for Oropharyngeal Cancer: Dosimetric and Patient Reported Outcome Improvements. Practical Radiation Oncology. 2022;12(3). doi: 10.1016/j.prro.2021.12.006
13. Li N, Carmona R, Sirak I, et al. Highly Efficient Training, Refinement, and Validation of a Knowledge-based Planning Quality-Control System for Radiation Therapy Clinical Trials. International Journal of Radiation Oncology Biology Physics. 2017;97(1):164–172. doi: 10.1016/j.ijrobp.2016.10.005
14. Moore KL, Schmidt R, Moiseenko V, et al. Quantifying […] to suboptimal planning: a secondary study on RTOG0126. 2016;92(2):228–235. doi: 10.1016/j.ijrobp.2015.01.046
15. Kaderka R, Hild SJ, Bry VN, et al. Wide-Scale Clinical Implementation of Knowledge-Based Planning: An Investigation of Workforce Efficiency, Need for Post-automation Refinement, and Data-Driven Model Maintenance. International Journal of Radiation Oncology Biology Physics. 2021;111(3):705–715. doi: 10.1016/j.ijrobp.2021.06.028
16. Bradley JD, Iyengar P, Higgins KA. Multi-institutional […] Quality Controls in Cooperative Group Trials. 2020;9(2). doi: 10.1016/j.prro.2018.11.007
17. Bai X, Liu Z, Zhang J, et al. Comparing of two dimensional and three dimensional fully convolutional networks for radiotherapy dose prediction in left-sided breast cancer. Science Progress. 2021;104(3):00368504211038162. doi: 10.1177/00368504211038162
18. Hedden N, Xu H. Radiation therapy dose prediction for left-sided breast cancers using two-dimensional and three-dimensional deep learning models. Physica Medica. 2021;83(February):101–107. doi: 10.1016/j.ejmp.2021.02.021
19. Ahn SH, Kim E, Kim C, et al. Deep learning method for prediction of patient-specific dose distribution in breast cancer. Radiation Oncology. 2021;16(1):154. doi: 10.1186/s13014-021-01864-9
20. Bakx N, Bluemink H, Hagelaar E, van der Sangen M, Theuws J, Hurkmans C. Development and evaluation of radiotherapy deep learning dose prediction models for breast cancer. Phys Imaging Radiat Oncol. 2021;17:65–70. doi: 10.1016/j.phro.2021.01.006
21. Bakx N, Bluemink H, Hagelaar E, van der Sangen M, Theuws J, Hurkmans C. Development and evaluation of radiotherapy deep learning dose prediction models for breast cancer. Physics and Imaging in Radiation Oncology. 2021;17:65–70. doi: 10.1016/j.phro.2021.01.006
22. van de Sande D, Sharabiani M, Bluemink H, et al. Artificial intelligence based treatment planning of radiotherapy for locally advanced breast cancer. Physics and Imaging in Radiation Oncology. 2021;20(December):111–116. doi: 10.1016/j.phro.2021.11.007
23. Li X, Morgan PS, Ashburner J, Smith J, Rorden C. The first step for neuroimaging data analysis: DICOM to NIfTI conversion. Journal of Neuroscience Methods. 2016;264:47–56. doi: 10.1016/j.jneumeth.2016.03.001
24. Anderson BM, Wahid KA, Brock KK. Simple Python Module for Conversions between DICOM Images and Radiation Therapy Structures, Masks, and Prediction Arrays. Pract Radiat Oncol. 2021;11(3):226–229. doi: 10.1016/j.prro.2021.02.003
25. Navab N, Hornegger J, Wells WM, Frangi AF. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015: 18th International Conference Munich, Germany, October 5–9, 2015 proceedings, part III. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2015;9351(Cvd):12–20. doi: 10.1007/978-3-319-24574-4
26. Iqbal H. PlotNeuralNet. Published online September 3, 2023. Accessed September 3, 2023. https://github.com/HarisIqbal88/PlotNeuralNet
27. Loshchilov I, Hutter F. Decoupled Weight Decay Regularization. Published online January 4, 2019. Accessed July 28, 2023. http://arxiv.org/abs/1711.05101
28. Lu Z, Chen Y. Single image super-resolution based on a modified U-net with mixed gradient loss. Signal, Image and Video Processing. 2022;16(5):1143–1151. doi: 10.1007/s11760-021-02063-5
29. UCSD-Health-QUIVER/BreastNet. Accessed November 30, 2023. https://github.com/UCSD-Health-QUIVER/BreastNet
30. Pérez-García F, Sparks R, Ourselin S. TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Computer Methods and Programs in Biomedicine. 2021;208:106236. doi: 10.1016/j.cmpb.2021.106236
31. Low DA, Harms WB, Mutic S, Purdy JA. A technique for the quantitative evaluation of dose distributions. Medical Physics. 1998;25(5):656–661. doi: 10.1118/1.598248
32. Luo W, Li Y, Urtasun R, Zemel R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. Published online January 25, 2017. doi: 10.48550/arXiv.1701.04128
33. Hedden N, Xu H. Radiation therapy dose prediction for left-sided breast cancers using two-dimensional and three-dimensional deep learning models. Phys Med. 2021;83:101–107. doi: 10.1016/j.ejmp.2021.02.021
34. van de Sande D, Sharabiani M, Bluemink H, et al. Artificial intelligence based treatment planning of radiotherapy for locally advanced breast cancer. Phys Imaging Radiat Oncol. 2021;20:111–116. doi: 10.1016/j.phro.2021.11.007
35. Ravari ME, Nasseri Sh, Mohammadi M, Behmadi M, Ghiasi-Shirazi SK, Momennezhad M. Deep-learning Method for the Prediction of Three-Dimensional Dose Distribution for Left Breast Cancer Conformal Radiation Therapy. Clinical Oncology. 2023;35(12):e666–e675. doi: 10.1016/j.clon.2023.09.002
36. Babier A, Zhang B, Mahmood R, et al. OpenKBP: The open-access knowledge-based planning grand challenge and dataset. Medical Physics. 2021;48(9):5549–5561. doi: 10.1002/mp.14845
