Abstract
The patient’s eye-lens dose changes with each projection view during fluoroscopically-guided neuro-interventional procedures. Monte-Carlo (MC) simulation can estimate lens dose, but it cannot run fast enough to give real-time feedback to the interventionalist. Deep learning (DL) models were investigated to estimate the patient lens dose for given exposure conditions and provide real-time updates. MC simulations were done using a Zubal computational phantom to create a dataset of eye-lens dose values for training the DL models. Six geometric parameters (entrance-field size; LAO gantry angulation; patient x, y, z head position relative to the beam isocenter; and whether the patient’s right or left eye) were varied for the simulations. The dose for each combination of parameters was expressed as lens dose per entrance air kerma (mGy/Gy). Geometric parameter combinations associated with high dose values were sampled more finely to generate more high-dose values for training purposes. Additionally, dose at intermediate parameter values was calculated by MC in order to validate the interpolation capabilities of DL. The data were split into training, validation, and testing sets. Stacked-model and median algorithms were implemented to create more robust models. Model performance was evaluated using mean absolute percentage error (MAPE). The goal is for this DL model to be implemented in the Dose Tracking System (DTS) developed by our group, allowing the DTS to infer the patient’s eye-lens dose for real-time feedback and eliminating the need for a large database of pre-calculated values with interpolation capabilities.
Keywords: DNN, Eye-Lens Dose, Neuro-Interventional Procedures, DTS
1. INTRODUCTION
Fluoroscopically-guided neurointerventional procedures result in high patient radiation dose and associated deterministic risks. Skin dose has been a primary concern for these procedures due to the risk of skin effects starting with erythema, which has a dose threshold of 2000 mGy.2 However, these procedures also have the potential to deliver high dose to the patient eye lens, with one study showing that sixteen percent of embolization procedures resulted in doses exceeding 500 mGy,3 which is the estimated dose threshold for radiation-induced cataracts set by the ICRP.4 The lens of the eye is thus a very radiosensitive organ whose dose must be minimized while still achieving the clinical task.
The real-time Canon Dose Tracking System (DTS) is used during interventional fluoroscopic procedures in order to track patient skin dose and allow for dose management. This system displays a color-coded map of the patient’s skin dose during the procedures. The intersection of the x-ray beam with a graphic, representing the patient, is used to calculate the dose. The geometric and exposure parameters are obtained from the imaging system in real time through a digital Controller Area Network (CAN) bus. The patient graphic is placed relative to the imaging system table to match that of the patient and can be adjusted manually through use of a software GUI.5,6
For each projection view, the dose to the lens of the patient’s eyes changes during fluoroscopically-guided neuro-interventional procedures. Monte-Carlo (MC) simulation can be done to estimate eye lens dose for each exposure projection, but MC cannot provide real-time feedback to the interventionalist. Deep learning (DL) models were investigated to determine whether the patient lens dose for given exposure conditions could be estimated in real time. Such a model could then be implemented in the DTS to estimate the patient eye lens dose in real time.
2. METHODS AND MATERIALS
2.1. Dataset
The dataset of eye lens dose values used for the DL models was generated from MC simulations using a Zubal computational phantom.7 BEAMnrc was used to match the beam spectra to the clinical unit (the x-ray tube of the Canon Infinix C-arm fluoroscopy system). The generated beam phase-space files were imported into DOSXYZnrc for dose calculation. All DOSXYZnrc simulations were done with 2 × 10^10 photon histories. The Zubal head phantom is based on a CT scan of an adult male head with all critical structures, including the eye lenses, segmented. The PEGS4 code was used to generate the tissue material data. The geometric center of the head, when placed at the gantry isocenter, was used as the reference zero point for this work. This setup was chosen for symmetry; however, the Zubal head phantom has a slight tilt, which gives the axial planes some asymmetry and can lead to differing doses between the right and left eye lenses.7
The simulations were performed with variation of six geometric parameters. The entrance field sizes were 5 cm x 5 cm and 10 cm x 10 cm, since these are typically used during neuro-interventional procedures. The second parameter was whether it was the patient’s right or left eye (which is affected by the asymmetry of the Zubal head phantom as discussed above). The third parameter was the beam angulation, where the gantry was rotated from 0° to 90° LAO in increments of 15°. The remaining parameters were the patient head shift, relative to the zero point, along the three axes, X, Y and Z, in increments of 1 cm. As seen in Fig. 1, the X-axis shift ranged from −6 cm to 6 cm, the Y-axis shift from −6 cm to 6 cm, and the Z-axis shift from −4 cm to 4 cm. For the X-axis, beam movement to the left is in the negative direction (head moves to the right) and vice versa for the positive direction. For the Y-axis, beam movement in the cranial direction is in the negative direction (head moves caudally) and the opposite for the positive direction. For the Z-axis, beam movement to the anterior side is in the negative direction (head moves downward toward posterior) and again the opposite for the positive direction. These ranges were chosen to cover the majority of movements present during neurointerventional procedures. All simulations were done for an 80 kVp beam with a 1.8 mm Al added beam filter. The dose for each combination of parameters was expressed as lens dose per entrance air kerma (mGy/Gy).7
Fig. 1.

Geometric parameters used for the Monte-Carlo simulations and the deep-learning training dataset. I. The entrance field size (whether 5 cm x 5 cm or 10 cm x 10 cm). II. The right or left eye of the patient. III. Gantry angulation ranging from 0–90° LAO. IV. The head shift of the patient vertically +/− 4 cm and in the cranial-caudal and lateral directions incrementally up to 6 cm in either direction.
Geometric combinations associated with high dose values were sampled more finely by altering the LAO gantry angulation for those combinations in smaller increments. For example, if a high dose value had an associated gantry angulation of 30° LAO, then, with the other geometric parameters held constant, the angulation was altered by ±2° (28°, 32°), ±3°, ±4° and ±5°. These high-dose data were then added to the dataset for training purposes. This was done because, in the previously used model, the majority of evenly spaced projections resulted in a low lens dose, giving insufficient training data for the high-dose projections and thus an inability to accurately predict these high lens doses. The additional high-dose data allowed the model to adequately train on these values and overcome this problem. Furthermore, data were generated by MC simulations for intermediate values using an 8 cm x 8 cm field size (with the other geometric parameters held constant), to be used as ground truth to test the interpolation capabilities of DL.
Once all the data were generated, there was a total of 1789 data points for the DL models, plus an additional 64 data points, associated with the 8 cm x 8 cm field size, used to check the interpolation capabilities. The data were split as follows: the training set consisted of 1504 data points, the validation set of 167 data points, and the testing set of 118 data points. The validation set is a sample of data used to give an unbiased evaluation of a model fit on the training data while the model hyperparameters are being tuned; it indicates how the model might perform on unseen data. The testing set was randomly selected from the dataset and was not used during the training process; it gives an unbiased evaluation of the final model fit on the training dataset and serves as the final evaluation to compare the finished models and find which one worked best.
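The three-way split described above can be sketched as a random partition of sample indices. This is an illustrative sketch only; `split_dataset` is a hypothetical helper, and the exact selection procedure used in this work is not specified beyond the set sizes.

```python
import numpy as np

def split_dataset(n, n_train=1504, n_val=167, n_test=118, seed=0):
    # Randomly partition the n sample indices into disjoint
    # training, validation, and testing sets.
    assert n_train + n_val + n_test == n
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 1789 data points split 1504 / 167 / 118 as in this work.
train_idx, val_idx, test_idx = split_dataset(1789)
```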
2.2. Deep Learning Model
The basic model configuration for this work consisted of a number of dense layers, each followed by a non-linear activation function. Fig. 2 gives a model configuration in which the data (geometric parameters) are input into the model and a dose prediction is made. Dense layers are made up of a user-specified number of neurons. Each neuron acts as a linear model: it takes inputs, multiplies them by weights, and adds a bias term to produce an output. All of these neurons together make up the dense layer. Because dense layers contain only linear functions, any combination of them gives a linear output. In order to capture non-linear relationships, non-linear activation functions follow the dense layers.8
Fig. 2.

Schematic showing the deep-learning model configuration for the DNN. The geometric parameter data (Fig. 1) is input into the model, and then the model predicts the patient eye lens dose based off of the geometric parameters.
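The dense-layer arithmetic described above can be illustrated with a minimal NumPy forward pass. The layer sizes and random weights here are arbitrary toy values for illustration, not the configuration used in this work.

```python
import numpy as np

def dense(x, W, b):
    # Each neuron computes a weighted sum of its inputs plus a bias term.
    return x @ W + b

def relu(z):
    # ReLU activation: linear for positive values, zero for negative values.
    return np.maximum(0.0, z)

# Toy forward pass: 6 geometric inputs -> 4 hidden neurons -> 1 dose output.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 6))                 # one sample of six geometric parameters
W1, b1 = rng.normal(size=(6, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

hidden = relu(dense(x, W1, b1))             # non-linearity between dense layers
dose_pred = dense(hidden, W2, b2)           # predicted lens dose (arbitrary units here)
```

Without the `relu` call between the two `dense` calls, the composition would collapse to a single linear map, which is why the activation functions are needed.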
The non-linear activation functions used in this work were the Rectified Linear Unit (ReLU) function and the Exponential Linear (eLU) function. ReLU is defined as y(x) = max(0, x), so it is linear for all positive values and zero for all negative values. Due to its simple design, the model takes less time to train. Also, because of this linearity, the gradient does not plateau or vanish as x grows large, which allows training to converge faster. Furthermore, ReLU is sparsely activated because it is zero for all negative inputs. The neurons therefore do not fire at all times and are activated by different signals, which increases the likelihood that the neurons are actually processing important parts of the problem. This leads to better predicting power and less overfitting/noise.9
A problem associated with ReLU, known as “dying ReLU”, arises from the function being zero for all negative values. A neuron is considered “dead” if it is trapped on the negative side and continuously outputs zero. Since the gradient is zero in the negative range, once a neuron’s input becomes negative there is very little chance it will recover. Such neurons do not contribute to the model, and over an extended period of time a large part of the model may actually be doing nothing. The eLU function avoids this problem by using an exponential curve, α(e^x − 1), in the negative portion, which gives a small non-zero gradient for negative values.9
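The two activation functions can be written directly from their definitions (α is the eLU scale parameter, commonly set to 1):

```python
import numpy as np

def relu(x):
    # Zero output and zero gradient for negatives: a neuron stuck there
    # cannot recover (the "dying ReLU" problem).
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Exponential curve alpha*(exp(x) - 1) for negatives keeps a small
    # non-zero gradient, avoiding dying ReLU; linear for positives like ReLU.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```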
2.3. Training
After the model architecture was built, different DL models were tested using various configurations, such as different combinations of dense layers and activation functions. K-fold cross-validation was then used to produce multiple models from different splits of the data. This method (Fig. 3) splits the data into “K” groups. When the model is being trained, one group is used as the validation set and the remaining groups are used for training. Each group is thereby used as the validation set once and used to train the model K−1 times. The K-fold method helps to avoid an unlucky split of the available data, which could lead to an ineffective model, and also helps to evaluate model quality.10
Fig. 3.

Diagram showing the K-fold cross validation process. The data is split into a number of folds (for this work K = 10). One fold is then chosen as the validation set, and the rest are used for the training set. This leads to a unique model for that validation set. This process is repeated so that each fold is used as the validation set, which leads to K (10) models that each have a different validation set.
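The fold generation in Fig. 3 can be sketched as index bookkeeping; `kfold_indices` is an illustrative helper (libraries such as scikit-learn provide an equivalent `KFold` class), with the shuffling seed an arbitrary choice.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    # Shuffle sample indices and split them into k folds; each fold serves
    # once as the validation set while the other k-1 folds form the training set.
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Ten (train, validation) index pairs for the 1504 training points, K = 10.
splits = list(kfold_indices(1504, k=10))
```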
Each K-fold model was run for 150 epochs using the built-in Keras ModelCheckpoint function, which allows the user to save the model (after each epoch) based on a chosen metric; here, the metric was validation loss. Validation loss is computed the same way as training loss, comparing the network outputs with the ground-truth values, but it is not used to update the weights. This was done to avoid overfitting of the model. Overfitting occurs when the weights of the model are tuned to fit the training data exactly; while this might seem preferable, the model is then unable to handle new data because the weights are no longer generalized. Using the validation loss as a metric allows evaluation of how the model might perform on unseen data, and thereby shows how well the model weights are generalized. Thus, whenever the validation loss decreased (meaning the model weights are better generalized), decreasing the chance of overfitting, the model weights were saved.11
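The save-on-best-validation-loss behavior can be sketched as a simple comparison per epoch, mirroring what Keras’s `ModelCheckpoint(monitor="val_loss", save_best_only=True)` does internally. The per-epoch loss values below are hypothetical, for illustration only.

```python
# Track the best validation loss seen so far and "save" only when it improves.
best_val_loss = float("inf")
saved_epoch = None

val_losses = [0.42, 0.35, 0.38, 0.31, 0.33]   # hypothetical per-epoch values

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:      # weights generalize better than any seen so far
        best_val_loss = val_loss
        saved_epoch = epoch           # in practice, model.save_weights(...) here
```

At the end of training, the retained weights come from the epoch with the lowest validation loss (epoch 3 in this example), not necessarily the final epoch.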
The optimizer and loss function used for this work were Adaptive Moment Estimation (Adam) and Mean Absolute Percentage Error (MAPE), respectively, and they work in conjunction with each other. The optimizer adjusts the weights used by the model to obtain a prediction, and the loss function (which measures how much the predictions deviate from the ground-truth values) indicates whether these adjustments are moving in the right direction. Adam is useful due to its adaptive learning rate, which helps ensure that the updates to the weights are not too small; updates that are too small would leave the weights essentially unchanged, so the model would not learn and computation efficiency would drop.12 MAPE is the mean, over the dataset of N points, of the absolute prediction error expressed as a percentage of the ground-truth value: MAPE = (100/N) Σ |(y_i − ŷ_i)/y_i|.13
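The MAPE metric is direct to compute; the dose values below are hypothetical, chosen so each prediction is off by exactly 5%.

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent:
    # 100/N * sum(|(y_i - yhat_i) / y_i|)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical lens-dose values (mGy/Gy): MC ground truth vs model predictions.
truth = [2.0, 4.0, 8.0]
pred = [2.1, 3.8, 8.4]
err = mape(truth, pred)   # each prediction is off by 5%, so MAPE = 5.0
```

Because the error is relative, a 0.1 mGy/Gy miss on a low dose counts the same as a 0.4 mGy/Gy miss on a dose four times larger, which makes MAPE suitable for dose values spanning a wide range.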
2.4. Stacked and Median Model Algorithms
Once the best models were derived from the K-fold cross-validation method (a total of 7 models were used), they were input into stacked-model and median-model algorithms (Fig. 4). The stacked algorithm takes the predictions of the DNN models as input to a second-layer learning algorithm. This second-layer algorithm (here another DNN) is then trained to optimally combine the first set of model predictions to produce a new set of predictions (Second-Layer DNN). The final outcome is a single, more complex model that is then used to make predictions, in this case the patient eye lens dose based on the geometric parameters. The median algorithm is simpler: the median of the predictions of the DNN models is used as the new prediction. The outcome of this algorithm is a combination of models making predictions, rather than a single, complex model.
Fig. 4.

Schematic of the stacked models’ and median models’ algorithm. Stacked: The predictions of the first layer of models (DNNs in this case) are taken as input and fed into the second-layer algorithm (in this case another DNN). This model is then trained to combine the model predictions. The stacked model (a single model) produces a new set of predictions. Median: Median of DNN predictions is taken and used as new dose prediction.
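The two combination strategies in Fig. 4 can be sketched as follows. The per-model predictions are hypothetical values, and the fixed uniform weights stand in for the second-layer DNN, which in this work is trained rather than hand-set.

```python
import numpy as np

# Hypothetical predictions from 7 first-layer DNNs for 3 test projections
# (rows: models, columns: projections; values in mGy/Gy).
base_preds = np.array([
    [2.0, 4.1, 7.9],
    [2.2, 3.9, 8.3],
    [1.9, 4.0, 8.0],
    [2.1, 4.2, 8.1],
    [2.0, 3.8, 7.8],
    [2.3, 4.0, 8.2],
    [2.1, 4.1, 8.0],
])

# Median method: the final prediction is the per-projection median of the 7 models.
median_pred = np.median(base_preds, axis=0)

# Stacked method (sketch): a second-layer learner takes the 7 model predictions
# per projection as input features. A fixed weighted combination stands in here
# for the trained second-layer DNN; the weights would be learned, not uniform.
weights = np.full(7, 1.0 / 7.0)
stacked_pred = weights @ base_preds
```

The median is robust to a single outlying model, while the stacked combiner can in principle learn which models to trust for which inputs.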
3. RESULTS AND DISCUSSION
The testing set was used for the final model evaluation. The mean absolute percent error (MAPE) was used to find which models worked best. A lower MAPE score (given in percent) is better; a MAPE of 0% would mean perfect agreement between the model predictions and the ground-truth values (MC data). A total of 20 models from K-fold cross-validation were evaluated with MAPE on the testing set. The seven models with the best MAPE values were then used as input to the median and stacked-model algorithms. These algorithms were then tested on the same testing set, and the metric was again used to evaluate which algorithm worked best.
Table 1 shows the results after applying the various methods on the testing set and interpolation data. The interpolation data pertains to comparison of the predicted 8 cm x 8 cm field size data to that generated by MC simulations. This data was not used in the training process, thus the models only trained on 5 cm x 5 cm, and 10 cm x 10 cm data. In turn, this leads to a good check of the interpolation capabilities of the DNN models.
Table 1.
Table showing the values of the mean absolute percent error (MAPE) for the median and stacked models
| Data | Metric | Median Method | Stacked Method | Single Model* | Median w/o Hi-Dose** |
|---|---|---|---|---|---|
| Testing Set | MAPE (%) | 3.4 | 3.9 | 7.9 | 7.8 |
| Interpolation Set | MAPE (%) | 5.2 | 7.5 | 23.8 | 19.3 |
* The best-performing single model. ** The median model trained without the added high-dose values. “Testing Set” refers to results for the testing set; “Interpolation Set” refers to the 8 cm x 8 cm field-size data generated to test the interpolation capabilities.
When a dataset without the added high-dose values is used, the models are not able to sufficiently train on, and predict, the high-dose values. While these extreme dose values would be rare in a clinical setting, it is still important for the model to predict them accurately and provide correct dose estimates. We therefore incorporated more high-dose data into the training dataset, which allowed the models to train adequately on these values and, in turn, predict the extreme dose values better. The median and stacked methods with the added high-dose data outperformed the median method trained without the added high-dose values, which had a MAPE of 7.8% for the testing set. The MAPE values also differ greatly for interpolation: the median and stacked models had MAPE scores of 5.2% and 7.5%, respectively (Table 1), while the model without the added high-dose values had a MAPE score of 19.3%. The large difference is most likely due to prediction errors on the extreme dose values, which the added high-dose data helped to overcome. The better performance of the median and stacked models compared to the median model without added high-dose data shows that the added high-dose data in the training set improved the prediction for these extreme-dose values.
The values in Table 1 show that both the median and stacked methods performed better than a normal single model (no K-fold or stacking/median), the best of which had a MAPE of 7.9% on the testing set. This is most likely because a single model may have prediction errors on certain data points; with the stacked/median algorithms, while one model may err on a point, the other models can predict it correctly. This ‘multi-prediction’ on each data point gives the algorithms better predictive power than a single model.
The following plots (Fig. 5) show the model predictions vs the ground truth (MC) dose values for the different methods. This was done for both the testing set data and the 8 cm x 8 cm interpolation data. The test data plots refer to the testing set used to evaluate the final models after the training phase. Note that, to facilitate comparison with the median model without added high-dose values, the testing set data was not increased to include higher-dose values as was the training set. This is the same test data that was used for Table 1 above.
Fig. 5.

A1–2: Median model predictions vs the ground truth values for the testing set and interpolation data set, respectively. B1–2: Stacked model predictions vs the ground truth values for the testing set and interpolation data set, respectively.
There were still greater deviations observed in Fig. 5 for the higher dose values than the lower dose values for the stacked and median models, and additional high-dose values may be needed for the training set. However, these models still had better performance on the extreme dose values as compared to the model without added high-dose data. This leads us to conclude that the stacked and median models have better performance overall with added high-dose data and that, in general, a DNN works well in predicting the patient eye lens dose given the geometric exposure parameters as input.
The prediction time for the methods was also checked to make sure that these DNN methods would be feasible for real-time update and display. For real-time use, without introducing lag into the system, the feedback time should be under a second, reasonably on the order of a few tenths of a second. For the median method, the average prediction time for the entire testing set (118 data points) was 1.24 seconds, and the average time to predict a single exposure projection was 10 ms. For the stacked method, the average prediction time for the entire testing set was 0.32 seconds and the average time to predict a single exposure projection was 4 ms. Although creating the stacked model is more complex, once created it is still a single model, whereas the median algorithm requires multiple models to make predictions for each projection, which leads to longer prediction time.
4. CONCLUSIONS
A DNN method is able to accurately predict the patient eye lens dose based on the geometric and exposure parameters used in a fluoroscopic neuro-interventional procedure. With the added high-dose data (generated using MC), the DNNs showed improved performance (Table 1) and can more accurately predict higher doses. Furthermore, the prediction accuracy of the DNNs for the 8 cm x 8 cm field size (Table 1) shows that the DNN interpolates well and would eliminate the need for a large database of pre-calculated factors as an alternative method. Comparing the two methods, the median method had a better MAPE score than the stacked-model method for both the testing set and the interpolation data (3.35% vs 3.92% for testing, and 5.17% vs 7.47% for interpolation). This is somewhat surprising given the simplicity of the median algorithm compared to the more complex stacked-model algorithm; one possible reason is that more models would need to be input into the stacked-model algorithm for its complexity to be used to full effect. Both methods have a small prediction time (10 ms for median and 4 ms for stacked), meaning that either could be used in real time during procedures to predict the patient eye lens dose. This work shows that a DNN is a viable option for implementation in the DTS to predict patient eye lens dose in real time and eliminate the need for a large database of pre-calculated factors.
Acknowledgments
This work was supported in part by Canon Medical Systems and NIH Grant No. 1R01EB030092. The Monte-Carlo results were obtained using the resources of the Center for Computational Research (CCR) of the University at Buffalo.
Footnotes
Disclosure Authors receive research support from Canon (Toshiba) Medical Systems. The dose tracking system (DTS) software is licensed to Canon Medical Systems by the Office of Science, Technology Transfer and Economic Outreach of the University at Buffalo.
REFERENCES
- [1]. University of California, “Prepare for Interventional & Neuro Interventional Procedure”, UCSF, https://radiology.ucsf.edu/patient-care/prepare/ir (9 November 2019)
- [2]. Murphy A, Jones J, “Deterministic effects”, Radiopaedia, https://radiopaedia.org/articles/deterministic-effects?lang=us
- [3]. Sanchez RM, Vano E, Fernandez M, Roasati S, Lopez-Ibor L, “Radiation Doses in Patient Eye Lenses during Interventional Neuroradiology Procedures”, AJNR, March 2016, http://www.ajnr.org/content/37/3/402 (9 September 2020)
- [4]. Thome C, “Deterministic Effects to the Lens of the Eye Following Ionizing Radiation Exposure: is There Evidence to Support a Reduction in Threshold Dose?”, NCBI, March 2018, https://www.ncbi.nlm.nih.gov/pubmed/29360710 (12 November 2019)
- [5]. Rana VK, Rudin S, Bednarek DR, “A tracking system to calculate patient skin dose in real-time during neurointerventional procedures using a biplane x-ray imaging system”, September 2016, https://www.ncbi.nlm.nih.gov/pubmed/27587043 (25 November 2019)
- [6]. Bednarek DR, Barbarits J, Rana VK, Nagaraja SP, Josan MS, Rudin S, “Verification of the performance accuracy of a real-time skin-dose tracking system for interventional fluoroscopic procedures”, Proc. SPIE 796127 (2011)
- [7]. Guo C, Troville J, Sun S, Rudin S, Bednarek DR, “Variation of eye-lens dose with variation of the location of the beam isocenter in the head during neuro-interventional fluoroscopic procedures”, Proc. SPIE 11312 (2020)
- [8]. Agrawal A, “Building Neural Network from Scratch”, 13 June 2018, https://towardsdatascience.com/building-neural-network-from-scratch-9c88535bf8e9
- [9]. Liu D, “A Practical Guide to ReLU”, Medium, 30 November 2017, https://medium.com/@danqing/a-practicalguide-to-relu-b83ca804f1f7
- [10]. Brownlee J, “A Gentle Introduction to K-fold Cross-Validation”, 23 May 2018, https://machinelearningmastery.com/k-fold-cross-validation/
- [11]. Bhande A, “What is underfitting and overfitting in machine learning and how to deal with it.”, 11 March 2018, https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-learning-and-how-to-dealwith-it-6803a989c76
- [12]. Ruder S, “An overview of gradient descent optimization algorithms”, 19 January 2016, https://ruder.io/optimizing-gradient-descent/index.html#adadelta
- [13]. “Regression Loss Metrics”, Peltarion, https://peltarion.com/knowledge-center/documentation/evaluation-view/regression-loss-metrics