Abstract.
Purpose
X-ray scatter significantly affects the image quality of cone beam computed tomography (CBCT). Although convolutional neural networks (CNNs) have shown promise in correcting x-ray scatter, their effectiveness is hindered by two main challenges: the necessity for extensive datasets and the uncertainty regarding model generalizability. This study introduces a task-based paradigm to overcome these obstacles, enhancing the application of CNNs in scatter correction.
Approach
Using a CNN with a U-net architecture, the proposed methodology employs a two-stage training process for scatter correction in CBCT scans. The CNN is first pre-trained on approximately 4000 image pairs from geometric phantom projections and then fine-tuned via transfer learning (TL) on 250 image pairs of anthropomorphic projections, enabling task-specific adaptations with minimal data. 2D scatter ratio (SR) maps derived from projection data served as CNN targets, and the predicted maps were used to perform the scatter correction. The fine-tuning process for specific imaging tasks, such as head and neck imaging, involved simulating scans of an anthropomorphic phantom and pre-processing the data for CNN retraining.
Results
For the pre-training stage, it was observed that SR predictions were quite accurate (SSIM ≥ 0.91 for the testing subset). The accuracy of SR predictions was further improved after TL, with a relatively short retraining time (about 70 times faster than pre-training) and using considerably fewer samples compared to the pre-training dataset (more than an order of magnitude smaller).
Conclusions
A fast and low-cost methodology to generate task-specific CNNs for scatter correction in CBCT was developed. CNN models trained with the proposed methodology successfully corrected x-ray scatter in anthropomorphic structures unknown to the network, using simulated data.
Keywords: x-ray scatter, cone beam computed tomography, deep learning, transfer learning
1. Introduction
X-ray computed tomography (CT) is an imaging technique extensively used in modern healthcare for disease diagnosis and to guide medical treatments and interventions. As such, a patient’s outcome is often impacted by the image quality provided by the CT system.1 CT image quality is subject to several physical and engineering factors, particularly x-ray scatter.2 Traditional CT reconstruction algorithms neglect x-ray scatter to simplify the solution of the CT inverse problem; however, this simplification can lead to severe image artifacts that obscure the imaging target.3 Compared to multi-detector CT, the most common approach for diagnostic tasks, the effect of x-ray scatter is more pronounced in cone beam computed tomography (CBCT), which makes use of broad x-ray beams and wide flat-panel detectors; thus, the probability of detecting scattered photons increases dramatically.4
Several technical solutions have been proposed to reduce the effect of x-ray scatter in CBCT.5,6 While these solutions have proven successful in providing adequate image quality for medical imaging, their implementation introduces relevant trade-offs. For example, anti-scatter grids (ASGs) are simple hardware solutions that effectively reduce the detection of scattered x-ray photons;7 however, ASGs considerably decrease x-ray fluence at the detector’s entrance, constraining dose optimization efforts. Software solutions have been proposed as well; for instance, model-based iterative reconstruction (MBIR) algorithms have the potential to incorporate x-ray scatter into the reconstruction process; nonetheless, the computational cost of including scatter in MBIR can be large and undesirable.8 A more common approach to address x-ray scatter is to apply corrections to projection images prior to CT reconstruction; such corrections are based on empirical or semi-empirical models and rely on simplifying assumptions that may limit the accuracy of the correction.5 While it is possible to perform Monte Carlo simulations, or to solve Boltzmann’s transport equation, to estimate the scatter component of the image with a high degree of accuracy, this is often time consuming.6
In recent years, convolutional neural networks (CNNs) have been used to improve the quality of degraded medical images.9 In particular, previous works have demonstrated the potential of CNNs to perform x-ray scatter corrections.9–18 However, in clinical practice, there are still challenges to effectively using CNNs to compensate for x-ray scatter. First, deep learning models typically benefit from large volumes of data, which are not always trivial to obtain or curate, especially for clinical applications, where patient safety and consent are a priority. In addition, large volumes of training data demand high-performance computing hardware and long training times. Second, training data will always be finite and limited; thus, it is not clear how generalizable these models are, especially for imaging settings and anatomical regions that were not included in the training dataset. It is therefore desirable to standardize the training dataset and training methodology to guarantee robust and consistent performance of a CNN compensating for x-ray scatter.
In this work, a framework to address the previously described challenges by means of transfer learning (TL) is proposed and studied using simulated datasets. Our final goal was to develop a fast and low-cost methodology to generate task-specific CNNs for scatter correction in CBCT, that is, custom-designed CNNs that compensate for x-ray scatter given a specific anatomical region and acquisition settings. The use of TL allowed adapting a roughly pre-trained CNN to specialized tasks, with relatively small volumes of data and short training times.19
2. Material and Methods
2.1. Overview of the Proposed Methodology
Figure 1 shows an overview of the methodology utilized in this work. First, a set of numerical phantoms of known geometry and composition was defined (Sec. 2.2). CBCT scans of these phantoms were simulated using dedicated software for a given set of acquisition parameters (Sec. 2.4). Two projection datasets (in counts) were obtained from each simulation: one with and the other without the contribution of x-ray scatter. Projection data with x-ray scatter contribution were subject to log normalization. Projection data, both with and without x-ray scatter contribution, were combined to determine, view by view, 2D scatter ratio (SR) maps (Sec. 2.5). Individual log-normalized 2D projections were treated as CNN inputs, whereas their associated SR maps were regarded as CNN targets. A CNN with a U-net architecture (Sec. 2.6) was pre-trained with data derived from CBCT simulations of diverse phantoms in a variety of acquisition settings (Sec. 2.7).
Fig. 1.
An overview of the suggested methodology to obtain task-based CNN models.
The pre-trained CNN was then fine-tuned by means of TL, to obtain CNN models for specific imaging tasks (task-based CNNs). For example, for head and neck imaging, a numerical head phantom was first defined (Sec. 2.3). Then, CBCT scans of such anthropomorphic phantom were simulated as described above. Finally, the resulting projection data was pre-processed to obtain CNN inputs and targets and used to re-train the CNN.
Note that, in the proposed methodology, the scatter correction is not performed directly by the CNN; rather, the CNN acts as an estimator of the SR map for a given view angle. To perform the actual scatter correction, each projection needs to be compensated using the predicted SR map.
The performance of a task-based CNN was tested in terms of SR predictions, and the subsequent scatter correction, for a CBCT scan unknown to the network (see Sec. 2.9); this scan was also obtained from simulations.
2.2. Geometric Phantoms
The numerical phantoms used to generate projection data for CNN pre-training were defined as cylindrical volumes containing material inserts. Figure 2 shows, in axial view, the distribution and composition of these inserts for six phantom classes considered in this work. CBCT projections for each phantom class were obtained with varying phantom lengths (8 and 16 cm) and diameters (8 and 16 cm), resulting in 24 unique phantom configurations.
Fig. 2.
Axial view of the six cylindrical phantom classes used to obtain projection data for CNN pre-training. The display color scale is arbitrary. Phantom dimensions were varied to create different phantom configurations.
2.3. Anthropomorphic Phantoms
A well-known anthropomorphic numerical phantom, the NCAT cardiac/torso phantom,20,21 was used for TL. The composition of this phantom was defined as: polystyrene, representing soft tissue regions; Teflon, for compact bone areas; and Delrin, for trabecular bone sections. The phantom dimensions were adjusted to simulate two different imaging tasks: adult head [Figs. 3(a) and 3(b)] and pediatric pelvis [Figs. 3(c) and 3(d)].
Fig. 3.
Pairs of maximum intensity projections for the four anthropomorphic numerical phantoms used in this study; both coronal and sagittal views are presented. Phantoms (a)–(d) were designated for retraining, whereas (e)–(h) were for the testing of retrained networks. Retraining phantoms exhibit differences in composition and shape compared to testing phantoms. The pairs to the left [(a) and (b); (e) and (f)] represent an adult head model, whereas the pairs to the right [(c) and (d); (g) and (h)] depict a pediatric pelvis model. The display grayscale is arbitrary and merely for visual representation. Dimensions of the phantoms at the iso-center are also provided.
Anthropomorphic models to test the performance of task-based CNNs were also derived from the NCAT phantom. However, there were two differences compared to the phantoms utilized to obtain retraining data: (i) the phantom was defined with a tissue composition corresponding to its actual anatomical structure, and (ii) a series of rotations and deformations were applied to subtly alter the shape of the original NCAT phantom [see Figs. 3(e)–3(h)].
2.4. Data Simulation
CBCT projections were obtained using fastCAT,22 a fast CBCT simulator that employs pre-computed Monte Carlo phantom-specific scatter and detector response functions to expedite the simulation process. This tool has been shown to generate CBCT images that closely match real CBCT data in terms of image quality.23 FastCAT has the flexibility to exclude the contribution of x-ray scatter from the projection; thus, pairs of projections (with and without scatter) were generated for each of the simulation settings considered in this work, with the scatter-free projection taken as ground truth. In all cases, the scan geometry was defined as follows: a 152 cm distance from the source to the detector; a 100 cm distance from the source to the object (isocenter); and a detector size of 512 × 256 pixels, with a pixel size of 0.784 mm. Every scan involved full rotations (360 deg), sampled at 1-deg intervals. In each simulation, an x-ray tube with a tungsten (W) target, 0.4 mm Al inherent filtration, and 1.2 mm focal spot was considered. Neither a bowtie filter nor an anti-scatter grid was incorporated in the simulations. Other simulation parameters were left flexible and varied as detailed in the following subsections.
2.5. Data Pre-Processing
CBCT projection pairs were pre-processed as follows: first, a pure scatter projection (I_S) was obtained by subtracting the contribution of primary photons (I_P) from the total signal (I_T); that is, I_S = I_T − I_P. Then, I_S was divided by I_T to obtain an SR map; namely, SR = I_S/I_T. On the other hand, I_T was log-normalized with respect to an air signal (I_0) and scaled to the range from 0 to 1 by dividing the result by a linear attenuation upper bound (p_max); that is, I_L = ln(I_0/I_T)/p_max. The value of p_max was determined through inspection; we chose an arbitrary upper limit ensuring that the log-normalized pixel values of all projections remained below this threshold. Both SR and I_L were saved in a 16-bit PNG format for CNN training purposes. I_L always served as the input for the CNN, whereas SR was set as the CNN target.
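As an illustration, the following minimal NumPy sketch reproduces the pre-processing described above, using the notation introduced in this subsection. The function name and the in-memory handling of arrays are illustrative choices, not part of the original pipeline (which stored the results as 16-bit PNGs).

```python
import numpy as np

def preprocess_pair(I_T, I_P, I_0, p_max):
    """Build one (input, target) training pair from a simulated projection pair.

    I_T: total projection (primary + scatter), in counts
    I_P: primary-only projection, in counts
    I_0: air (flood-field) signal, in counts
    p_max: empirical upper bound chosen so that I_L stays below 1
    """
    I_S = I_T - I_P                   # pure scatter projection
    SR = I_S / I_T                    # scatter ratio map (CNN target)
    I_L = np.log(I_0 / I_T) / p_max   # log-normalized projection (CNN input)
    return I_L.astype(np.float32), SR.astype(np.float32)
```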
2.6. CNN Architecture
A U-net architecture24 was utilized in this work, as U-nets have been shown to be effective for x-ray scatter correction.10–13,16,25 The particular configuration of the network was inspired by a previous work,25 and is briefly described as follows. Figure 4 shows the general structure of the network, where convolution layers are specified according to the attached color code, and the number of channels (ch) of the convolution layers is indicated at each depth.
Fig. 4.
Architecture of the U-net utilized in this work. Convolution layers are specified according to the attached color code. The number of channels (ch) of the convolution layers at each depth is also indicated.
Essentially, this network encodes information from an input signal and decodes it into an output signal of the same dimension as the input. To do so, the input signal follows a down-sampling (encoding) path (of seven stages in this case) where the CNN learns, via convolution layers, signal features at different resolution hierarchies. Then, the signal is restored to its original size by following an up-sampling (decoding) path connected to the encoding path by concatenated skip connections. All convolution layers were followed by a rectified linear unit (ReLU) activation function. The input size of this CNN, in our case, was 512 by 256 pixels (0.784 mm pixel size).
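The sketch below shows how such a U-net can be assembled with the Keras functional API. It is a simplified illustration rather than the exact network: the depth is reduced to four stages for brevity, and the channel counts and the sigmoid output (used here to keep SR predictions in [0, 1]) are assumptions; only the overall encode/decode structure with concatenated skip connections follows the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, ch):
    # two 3x3 convolutions, each followed by a ReLU activation
    x = layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 256, 1), depth=4, base_ch=16):
    inputs = layers.Input(shape=input_shape)
    skips, x = [], inputs
    # down-sampling (encoding) path
    for d in range(depth):
        x = conv_block(x, base_ch * 2**d)
        skips.append(x)                       # keep features for skip connections
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, base_ch * 2**depth)     # bottleneck
    # up-sampling (decoding) path with concatenated skip connections
    for d in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_ch * 2**d, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[d]])
        x = conv_block(x, base_ch * 2**d)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # SR map in [0, 1]
    return Model(inputs, outputs)
```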
2.7. CNN Pre-Training
CBCT scans of the 24 phantom configurations described in Sec. 2.2 were simulated using combinations of the parameters listed in the top row of Table 1. All CBCT projections were pre-processed as described in Sec. 2.5. From each scan, 18 projection angles were randomly selected, generating a total of 5184 (input, target) image pairs. Image pairs were randomly grouped into three subsets: training (3628 image pairs), validation (777 image pairs), and testing (777 image pairs). The training and validation subsets were used to pre-train a CNN model (hereafter referred to as pCNN) with a U-net architecture, as described in Sec. 2.6, whereas the testing subset was reserved to assess the performance of the pre-trained network.
Table 1.
CBCT simulation parameters to generate pre-training and retraining data. All parameters listed in the first row are combined to generate the corresponding dataset.
| | Phantom | kV | Detector | Dose per projection (mGy) |
|---|---|---|---|---|
| Pre-training | Cylindrical with material inserts (24 configurations) | 80, 120, and 140 | CsI and CWO | 0.04 and 0.01 |
| Retraining adult head | Adult head | 100 | CsI | 0.04 |
| Retraining pediatric pelvis | Pediatric pelvis | 100 | CsI | 0.04 |
The pre-training was performed in the Keras Deep Learning library26 with a TensorFlow backend engine (version 2.10.0). The structural similarity (SSIM) index measure27 was considered as a benchmark index for the CNN predictions and thus included in the loss function as 1 − SSIM. The loss function was minimized using an Adam optimizer28 across 150 epochs. CNN pre-training was performed with an NVIDIA GeForce RTX 3060 GPU and required about 6 h, with the batch size set to 30 image pairs.
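A minimal Keras sketch of this training setup is given below. The 1 − SSIM loss uses tf.image.ssim, which assumes inputs scaled to [0, 1]; the learning rate shown is a placeholder (the exact value is not reproduced here), and train_inputs/train_targets stand in for the pre-processed arrays of Sec. 2.5.

```python
import tensorflow as tf

def ssim_loss(y_true, y_pred):
    # 1 - SSIM, so that minimizing the loss maximizes structural similarity
    return 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))

pcnn = build_unet()  # U-net from the sketch in Sec. 2.6
pcnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # placeholder rate
             loss=ssim_loss)
history = pcnn.fit(train_inputs, train_targets,  # hypothetical arrays, shape (N, 512, 256, 1)
                   validation_data=(val_inputs, val_targets),
                   epochs=150, batch_size=30)
```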
2.8. Transfer Learning
CBCT scans of two anthropomorphic phantoms, adult head and pediatric pelvis (described in Sec. 2.3), were simulated using combinations of the parameters listed in Table 1. Note that such a selection of phantoms and parameters provides retraining data with information unknown to the network, for example, anatomical structures and a different x-ray spectrum.
Projections from each CBCT scan were pre-processed, as described in Sec. 2.5, and randomly grouped into two subsets: training (250 image pairs) and validation (110 image pairs), both utilized to retrain the pCNN. TL was performed for each scan, resulting in eight different subspecialized networks (hereafter referred to as tCNN). All weights of the pCNN were subject to updates during TL, except for those layers in the bottleneck of the network architecture (gray layers in Fig. 4, with 1024 output channels).
Keeping the bottleneck weights of a U-net fixed during transfer learning helps preserve high-level features useful for similar tasks, prevents overfitting (especially with small datasets), improves training efficiency by reducing the number of parameters to adjust, and facilitates more effective knowledge transfer by leveraging previously learned internal representations, which is particularly valuable when the new task shares similarities with the original task.29
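In Keras, this freezing scheme reduces to toggling the trainable flag of the bottleneck layers before recompiling, as in the sketch below; the name-based selection is illustrative, since the actual rule depends on how the layers of the pCNN were named.

```python
# Freeze the bottleneck layers (the deepest stage of the U-net) and retrain the rest.
# The "bottleneck" naming convention used here is hypothetical.
for layer in pcnn.layers:
    layer.trainable = not layer.name.startswith("bottleneck")

pcnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # placeholder rate
             loss=ssim_loss)  # same loss as in pre-training
pcnn.fit(retrain_inputs, retrain_targets,  # 250 anthropomorphic training pairs
         validation_data=(retrain_val_inputs, retrain_val_targets),  # 110 validation pairs
         epochs=50, batch_size=20)
```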
The whole TL process was performed using Keras, with the same graphics card and training parameters as the CNN pre-training (Sec. 2.7), except for the number of epochs and batch size, which were set to 50 and 20, respectively. Under such conditions, TL required about 5 min per tCNN.
2.9. Performance Evaluation
To evaluate the performance of tCNNs, we used simulated CBCT scans of anthropomorphic phantoms that were not previously exposed to the network (refer to Sec. 2.3). A scatter correction was performed using SR maps predicted by the tCNN, prior to reconstruction. This correction involved multiplying each projection by a correction factor given by (1 − SR). The parameters for CBCT simulation, given an imaging task, matched those for network retraining, except for the angular sampling rate, which was increased to 800 projections. Reconstruction parameters were: FDK algorithm, a ramp filter, and 1 mm voxel size. Reconstructed images of ground truth projections (primary photons) were compared with tCNN-corrected images in terms of the SSIM (ensuring the comparison was restricted to areas with photon counts significantly above background levels). The SSIM is a comprehensive index that assesses visual impact by comparing three key components of an image: luminance, contrast, and structure. A high SSIM value (i.e., closer to one) indicates that the compared images are highly similar, reflecting accurate preservation of texture, contrast, and structural integrity.
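The two evaluation steps can be summarized in the short sketch below: the correction follows from the SR definition of Sec. 2.5 (I_P = I_T(1 − SR)), whereas the masked SSIM is computed here with scikit-image; the mask construction is an assumption standing in for the background exclusion described above.

```python
import numpy as np
from skimage.metrics import structural_similarity

def correct_projection(I_T, SR_pred):
    # Estimate the primary-only projection: I_T * (1 - SR) = I_T - I_S
    return I_T * (1.0 - SR_pred)

def masked_ssim(gt_slice, corrected_slice, mask):
    # Per-pixel SSIM map, averaged only inside the region of interest
    _, ssim_map = structural_similarity(
        gt_slice, corrected_slice,
        data_range=gt_slice.max() - gt_slice.min(), full=True)
    return ssim_map[mask].mean()

# Example mask: keep pixels well above the background level (threshold is hypothetical)
# mask = gt_slice > threshold
```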
3. Results
3.1. CNN Pre-Training
Figure 5 summarizes the performance of the CNN pre-training. Figure 5(a) illustrates the dynamics of the loss function as the number of epochs increases. It can be observed that the loss is reduced for both the training and validation subsets across epochs, reaching convergence after about 100 epochs. Figure 5(b) shows, for the testing subset, the distribution of the SSIM associated with pCNN predictions. All predictions achieved SSIM values of 0.91 or higher, with (0.99, 1) being the most frequent range of SSIM values.
Fig. 5.
CNN-pre-training performance. (a) Loss reduction as a function of the number of epochs, for both training and validation subsets. (b) SSIM distribution of SR predictions in the testing subset.
Figure 6 provides a visual reference of the pCNN performance, showing a representative example from the testing subset. Three subfigures are displayed: a ground truth SR image [Fig. 6(a)], the pCNN-predicted SR [Fig. 6(b)], and central profiles (averaged over 10 adjacent rows) of both target and pCNN prediction [Fig. 6(c)]. From this figure, it can be observed that pCNN predictions match the target accurately. In addition, pCNN predictions are considerably less noisy than their respective targets, which suggests that the proposed pCNN also performs some degree of denoising on the input data.
Fig. 6.
pCNN performance. This figure depicts a representative example from the testing dataset. Panels (a) and (b) correspond to target and pCNN-predicted SR images (projection domain), respectively. (c) Central profiles (averaged over 10 adjacent rows) of (a) and (b).
3.2. Transfer Learning
Figure 7 summarizes the performance of pCNN TL for the considered imaging tasks. Loss value plots, as a function of the number of epochs, are displayed in the figure for the adult head and pediatric pelvis imaging tasks. In both cases, it was observed that TL further reduced the loss function to considerably lower levels. Moreover, loss function convergence was reached relatively fast (around 30 epochs).
Fig. 7.
TL performance. Each plot depicts, for a given anatomical region, the loss reduction as a function of the number of epochs, for both training and validation datasets.
Figure 8 showcases the performance of tCNNs. Figure 8(a) displays an SR image for a representative projection from the adult head test scan, whereas Fig. 8(b) shows the associated tCNN prediction. Central profiles (averaged over 10 adjacent rows) of these SR images are given in Fig. 8(c). The distribution of SSIM values corresponding to tCNN predictions for all projections of the adult head test scan is shown in Fig. 8(d). Figures 8(e)–8(h) provide analogous data for pediatric pelvis imaging.
Fig. 8.
Performance of tCNN for two imaging tasks: adult head [(a)–(d)] and pediatric pelvis [(e)–(h)]. For each imaging task it is shown: the true SR of a given projection; the corresponding tCNN prediction; central profiles (averaged over 10 adjacent rows) for both true and predicted SR; and the SSIM distribution for all predicted SR values across the test scan, respectively.
From these results, it can be observed that tCNN predictions match the target accurately. In addition, tCNN predictions are less noisy than their respective targets, which corroborates that the proposed model performs some degree of denoising on the input data. Furthermore, tCNN predictions achieved high SSIM values.
Figure 9 illustrates the performance of scatter correction based on CNN predictions, for both adult head and pelvis imaging. Figure 9(a) presents the ground truth image (in axial view) of a given slice from the reconstructed CBCT adult head test scan. Figure 9(b) showcases the contribution of x-ray scatter to the reconstructed image. As a reference, Fig. 9(c) shows the difference between the ground truth image and the CBCT image corrected using pCNN predictions. Similarly, Fig. 9(d) displays the difference when the CBCT image is corrected using tCNN predictions. Figure 9(e) through Fig. 9(h) provide analogous data for pediatric pelvis imaging.
Fig. 9.
Reconstructed images (axial view) from a representative slice of the test scan for both adult head [(a)–(d)] and pediatric pelvis [(e)–(h)] imaging. For each imaging task, the following are presented: the reference ground truth image (without scatter); pure scatter contribution; the difference between ground truth and a pCNN corrected image; and the difference between ground truth and a tCNN corrected image, respectively.
From these results, it can be observed that the scatter correction derived from tCNN predictions effectively removes the effect of x-ray scatter. In contrast, the scatter correction derived from pCNN predictions leads to considerable residuals when compared with the ground truth, which confirms the need for TL to obtain adequate SR estimates.
Similar results were observed for all other slices within the reconstruction volume, as illustrated in Fig. 10. This figure shows, for adult head and pediatric pelvis imaging tasks, respectively, the distribution of SSIM values for images corrected using tCNN predictions, with respect to the ground truth. Note, however, that SSIM values for adult head imaging tend to be closer to one when compared to those for pediatric pelvis imaging. This discrepancy is likely attributable to the more complex anatomy and greater variation in material composition of the pediatric pelvis compared to the head.
Fig. 10.
Distribution of SSIM values for scatter-corrected images, with respect to the ground truth, within the reconstruction volume. Histograms representing adult head and pediatric pelvis are depicted in panels (a) and (b), respectively. The scatter corrections utilized tCNN SR predictions.
4. Discussion
This work proposes a methodology to develop task-specific CNNs for scatter correction in CBCT imaging; this methodology makes use of TL to repurpose a pre-trained CNN. The pre-training was performed using geometric phantom data, whereas TL was achieved using anthropomorphic data of the target imaging task. For the pre-training stage, it was observed that SR predictions were quite accurate. Furthermore, it was noted that the SR predictions from the pCNN are smoother than the actual training targets, suggesting that CNN-based methods do not contribute additional noise to the reconstructed image.30 The accuracy of SR predictions was further improved after TL (see Fig. 9). Although an explicit analysis of speed and cost-effectiveness was not included in this work, the considerable disparity in training time and dataset size between pre-training and retraining (at least an order of magnitude) suggests that TL has the potential to repurpose general CNN models with a relatively short retraining time and considerably fewer samples compared with pre-training. Moreover, the SR predictions of the tCNNs were adequate to derive actual scatter corrections.
The proposed framework overcomes two important limitations intrinsic to CNN implementation: (i) high demand for data and computation time and (ii) model generalization uncertainty. Note that, while large volumes of data and long computation times are still needed for the pre-training stage, such a stage is only performed once. Task-based retraining to generate repurposed models is fast, about 70 times faster compared with the pre-training, and makes use of relatively small datasets. Moreover, the pre-training dataset only included phantom data, which is relatively easy to obtain or simulate.
The reason to include only geometric-phantom data in the pre-training dataset is to generate a simple CNN model that could be repurposed for imaging tasks of arbitrary complexity. The phantoms considered for simulation provide scatter patterns derived from simple geometries over given ranges of material compositions, contrast levels, and feature sizes, aiming to reflect the spectrum typically encountered in clinical settings. Note, however, that many other phantoms satisfy such conditions; thus, it is unclear how to standardize the pre-training dataset.
Another key point to achieve CNN generalization was an adequate normalization of the data. By using the log transform of projections and the associated SR as input and output of the network, respectively, CNN performance becomes independent of the native signal magnitude and less sensitive to object size and radiation dose. Ultimately, what the CNN learns is a transformation between ln(I_0/I_T)/p_max, which is a dimensionless factor, and the signal buildup fraction due to x-ray scatter. Note, however, that noise magnitude is dose dependent and cannot be scaled by the proposed normalization; therefore, projection data at different dose levels were included in the pre-training to make the pCNN robust.
It is worth emphasizing that the use of SSIM as a loss function was not arbitrary. The SSIM is often favored over traditional metrics such as root mean square error (RMSE) or mean absolute error (MAE) when assessing perceptual image quality.27 Unlike RMSE and MAE, which focus on pixel-wise differences, SSIM evaluates the structural similarity between images, considering changes in structure, luminance, and texture. This aligns better with human visual perception, emphasizing structural changes over mere pixel-level discrepancies. Thus, SSIM provides a more comprehensive assessment of image quality, especially in scenarios where structural integrity is paramount.
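For reference, the SSIM between two image patches x and y, as defined by Wang et al.,27 is

\[
\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\]

where \(\mu\) and \(\sigma^2\) denote local means and variances, \(\sigma_{xy}\) is the local covariance, and \(C_1\), \(C_2\) are small constants that stabilize the division.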
One of the limitations of this work was the use of a numerical anthropomorphic phantom for testing. While it would be preferable to use patient data to validate our approach, several challenges arise when accessing real clinical data. One challenge is navigating the complexities of patient privacy and ethical considerations. Additionally, the scatter correction features in many commercial imaging systems limit access to raw, uncorrected data, often requiring specific licenses. Given these constraints, our study primarily relied on numerical simulations. Although digital phantoms serve as useful stand-ins, they cannot fully represent human anatomy. Bearing this in mind, future work aims to further refine and validate the proposed scatter correction methodology using genuine clinical data. Nonetheless, our results showed the potential of TL to adapt the pCNN to imaging tasks of higher complexity.
The size of the anthropomorphic phantom was restricted by the capabilities of the fastCAT software. Specifically, this software can only simulate scatter patterns for objects up to 16 cm in diameter. While this size restriction ensured compatibility with the simulation software and streamlined the reconstruction process, it also means that our findings might not directly translate to scenarios involving larger phantoms with truncated views. We recognize that this limitation is crucial, as it underscores the need for further research.
Another limitation is the use of simulated data. While fastCAT is a flexible tool and has been shown to be reliable for 16 cm diameter phantoms,23 a CNN may be sensitive to subtle differences between real projections and fastCAT simulations, which could ultimately impact the SR prediction accuracy. It is highly recommended that future studies include real phantom projections as part of the training datasets.
A key contribution of this work is the introduction of a task-based paradigm. As discussed above, CNN generalization is an important epistemological concern. Conventional deep learning approaches aim to develop general models by training a single CNN with volumes of data as large and diverse as possible. This work proposes an alternative approach: first, a CNN is pre-trained with simplified cases; second, a specialized network is derived from the pre-trained network by means of TL. While this approach requires retraining for any new task, such retraining is usually fast and does not require large or diverse volumes of data. Moreover, a stepped training strategy more closely resembles how humans learn. Also note that our approach could be easily adapted to other patient and detector sizes, as well as to other medical imaging applications.
5. Conclusions
A fast and low-cost methodology to generate task-specific CNNs for scatter correction in CBCT was developed. CNN models trained with the proposed methodology successfully corrected x-ray scatter in anthropomorphic structures unknown to the network, using simulated data.
Acknowledgments
JPCB is grateful for the postdoctoral scholarship received from Dirección General de Asuntos del Personal Académico (DGAPA), Universidad Nacional Autónoma de México (UNAM). The authors acknowledge the support from DGAPA-UNAM PAPIIT, grant number IN108721, for the funding of this research work. FM acknowledges the master’s scholarship granted by Conahcyt.
Biographies
Juan P. Cruz-Bastida is an alumnus of the University of Wisconsin-Madison. His research interests focus on the objective assessment of image quality and the development of quantitative methods for image analysis in CT. His current work emphasizes the application of AI technologies to medical imaging. The research presented in this paper is a culmination of his postdoctoral tenure at the Biomedical Imaging Laboratory of the Institute of Physics at UNAM.
Fernando Moncada, MSc, is currently pursuing a doctorate in physics at UNAM, where he previously earned a master’s degree in medical physics in 2021. His academic career commenced with an undergraduate degree in physics, obtained from the National Polytechnic School in Ecuador in 2018. During his time there, he dedicated 3 years to working in the Physics Laboratory. His research interests are primarily focused on medical imaging, deep learning, and compressed sensing.
Arnulfo Martínez-Dávalos is professor of physics at the Institute of Physics, UNAM. He received his BSc degree in physics from UNAM and his PhD in radiation physics from the Department of Medical Physics and Bioengineering of University College London, United Kingdom. His main scientific interests are the Monte Carlo simulation of radiation transport in matter, the design and development of ionizing radiation detectors, and their applications in medical imaging and high energy physics.
Mercedes Rodríguez-Villafuerte is a professor at the Institute of Physics, UNAM. She received her BSc degree in physics from UNAM and her PhD from the graduate program in radiation physics at University College London, University of London, United Kingdom. Her research interests include attenuation and scatter corrections in computed tomography and positron emission tomography.
Contributor Information
Juan P. Cruz-Bastida, Email: cruzbastida@fisica.unam.mx.
Fernando Moncada, Email: fernandomoncadagutierrez@hotmail.com.
Arnulfo Martínez-Dávalos, Email: arnulfo@fisica.unam.mx.
Mercedes Rodríguez-Villafuerte, Email: mercedes@fisica.unam.mx.
Disclosures
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Code and Data Availability
All code and data related to this research are publicly available. The source code can be accessed via our GitHub repository at https://github.com/calcutech/task-based-DSC. Additionally, the datasets used in this study are available in a separate repository, which can be found at https://zenodo.org/doi/10.5281/zenodo.10823375.
References
- 1. Barrett H. H., et al., “Task-based measures of image quality and their relation to radiation dose and patient risk,” Phys. Med. Biol. 60(2), R1 (2015). 10.1088/0031-9155/60/2/R1
- 2. Sayed M., et al., “The principles and effectiveness of X-ray scatter correction software for diagnostic X-ray imaging: a scoping review,” Eur. J. Radiol. 158, 110600 (2023). 10.1016/j.ejrad.2022.110600
- 3. Buzug T. M., “Computed tomography,” in Springer Handbook of Medical Technology, Kramme R., Hoffmann K.-P., Pozos R. S., Eds., pp. 311–342, Springer, Berlin, Heidelberg (2011).
- 4. Niu T., et al., “Shading correction for on-board cone-beam CT in radiation therapy using planning MDCT images,” Med. Phys. 37(10), 5395–5406 (2010). 10.1118/1.3483260
- 5. Rührnschopf E.-P., Klingenbeck K., “A general framework and review of scatter correction methods in X-ray cone-beam computerized tomography. Part 1: Scatter compensation approaches,” Med. Phys. 38(7), 4296–4311 (2011). 10.1118/1.3599033
- 6. Rührnschopf E.-P., Klingenbeck K., “A general framework and review of scatter correction methods in cone beam CT. Part 2: Scatter estimation approaches,” Med. Phys. 38(9), 5186–5199 (2011). 10.1118/1.3589140
- 7. Schafer S., et al., “Antiscatter grids in mobile C-arm cone-beam CT: effect on image quality and dose,” Med. Phys. 39(1), 153–159 (2012). 10.1118/1.3666947
- 8. Nuyts J., et al., “Modelling the physics in the iterative reconstruction for transmission computed tomography,” Phys. Med. Biol. 58(12), R63 (2013). 10.1088/0031-9155/58/12/R63
- 9. Sahiner B., et al., “Deep learning in medical imaging and radiation therapy,” Med. Phys. 46(1), e1–e36 (2019). 10.1002/mp.13264
- 10. Erath J., et al., “Deep learning-based forward and cross-scatter correction in dual-source CT,” Med. Phys. 48(9), 4824–4842 (2021). 10.1002/mp.15093
- 11. Jiang Y., et al., “Scatter correction of cone-beam CT using a Deep Residual Convolution Neural Network (DRCNN),” Phys. Med. Biol. 64(14), 145003 (2019). 10.1088/1361-6560/ab23a6
- 12. Lee H., Lee J., “A deep learning-based scatter correction of simulated X-ray images,” Electronics 8(9), 944 (2019). 10.3390/electronics8090944
- 13. Maier J., et al., “Deep Scatter Estimation (DSE): accurate real-time scatter estimation for X-ray CT using a deep convolutional neural network,” J. Nondestruct. Eval. 37(3), 57 (2018). 10.1007/s10921-018-0507-z
- 14. Nomura Y., et al., “Projection-domain scatter correction for cone beam computed tomography using a residual convolutional neural network,” Med. Phys. 46(7), 3142–3155 (2019). 10.1002/mp.13583
- 15. Piao Z., et al., “Adaptive scatter kernel deconvolution modeling for cone-beam CT scatter correction via deep reinforcement learning,” Med. Phys. 51(2), 1163–1177 (2023).
- 16. Sakaltras N., et al., “Deep-learning based scatter correction in digital radiography,” in IEEE Int. Conf. Imaging Syst. and Tech. (IST), pp. 1–4 (2021). 10.1109/IST50367.2021.9651422
- 17. Zhang X., et al., “Image-based scatter correction for cone-beam CT using flip swin transformer U-shape network,” Med. Phys. 50(8), 5002–5019 (2023). 10.1002/mp.16277
- 18. Zhuo X., et al., “Scatter correction for cone-beam CT via scatter kernel superposition-inspired convolutional neural network,” Phys. Med. Biol. 68(7), 075011 (2023). 10.1088/1361-6560/acbe8f
- 19. Zhuang F., et al., “A comprehensive survey on transfer learning,” Proc. IEEE 109(1), 43–76 (2021). 10.1109/JPROC.2020.3004555
- 20. Segars W. P., Lalush D. S., Tsui B. M. W., “A realistic spline-based dynamic heart phantom,” IEEE Trans. Nucl. Sci. 46(3), 503–506 (1999). 10.1109/23.775570
- 21. Segars W. P., Lalush D. S., Tsui B. M. W., “Modeling respiratory mechanics in the MCAT and spline-based MCAT phantoms,” IEEE Trans. Nucl. Sci. 48(1), 89–97 (2001). 10.1109/23.910837
- 22. O’Connell J., Bazalova-Carter M., “fastCAT: fast cone beam CT (CBCT) simulation,” Med. Phys. 48(8), 4448–4458 (2021). 10.1002/mp.15007
- 23. O’Connell J., Lindsay C., Bazalova-Carter M., “Experimental validation of Fastcat kV and MV cone beam CT (CBCT) simulator,” Med. Phys. 48(11), 6869–6880 (2021). 10.1002/mp.15243
- 24. Shelhamer E., Long J., Darrell T., “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017). 10.1109/TPAMI.2016.2572683
- 25. Maier J., et al., “Real-time scatter estimation for medical CT using the deep scatter estimation: method and robustness analysis with respect to different anatomies, dose levels, tube voltages, and data truncation,” Med. Phys. 46(1), 238–249 (2019). 10.1002/mp.13274
- 26. Chollet F., “Keras,” https://keras.io/ (accessed 14 Feb. 2023).
- 27. Wang Z., et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). 10.1109/TIP.2003.819861
- 28. Kingma D. P., Ba J., “Adam: a method for stochastic optimization,” arXiv:1412.6980 (2017).
- 29. Yosinski J., et al., “How transferable are features in deep neural networks?,” arXiv:1411.1792 (2014).
- 30. Duan X., et al., “Deep-learning convolutional neural network-based scatter correction for contrast enhanced digital breast tomosynthesis in both cranio-caudal and mediolateral-oblique views,” J. Med. Imaging 10(S2), S22404 (2023). 10.1117/1.JMI.10.S2.S22404