Abstract
Digital mammography is still the most common imaging tool for breast cancer screening. Although the benefits of using digital mammography for cancer screening outweigh the risks associated with the x-ray exposure, the radiation dose must be kept as low as possible while maintaining the diagnostic utility of the generated images, thus minimizing patient risks. Many studies have investigated the feasibility of dose reduction by restoring low-dose images using deep neural networks. In these cases, choosing the appropriate training database and loss function is crucial and impacts the quality of the results. In this work, we used a standard residual network (ResNet) to restore low-dose digital mammography images and compared the restored images to the standard full-dose images. Moreover, we evaluated the performance of several loss functions for this task. For training purposes, we extracted 256,000 image patches from a dataset of 400 images of retrospective clinical mammography exams, where different dose levels were simulated to generate low-dose and standard full-dose pairs. To validate the network in a real scenario, a physical anthropomorphic breast phantom was used to acquire real low-dose and standard full-dose images on a commercially available mammography system, which were then processed through our trained model. An analytical restoration model for low-dose digital mammography, previously presented, was used as a benchmark in this work. Objective assessment was performed through the signal-to-noise ratio (SNR) and the mean normalized squared error (MNSE), decomposed into residual noise and bias. Statistical tests showed that the perceptual loss functions (PL3 and PL4) are statistically different from all other losses in terms of residual noise and yield the values closest to the full-dose images (p-values < 0.01). In terms of bias, PL3, GAN-0.1 and SSIM achieved the lowest values for both dose reduction factors. The source code for our deep neural network is available at https://github.com/WANG-AXIS/LdDMDenoising.
1. Introduction
Early diagnosis of breast cancer is crucial to improving the survival rate. The expansion of breast screening programs has contributed to this improvement, with survival increasing significantly in recent years [1]. This disease is still the main cause of cancer-related deaths among women, and screening mammography is the primary tool for detecting tumors at early stages, especially for women aged 50-69 [2].
Full-field digital mammography and digital breast tomosynthesis are the most common imaging tools for breast cancer screening [3]. In these systems, a small dose of x-ray radiation is used to generate projections of the breast, which are then interpreted by a radiologist [4]. Although the radiation dose levels are kept within a safe margin, it is desirable to keep the dose as low as possible while maintaining image quality satisfactory for clinical screening purposes [5]. However, reducing the radiation dose can degrade image quality, limiting the radiologist's performance in searching for and characterizing subtle lesions [6–9].
Several works in the field of medical imaging have investigated the potential of reducing the radiation dose by restoring low-dose (LD) exams to achieve image quality comparable to that of standard clinical acquisitions. Some proposals, in the fields of computed tomography [10–12] and digital mammography [13–15], evaluated this technique in a model-based (MB) approach, using denoising-based restoration methods to improve image quality. In [14] and [15], the authors proposed a pipeline to restore LD mammography through a variance stabilizing transformation (VST). Recently, it was shown that this denoising method may improve the localization of microcalcifications (MCs) in these exams [16].
With the rapid development of deep learning techniques, in particular convolutional neural networks (CNNs), many studies have proposed algorithms to improve the quality of LD images and achieved results comparable to, or even better than, the MB ones [17–27]. This data-based approach takes advantage of learning features directly from a dataset rather than explicitly applying advanced feature extraction techniques or modeling the system mathematically. One important constraint of data-based techniques is the need for a large and diverse dataset for the training process [28]. In the field of medical imaging, access to large datasets is limited, and techniques such as data augmentation and transfer learning have been applied to mitigate this limitation [29, 30]. Moreover, when it comes to LD image restoration using supervised learning, acquisitions at different dose levels are required for training. Although it may be possible to create experimental LD/full-dose (FD) imaging protocols, such as in [20], exposing the patient multiple times increases the risk of radiation-induced cancer.
Thus, in the field of computed tomography, a common approach is to train these deep networks using clinical data [31], where LD images are obtained by injecting extra noise into the standard FD projections [26]. In the field of mammography, it is common to use breast specimens [32], physical phantoms [33] or virtual clinical trial (VCT) software [34] to generate LD/FD image pairs for training these deep networks. In [35], the authors adapted a noise injection technique from digital chest radiography to simulate ultra-low-dose mammography acquisitions and then trained a CNN for denoising.
When it comes to the deep neural network (DNN) structure and the training step of the restoration process, there are two key components: the network architecture and the loss function. The former determines the complexity of the denoising model, while the latter controls the learning process. Thus, the loss function has a direct impact on image quality and is relatively more important than the network architecture for the task of image restoration [24].
As the restoration of LD mammography involves a denoising process, most image translation models can be adapted to this task, such as residual encoder-decoder convolutional neural networks (RED-CNN) [19], U-net [36] and dense networks [37]. Even though denoising is part of the restoration process, the main goal of the task is to map LD images to standard FD images. This is where the loss function plays an important role, measuring the similarity between the image pair. Commonly used loss functions include error visibility methods, such as the mean squared error (MSE) and the mean absolute error (MAE); structural similarity methods, such as the structural similarity index (SSIM) [38] and the complex wavelet SSIM index (CW-SSIM) [39]; information-theoretical methods, such as IFC [40] and VIF [41]; and DNN-based methods, such as the perceptual loss (PL) [42], the adversarial loss [43], the Fréchet inception distance (FID) [44], and the image quality transformer (IQT) [45]. In [46], the authors investigated commonly used losses for image restoration with neural networks on natural images. In [47], the authors also investigated several image quality assessment methods as loss functions for low-level computer vision tasks on natural images.
However, in contrast with natural images, image restoration for medical images, and more specifically for mammography, is a meticulous task, where subtle structures such as MCs are extremely important and must be preserved in the restoration process. Moreover, it is desirable that the noise properties of the restored image match those of the FD image, which is also important for radiologists in the clinical routine. This was demonstrated in [48], where the authors proposed a loss function that takes into account the bias and the noise variance of the restoration algorithm.
The objective of this work is to investigate different loss functions for DNNs and their impact on the restoration of LD mammography images. We assess the efficiency of each loss function by measuring the signal-to-noise ratio (SNR) and the mean normalized squared error (MNSE), decomposed into residual noise and bias, after the restoration process. To this end, we used a well-known residual convolutional neural network (ResNet) [49], illustrated in Fig. 1. The reasons for using this model are twofold: (1) it has been extensively used for image restoration, and (2) it avoids any potential bias a new network could introduce when evaluating the impact of loss functions. For the learning process, we built a clinical dataset from retrospective mammography exams, combined with a computational method that simulates LD acquisition pairs [50, 51]. The adopted method performs noise injection in a variance-stabilizing domain, avoiding assumptions about the unknown noise-free signal. Furthermore, the spatial dependence of the quantum noise, the electronic noise and the noise spatial correlation were taken into account in the simulation, which enabled the generation of accurate clinical image samples for each patient, as if they had been acquired with lower radiation doses. Finally, the assessment of the trained model was performed on real LD acquisitions of a physical anthropomorphic breast phantom.
Fig. 1:
Architecture of the network used in this study.
The main contributions of this paper are summarized as follows.
A comprehensive evaluation of common loss functions specific to the field of medical imaging, with an emphasis on digital mammography.
A new training strategy using 400 clinical images from retrospective mammography examinations. This exposes the model to a diverse dataset, with different tissue structures, densities, and signal and noise levels.
Validation of each loss function with an objective metric that evaluates blurring and noise individually.
The remainder of the paper is organized as follows. Section 2 introduces the theoretical background for the image degradation model, for the restoration models (both MB and data-based), and for the loss functions used in this study. Section 3 presents the datasets used in this work, the implementation details and the evaluation metrics. In Section 4, the quantitative results and some regions of interest (ROIs) are shown for all loss functions, followed by a concluding summary in Section 5.
2. Theoretical Background
Image restoration can be approached on several fronts. A common practice, extensively presented in the literature, is to model the x-ray acquisition system and derive mathematical formulations to perform the desired restoration. Although this was the standard procedure until recently, deep learning techniques have shown a great capacity to learn from data through supervised learning. This section presents the background on the image degradation model for mammography systems, the basics of the MB approach to restore LD images, and the data-based model used.
2.1. Degradation Model
Considering an observed x-ray mammography image $X$ of size w × h at the standard FD, following [15], we can model an acquisition as:

$$X = Y + \eta \tag{1}$$

where Y is the noise-free image and η is the corresponding noise at the standard FD. Note that $[\eta]_{ij}$, the element at the i-th row and j-th column of the noise matrix η, follows a Gaussian distribution with zero mean and variance equal to $\lambda [Y]_{ij} + \sigma_e^2$; here, λ is the quantum noise gain and $\sigma_e^2$ is the variance of the electronic noise. Although in x-ray images the noise is often modeled through a Poisson-Gaussian distribution, the energy range at which digital mammography operates allows the assumption of a signal-dependent Gaussian distribution, as done in Eq. (1), thanks to the Central Limit Theorem [52]. Now let us consider another raw mammogram Xγ, acquired using the same radiographic factors as X except for a reduction in the current-time product (mAs), resulting in a lower dose. The mammogram Xγ can be described as a function of the noise-free signal in Eq. (1) as:
$$X_\gamma = \gamma Y + \eta_\gamma, \qquad [\eta_\gamma]_{ij} \sim \mathcal{N}\!\left(0,\; \gamma\lambda [Y]_{ij} + \sigma_e^2\right) \tag{2}$$
where 0 < γ < 1 is the mAs scaling factor, and thus the dose reduction factor.
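For illustration, the degradation model above can be expressed in a few lines of code. The sketch below is a minimal version of Eqs. (1)-(2) only; it is not the simulation method of [50, 51] actually used in this work, which operates on real FD images in the VST domain and models detector crosstalk, precisely because the noise-free Y is unknown in practice.

```python
import numpy as np

def simulate_low_dose(y, gamma, lam, sigma_e, rng=None):
    """Minimal sketch of Eqs. (1)-(2); illustrative only.

    y       : noise-free signal Y (2-D array), unknown in practice
    gamma   : dose (mAs) scaling factor, 0 < gamma < 1
    lam     : quantum noise gain (lambda)
    sigma_e : standard deviation of the electronic noise
    """
    rng = np.random.default_rng() if rng is None else rng
    # Signal-dependent Gaussian noise: variance = gamma*lambda*Y + sigma_e^2
    noise = rng.normal(0.0, np.sqrt(gamma * lam * y + sigma_e**2))
    return gamma * y + noise
```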
The goal of this work is to investigate loss functions capable of training a CNN to recover Eq. (1) starting from Eq. (2), while keeping the noise-free signal Y as preserved as possible, with minimal blur and smearing caused by the restoration process, i.e.:
$$\hat{X} = \Psi(X_\gamma) \approx X \tag{3}$$
where $\hat{X}$ is the restored image and Ψ(·) is the non-linear restoration operator.
2.2. Restoration Model
From Eqs. (1) and (2), the restoration task may be approached through mathematical operators, which we will refer to as MB approaches. Alternatively, taking advantage of the great capacity of DNNs, it is also possible to train a model to perform the whole restoration process, avoiding the explicit estimation of noise parameters. In this section we discuss both the model- and data-based approaches.
2.2.1. Model-based
In [14, 15], our group proposed an MB pipeline to restore LD mammography images, leveraging a variance stabilization technique, namely the generalized Anscombe transformation (GAT) [53]. With this technique, it is possible to use any denoising method designed for signal-independent Gaussian-distributed data. The pipeline involves modeling the equipment's noise parameters, such as the quantum noise gain, the electronic noise variance, and the detector offset. These parameters are used in the GAT to bring the image to a domain where the noise is signal-independent and approximately Gaussian. In the GAT domain, block-matching and 3D filtering (BM3D) [54] is used to suppress noise, and the exact unbiased inverse of the GAT is then applied [55]. Finally, the denoised image is blended with the LD image through a weighted average. We refer the reader to [15] for more details about the MB restoration process.
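A minimal sketch of this pipeline is shown below. It assumes the API of a publicly available BM3D package, uses the closed-form algebraic inverse of the GAT in place of the exact unbiased inverse of [55], and the blending weight name is illustrative.

```python
import numpy as np
import bm3d  # pip install bm3d; API assumed: bm3d.bm3d(z, sigma_psd)

def restore_mb(x_ld, lam, sigma_e, omega=0.8):
    """Hedged sketch of the MB pipeline (GAT -> BM3D -> inverse -> blend)."""
    # GAT: stabilizes the signal-dependent variance lam*x + sigma_e^2 to ~1
    z = (2.0 / lam) * np.sqrt(lam * x_ld + 0.375 * lam**2 + sigma_e**2)
    d = bm3d.bm3d(z, sigma_psd=1.0)          # Gaussian denoiser, unit std
    # Algebraic inverse of the GAT (the exact unbiased inverse of [55]
    # removes the residual bias of this closed form)
    y_hat = ((lam * d / 2.0) ** 2 - 0.375 * lam**2 - sigma_e**2) / lam
    # Weighted average with the LD input restores a controlled noise level
    return omega * y_hat + (1.0 - omega) * x_ld
```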
2.2.2. Data-based
This subsection presents the deep learning-based denoising model for the restoration of LD digital mammography. Despite the high dimensionality of the Euclidean space of medical images, it is known that these images lie on low-dimensional manifolds [17, 22]. As DNNs with sufficient trainable parameters can approximate any non-linear transformation, one can model the non-linear operator Ψ from Eq. (3) using a deep neural network with an appropriate architecture and loss function.
Inspired by the success of residual skip connections in various tasks, such as image classification [49], image denoising [19, 20] and reinforcement learning [56], we used a ResNet to better model the noise distribution of LD digital mammography.
The network has four residual blocks as basic units, each with a skip connection. Each residual block contains four layers: two convolutional layers and two batch-normalization layers [57]. Batch normalization has been shown to accelerate deep network training by reducing internal covariate shift. A rectified linear unit (ReLU) activation is applied after each batch-normalization layer and after each addition operation.
Specifically, all convolutional layers have 64 filters of size 3 × 3 with a stride of 1 and zero-padding of 1, except for the final convolutional layer, which has a single filter. Throughout the network, the feature maps have the same size as the input image.
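A PyTorch sketch of this architecture is given below; the placement of the 1-to-64-channel input convolution is our assumption, as is the absence of a global skip connection.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN with a skip connection; ReLU after the add."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.BatchNorm2d(ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)

class ResNetDenoiser(nn.Module):
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, stride=1, padding=1)  # assumed 1->64 lift
        self.blocks = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 1, 3, stride=1, padding=1)  # single-filter output

    def forward(self, x):
        return self.tail(self.blocks(self.head(x)))
```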
2.3. Loss Functions
The loss function plays an important role in training a restoration model and can determine the visual aspect of the generated images. In [46], the authors investigated the effects of several commonly used losses for image restoration with neural networks. The difference between this study and theirs lies in two aspects. First, we study the effects of different losses for the restoration of LD digital mammography, while [46] focused on natural images. In digital mammography, as opposed to natural images, specific high-frequency and low-contrast features of the exams, such as MCs and small masses, are vital for the successful clinical use of the data. Thus, in this scenario, it is preferable to retain some residual noise and keep those subtle features intact instead of aggressively filtering the data at the risk of masking lesions. Second, in addition to the metrics studied in [46], we also studied the adversarial loss, the PL [42], which is based on a pretrained VGG model [58], and the FID [44, 59], which is based on the Inception model [60].
Different losses compute the similarity between the generated and ground-truth (GT) images in different ways. We denote the generated and GT images as $\hat{X}$ and $X$, respectively.
Mean squared error (MSE).
MSE is the most widely used metric to measure the pixel-wise difference between generated and GT images. It can be formally defined as:
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{wh}\sum_{i=1}^{w}\sum_{j=1}^{h}\left([\hat{X}]_{ij} - [X]_{ij}\right)^2 \tag{4}$$
where $[X]_{ij}$ indicates the element at the i-th row and j-th column of X.
Mean absolute error (MAE) or ℓ1.
Slightly different from the MSE, the MAE computes the ℓ1 distance between the generated and GT images. As a result, the MAE does not over-penalize larger errors and can mitigate the over-smoothing caused by the MSE. It is defined as:
$$\mathcal{L}_{\mathrm{MAE}} = \frac{1}{wh}\sum_{i=1}^{w}\sum_{j=1}^{h}\left|[\hat{X}]_{ij} - [X]_{ij}\right| \tag{5}$$
Structural similarity index (SSIM).
SSIM is a widely used image quality metric [38] that measures the visual similarity between two images in terms of their structures and textures. The SSIM index is computed over local windows of an image. The measure between a window $\hat{x}$ over $\hat{X}$ and a window $x$ over $X$, with a common window size k × k, is defined as:
$$\mathrm{SSIM}(\hat{x}, x) = \frac{(2\mu_{\hat{x}}\mu_{x} + c_1)(2\sigma_{\hat{x}x} + c_2)}{(\mu_{\hat{x}}^2 + \mu_{x}^2 + c_1)(\sigma_{\hat{x}}^2 + \sigma_{x}^2 + c_2)} \tag{6}$$
where $\mu_{\hat{x}}$ and $\mu_x$ are the averages of $\hat{x}$ and $x$, respectively, $\sigma_{\hat{x}}^2$ and $\sigma_x^2$ are their variances, and $\sigma_{\hat{x}x}$ is their covariance. The constants $c_1 = 1 \times 10^{-4}$ and $c_2 = 9 \times 10^{-4}$ stabilize the division when the denominator is weak. The window size k is 11, as suggested in [38]. The MSSIM between two images $\hat{X}$ and $X$, $\mathrm{MSSIM}(\hat{X}, X)$, is the average of the SSIM index over all windows. The SSIM loss is defined as:
$$\mathcal{L}_{\mathrm{SSIM}} = 1 - \mathrm{MSSIM}(\hat{X}, X) \tag{7}$$
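A minimal PyTorch sketch of the SSIM loss follows, using a uniform k × k window for brevity ([38] uses a Gaussian window; constants follow Eq. (6)):

```python
import torch
import torch.nn.functional as F

def ssim_loss(x_hat, x, k=11, c1=1e-4, c2=9e-4):
    """Eq. (7): 1 - MSSIM over local windows. Inputs: (N, 1, H, W) in [0, 1]."""
    w = torch.ones(1, 1, k, k, device=x.device) / (k * k)
    mu_h, mu_x = F.conv2d(x_hat, w), F.conv2d(x, w)
    var_h = F.conv2d(x_hat * x_hat, w) - mu_h**2
    var_x = F.conv2d(x * x, w) - mu_x**2
    cov = F.conv2d(x_hat * x, w) - mu_h * mu_x
    ssim = ((2 * mu_h * mu_x + c1) * (2 * cov + c2)) / (
        (mu_h**2 + mu_x**2 + c1) * (var_h + var_x + c2))
    return 1.0 - ssim.mean()
```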
Perceptual loss (PL).
The PL compares the similarity between two images in a high-level feature space [42]. A pretrained VGG model is widely used to extract features and form such a feature space, which is expected to mimic the human visual system. The PL is similar to the MSE, but computed in a feature space instead of the pixel space. It is defined as:

$$\mathcal{L}_{\mathrm{PL}} = \frac{1}{w'h'c'}\left\lVert \Phi(\hat{X}) - \Phi(X)\right\rVert_2^2 \tag{8}$$

where Φ represents the feature extractor, whose output is a tensor of size w′ × h′ × c′. The PL can be computed on early or late layers of the VGG network. Each layer is commonly defined as a block of convolutions and activation functions, e.g., ReLU, before a max-pooling operation. For example, PL1 contains the first two convolutional layers and their respective activation functions.
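A PyTorch sketch of the PL is shown below; the truncation indices for torchvision's VGG-16 and the channel-replication strategy for single-channel mammograms are our assumptions (and, as described in Section 3.4, the PL4 variant used in this work additionally has its max-pooling layers removed).

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """MSE in a VGG-16 feature space, Eq. (8) (sketch).

    `cut` truncates vgg16().features right before a max-pooling layer:
    4 -> PL1, 9 -> PL2, 16 -> PL3, 23 -> PL4 (assumed indices).
    """
    def __init__(self, cut=16):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:cut].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)  # frozen, pretrained extractor

    def forward(self, x_hat, x):
        # Replicate the single mammography channel to the 3 channels the
        # ImageNet-pretrained VGG expects (assumed strategy).
        f_hat = self.features(x_hat.repeat(1, 3, 1, 1))
        f = self.features(x.repeat(1, 3, 1, 1))
        return nn.functional.mse_loss(f_hat, f)
```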
Adversarial loss.
The adversarial loss was first presented in [43] to train a model to generate realistic synthetic images. The GAN framework contains two networks: a generator (G) that creates images and a discriminator (D) that evaluates the created images. In this work, we used Wasserstein GANs with gradient penalty (WGAN-GP) [61]. The generator loss is given by:
$$\mathcal{L}_{G} = \mathcal{L}_{\mathrm{MAE}} + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}}(G) \tag{9}$$
and
$$\mathcal{L}_{\mathrm{adv}}(G) = -\,\mathbb{E}_{\hat{X}}\!\left[D(\hat{X})\right] \tag{10}$$
where ℒMAE is the pixel-wise loss previously stated, λadv is a weighting factor, and ℒadv(G) is the adversarial loss of the generator. The discriminator loss follows the original paper [61].
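The sketch below illustrates Eqs. (9)-(10) together with the WGAN-GP critic loss of [61]; the gradient-penalty weight follows the original paper's default, and tensor shapes are assumptions.

```python
import torch

def generator_loss(D, x_hat, x, lambda_adv=0.1):
    """Eq. (9): pixel-wise MAE plus the weighted adversarial term."""
    l_mae = (x_hat - x).abs().mean()
    l_adv = -D(x_hat).mean()            # Eq. (10): fool the critic
    return l_mae + lambda_adv * l_adv

def critic_loss(D, x_hat, x, lambda_gp=10.0):
    """WGAN-GP critic loss with gradient penalty, following [61]."""
    x_hat = x_hat.detach()
    eps = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    x_mix = (eps * x + (1 - eps) * x_hat).requires_grad_(True)
    grad, = torch.autograd.grad(D(x_mix).sum(), x_mix, create_graph=True)
    gp = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return D(x_hat).mean() - D(x).mean() + lambda_gp * gp
```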
Fréchet inception distance (FID).
The FID was first proposed to measure the quality of images generated by GANs [59]. The metric is calculated as:
$$d^2\left((\mu_1, \Sigma_1), (\mu_2, \Sigma_2)\right) = \left\lVert \mu_1 - \mu_2 \right\rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_1\Sigma_2\right)^{1/2}\right) \tag{11}$$
where d is the Fréchet distance, Tr(·) is the trace of a matrix, μ is a mean vector, and Σ a covariance matrix. Both μ and Σ are calculated over the feature-map vectors obtained by forwarding a batch of images through an Inception-v3 model.
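Eq. (11) can be computed directly from the feature statistics, as in the NumPy sketch below; note that using the FID as a training loss requires a differentiable, fast approximation, for which we followed [44].

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Eq. (11): Fréchet distance between two Gaussians fitted to
    Inception-v3 feature vectors (evaluation-style sketch)."""
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return (np.sum((mu1 - mu2) ** 2)
            + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```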
3. Materials & Methods
For reliable training and validation, we created two distinct sets of images: the first with clinical cases and the second with anthropomorphic breast phantom images. We used a hybrid dataset of clinical images with simulated LD/FD pairs for training, and restricted testing of the trained model to the phantom data, whose LD acquisitions were obtained directly on the mammography equipment. In doing so, we avoid the so-called "inverse crime" of training and testing the model on the same fabricated synthetic data [46]. In this section, we specify how both datasets used in this study were constructed.
We also present the DNN implementation details and the evaluation metrics used in this study.
3.1. Training Dataset
The training dataset consists of 400 clinical mammograms acquired at the Barretos Cancer Hospital (Brazil). These data were obtained retrospectively from breast cancer screening examinations, after approval by the institutional review board. The dataset corresponds to 100 patients, with the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. All images were acquired using a Hologic Selenia Dimensions mammography system (Hologic, Bedford, MA) and saved as raw data, i.e., DICOM "for processing". Fig. 2 shows the distribution of kVp, mAs, and breast thickness in the training dataset. All clinical images were fully anonymized to preserve the patients' medical records.
Fig. 2:
Histograms showing the variability of (a) kVp, (b) mAs and (c) breast thickness in the clinical images from the training dataset.
As previously discussed, exposing patients to x-ray radiation several times to build an image dataset with LD acquisitions would be dangerous and impractical in the clinical routine. To obtain clinical images at lower doses, we used a previously published method to simulate dose reduction in these data [50, 51]. The method injects quantum and electronic noise in the VST domain and also accounts for detector crosstalk; we refer the reader to both works for further details. We applied this technique to all clinical images to simulate acquisitions at 75% and 50% of the standard FD at which each image was originally obtained (γ = 0.75 and γ = 0.50, respectively). After the simulation, we had a total of 1,200 images across full and reduced doses.
3.2. Testing Dataset
To validate the restoration methods trained on the clinical dataset, we acquired images of a physical anthropomorphic breast phantom at the Hospital of the University of Pennsylvania, also using a Hologic Selenia Dimensions mammography system, as for the training dataset. The phantom has six slabs, with a total thickness of 51 mm. It consists of a material that mimics breast tissue and was prototyped by CIRS, Inc. (Reston, VA) under license from the University of Pennsylvania [62]. Small pieces of calcium oxalate (99%, Alfa Aesar, Ward Hill, MA) were placed between the six slabs to simulate MCs, as illustrated in Fig. 3.
Fig. 3:
Image of the physical anthropomorphic breast phantom used in this study. The red arrow points to the pieces of calcium oxalate that were placed between the slabs to simulate a cluster of microcalcifications.
We acquired a total of 25 images of the anthropomorphic breast phantom. First, for the FD, the automatic exposure control (AEC) was used to select the standard radiographic factors, which yielded a combination of 29 kVp and 160 mAs. Without physically moving the phantom, the system was then set to manual mode and 15 exposures at the standard radiographic factors were performed to generate a set of FD images. For the reduction factors of 75% and 50%, five images each were acquired by reducing the current-time product from 160 mAs to 120 mAs and 80 mAs, respectively. All these images were saved as raw data.
3.3. Figures of Merit
Taking advantage of the possibility of exposing the physical anthropomorphic breast phantom several times, we generated a pseudo-GT and compared all the restored images against it through the signal-to-noise ratio (SNR) and the mean normalized squared error (MNSE), decomposed into residual noise (𝒩) and bias squared (ℬ²). This metric was previously presented in [15] for the assessment of digital breast tomosynthesis images.
We first measured the SNR of all phantom images and their restorations as the ratio between the mean pixel value and its standard deviation across the five realizations. The metric was calculated only inside the breast region, and an average filter of size 15 × 15 was used to smooth both the mean signal and the standard deviation maps after their calculation.
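A NumPy sketch of this SNR map computation is given below; the array shapes and masking strategy are our assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def snr_map(stack, mask, k=15):
    """SNR map from repeated acquisitions of the same view (sketch).

    stack : (P, H, W) array of realizations
    mask  : boolean (H, W) breast segmentation
    """
    mean = stack.mean(axis=0)
    std = stack.std(axis=0, ddof=1)
    # 15 x 15 moving average applied to both maps, as described above
    snr = uniform_filter(mean, size=k) / uniform_filter(std, size=k)
    return np.where(mask, snr, np.nan)
```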
Considering a subspace of FD mammography acquisitions, suppose we have a set of N realizations X* ∈ 𝒳 for the GT estimation and a set of P realizations X′ ∈ 𝒳 for the MNSE assessment. Here we refer to the GT as the expectation of the acquisitions on this subspace. Also, after breast phantom segmentation, we denote (i, j) as a pair of 2D indices running inside a set ℐ of size I < w × h, representing the collection of pixel coordinates after segmentation. The MNSE can be calculated as:
$$\mathrm{MNSE} = \frac{1}{P}\sum_{p=1}^{P}\frac{1}{I}\sum_{(i,j)\in\mathcal{I}}\frac{\left([X'_p]_{ij} - [\bar{X}^{*}]_{ij}\right)^2}{\left([\bar{X}^{*}]_{ij}\right)^2} \;-\; \underbrace{\frac{1}{I}\sum_{(i,j)\in\mathcal{I}}\frac{[\hat{\sigma}^{2}]_{ij}}{N\left([\bar{X}^{*}]_{ij}\right)^2}}_{\phi_1} \tag{12}$$
where $\bar{X}^{*}$ is the pixel-wise average of the N realizations used for the GT estimation, ϕ₁ accounts for the error associated with the limited number of images used for that estimation, and $[\hat{\sigma}^{2}]_{ij}$ is the point-wise variance among this set of images. It is possible to decompose the MNSE into 𝒩 and ℬ² portions, such that:
$$\mathrm{MNSE} = \mathcal{N} + \mathcal{B}^2 \tag{13}$$
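The sketch below illustrates Eqs. (12)-(13) in NumPy, with the finite-sample corrections simplified to the ϕ₁ term defined above; it is an illustration of the decomposition, not the exact implementation of [15].

```python
import numpy as np

def mnse_decomposition(xs, gt_stack):
    """Sketch of Eqs. (12)-(13): total MNSE split into residual noise
    and bias squared.

    xs       : (P, H, W) realizations under evaluation (segmented pixels)
    gt_stack : (N, H, W) FD realizations used to build the pseudo-GT
    """
    gt = gt_stack.mean(axis=0)                    # pseudo-GT estimate
    n = gt_stack.shape[0]
    # phi_1: error due to the limited number N of GT images
    phi1 = np.mean(gt_stack.var(axis=0, ddof=1) / (n * gt**2))
    total = np.mean((xs - gt) ** 2 / gt**2) - phi1     # Eq. (12)
    noise = np.mean(xs.var(axis=0, ddof=1) / gt**2)    # N portion
    bias2 = total - noise                              # B^2 portion, Eq. (13)
    return total, noise, bias2
```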
For the practical evaluation, we first manually segmented the phantom image to avoid computing the metric outside the breast tissue. From the 15 images at the FD, 10 were used to generate the GT (N = 10) and 5 to calculate the metric (P = 5). As previously mentioned, 5 images at each reduced dose (Pγ = 5) were also used to calculate the metric. The scheme in Fig. 4 illustrates how the phantom images were separated. The MNSE and its decomposition were evaluated on all the different colored sub-groups of LD and restored images.
Fig. 4:
Illustration of how the anthropomorphic breast phantom dataset was subdivided for the assessment of the MNSE and its decomposition. The metrics were evaluated on all different colored sub-groups of FD, LD and restored images.
To avoid errors in ℬ² due to differences in the mean value of different acquisitions, we fitted a first-order polynomial to correct the image mean values. First, the GT was calculated following the previous equations, and all the images used to generate the GT were adjusted based on the calculated GT, as done in [15]. After this correction, the GT was calculated again and all the non-GT FD acquisitions were adjusted through the same fitting technique. All the restored images and the LD acquisitions also had their mean values adjusted through this method. This process guarantees that the ℬ² measurement reflects the smearing/blurring imposed by the restoration process rather than small changes in the mean value of the images.
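A minimal sketch of this mean-value correction, assuming a least-squares fit over the segmented breast pixels, is:

```python
import numpy as np

def match_mean(img, ref, mask):
    """First-order polynomial fit mapping img toward ref on the breast
    pixels (sketch), so that B^2 reflects blur rather than offset/gain
    differences between acquisitions."""
    a, b = np.polyfit(img[mask], ref[mask], deg=1)
    return a * img + b
```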
We also compared the time spent to run the restoration process with the MB approach and with the deep network.
3.4. Implementation Details
Overall, we trained 22 models in this study: 11 dedicated to restoring 75%-dose images and 11 dedicated to 50%-dose images, where each of the 11 models corresponds to one of the loss functions from Section 2.3. We also performed 3-fold cross-validation for each loss, leading to a total of 66 models. Although there is a wide range of loss functions for image restoration, as shown in [47], in this work we used examples of error visibility methods (e.g., MSE and MAE), a structural similarity method (SSIM), and DNN-based methods (PL, FID and adversarial loss). These losses are commonly used in previous work on medical imaging restoration [19, 21–26]. For the PL, following [42], we used the VGG-16 network [58], pretrained on the ImageNet dataset and publicly available from the official PyTorch website. Since different layers of the VGG-16 network form different feature spaces, we used the feature maps right before the first four max-pooling layers to form four feature spaces, obtaining four corresponding PLs from shallow to deep, denoted ℒPL1, ℒPL2, ℒPL3, and ℒPL4. We found that too many max-pooling layers in the PL largely affect the pixel-wise comparison; therefore, we removed all max-pooling layers in PL4. For the FID loss, we used a pretrained Inception-v3 model and followed the work of Mathiasen and Hvilshøj [44] for the implementation². For the adversarial loss, we used three candidate values of λadv to illustrate the different impacts of this parameter, with the ResNet as the generator architecture and the discriminator (also called critic) architecture described in the original paper.
For all neural networks with the different losses, the initial learning rate was set to 1.0 × 10⁻⁴ and halved every 10 epochs. The trainable parameters were optimized using Adam [63], with the coefficients for the running averages of the gradient and its square set to 0.5 and 0.999, respectively. The networks were implemented with the PyTorch library [64] and trained for 60 epochs on an NVIDIA GeForce GTX 1080 Ti GPU. The batch size was set to 256 during training; however, it was reduced accordingly for training with the PL, since the VGG network must also reside in GPU memory. To ease the training of the non-pixel-wise losses, we used a pretrained MAE network as the starting point.
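In PyTorch, this training configuration corresponds to the sketch below (the model is a stand-in for the ResNet of Section 2.2.2, and the data loop is omitted):

```python
import torch

# Stand-in for the ResNet sketch of Section 2.2.2 (assumed here)
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 64, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(64, 1, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.5, 0.999))
# Halve the learning rate every 10 epochs; training runs for 60 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(60):
    # ... iterate over the 64 x 64 LD/FD patch batches (batch size 256) ...
    scheduler.step()
```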
To train the restoration models, a total of 256,000 patches of size 64 × 64 were randomly selected from the breast regions of the 400 clinical images available in this study. These patches comprise pairs of LD and FD images, and we trained one neural network for each reduction factor.
After the training process, illustrated in Fig. 5, the models were quantitatively tested on the anthropomorphic breast phantom dataset to validate the effects of the different loss functions. The testing stage, illustrated in Fig. 6, involved measuring the MNSE of the LD, FD and restored images against the GT. The LD×GT value gives the starting point of 𝒩 and ℬ²; at this stage, 𝒩 is usually high and ℬ² low. The FD×GT value gives the goal of the restoration, i.e., the lowest ℬ² and the desired 𝒩 level. When evaluating Restored×GT, it is possible to see how the restoration performed: how much blurring was imposed, measured through ℬ², and how close 𝒩 is to the FD value. Implementation details of how this metric was calculated for each of these groups are presented in Section 3.3.
Fig. 5:
Schematic explaining the training process of the models.
Fig. 6:
Schematic explaining the test procedure for the models.
4. Results & Discussions
In this section, we present and discuss the quantitative assessment performed on the phantom images and show ROIs of both clinical and phantom images. We compared the LD and FD acquisitions with the images restored by the model using the following loss functions: MSE, MAE, SSIM, PL1, PL2, PL3, PL4, GAN, and FID. For comparison with an analytical method previously published in the literature, we used the model-based (MB) approach proposed in [15] as a benchmark for all loss functions.
4.1. Visual Analysis
Figs. 7 and 8 display a magnified ROI of a clinical image acquired at the standard radiation dose and at the dose reduction factors of 75% and 50%, respectively. For all visual analyses, we show, for each loss function, the training realization that achieved the 𝒩 closest to the FD. As expected, the LD mammography presents more perceived noise than the FD mammography. The restored versions of the LD mammography are shown in (c)-(m), with the MB method in (n). We chose an ROI with an MC cluster, an important feature for cancer diagnosis. As expected, for the reduction factor of 50% (Fig. 8), the MSE and MAE loss functions yield an overall smoothing of the image. For the SSIM, there is a subtle difference in noise level compared with the MAE and MSE. Going from PL1 to PL4, it is visually noticeable that the blurring effect decreases and fine details are preserved. Moreover, PL3 retains slightly more noise than PL4, which is confirmed by the objective metrics, specifically the 𝒩 portion of the MNSE decomposition. The FID also achieves good results, with more overall noise than PL3 and PL4; this is also observed in the objective metrics and is expected, since this loss uses a pretrained network to extract image features. The adversarial losses restored the image with less noise than the losses that use a DNN as the image quality metric. It is important to note the visual similarity between the DNNs trained with the FID, PL3 and PL4 loss functions and the MB method. All the visual observations agree with the quantitative results presented in the next section. The same discussion applies to the 75% case (Fig. 7); however, the differences are more subtle than at 50%.
Fig. 7:
Illustration of a magnified ROI of a clinical image focusing a small cluster of MC. (a) LD acquisition for a dose reduction factor of 75%; (b) FD acquisition; restored images generated by the neural network with the loss function: (c) MSE, (d) MAE, (e) SSIM, (f)-(i) PL1 to PL4, respectively, (j)-(l) adversarial loss with different λadv and (m) FID. (n) model-based restoration method. All the images were normalized based on the FD and are displayed in the same dynamic range.
Fig. 8:
Illustration of a magnified ROI of a clinical image focusing a small cluster of MC. (a) LD acquisition for a dose reduction factor of 50%; (b) FD acquisition; restored images generated by the neural network with the loss function: (c) MSE, (d) MAE, (e) SSIM, (f)-(i) PL1 to PL4, respectively, (j)-(l) adversarial loss with different λadv and (m) FID. (n) model-based restoration method. All the images were normalized based on the FD and are displayed in the same dynamic range.
Figs. 9 and 10 display a magnified ROI from the mammography image of the anthropomorphic breast phantom at dose reduction factors of 75% and 50%, respectively. In this case, it is important to note that the dose reduction was performed on the equipment itself, by changing the mAs at each acquisition. The discussion presented for the clinical images also holds here, as the images have the same visual properties. From the results on the anthropomorphic phantom, we can infer that the neural network generalizes well even to cases with slightly different noise properties: even with a careful simulation, as used in this work, some discrepancies are expected between the simulated and the actual LD images, and the trained model was able to overcome these small discrepancies. This also reinforces that the model can be applied to real reduced-dose mammography acquisitions, even though it was trained with simulated noise injection, since anthropomorphic phantoms are designed to match clinical images as closely as possible [62].
Fig. 9:
Illustration of a magnified ROI of an anthropomorphic breast phantom image focusing a simulated cluster of MC. (a) LD acquisition for a dose reduction factor of 75%; (b) FD acquisition; restored images generated by the neural network with the loss function: (c) MSE, (d) MAE, (e) SSIM, (f)-(i) PL1 to PL4, respectively, (j)-(l) adversarial loss with different λadv, (m) FID and (n) model-based (MB) restoration method. All the images were normalized based on the FD and are displayed in the same dynamic range
Fig. 10:
Illustration of a magnified ROI of an anthropomorphic breast phantom image focusing a simulated cluster of MC. (a) LD acquisition for a dose reduction factor of 50%; (b) FD acquisition; restored images generated by the neural network with the loss function: (c) MSE, (d) MAE, (e) SSIM, (f)-(i) PL1 to PL4, respectively, (j)-(l) adversarial loss with different λadv, (m) FID and (n) model-based (MB) restoration method. All the images were normalized based on the FD and are displayed in the same dynamic range.
4.2. Quantitative Evaluation
Figs. 11 and 12 illustrate the SNR maps inside the breast region of the anthropomorphic breast phantom images. Through visual inspection, we can see that all the restorations increased the SNR relative to the LD image. The maps also indicate that the restorations with the MSE, MAE, SSIM, and adversarial losses achieved the greatest values throughout the breast. Table 1 presents the mean SNR values, confirming these observations. The higher SNR values for these loss functions are explained by the fact that they remove more noise than the other restorations, thus achieving a lower standard deviation.
Fig. 11:
Illustration of the SNR map of the anthropomorphic breast phantom images. (a) LD acquisition for a dose reduction factor of 75%; (b) FD acquisition; restored images generated by the neural network with the loss function: (c) MSE, (d) MAE, (e) SSIM, (f)-(i) PL1 to PL4, respectively, (j)-(l) adversarial loss with different λadv, (m) FID and (n) model-based (MB) restoration method. All the maps were adjusted and clipped in the range of 47-120, based on the SNR of the FD image.
Fig. 12:
Illustration of the SNR map of the anthropomorphic breast phantom images. (a) LD acquisition for a dose reduction factor of 50%; (b) FD acquisition; restored images generated by the neural network with the loss function: (c) MSE, (d) MAE, (e) SSIM, (f)-(i) PL1 to PL4, respectively, (j)-(l) adversarial loss with different λadv, (m) FID and (n) model-based (MB) restoration method. All the maps were adjusted and clipped in the range of 47-120, based on the SNR of the FD image.
Table 1:
Mean SNR values for the anthropomorphic breast phantom images acquired at the LD (for a dose reduction factors of 75% and 50%) and FD. Results for the restored images generated by neural network with the loss function: MSE, MAE, SSIM, PL1 to PL4, adversarial loss with different λadv, FID and for the model-based (MB) restoration method.
 | 75% | 50% |
---|---|---|
LD | 69.18 | 56.41 |
DL-ℒMSE | 84.84±0.74 | 94.78±1.47 |
DL-ℒMAE | 85.05±0.48 | 95.44±0.46 |
DL-ℒSSIM | 84.67±0.24 | 95.51±1.09 |
DL-ℒPL1 | 83.90±0.19 | 93.08±0.63 |
DL-ℒPL2 | 81.56±0.22 | 86.07±0.29 |
DL-ℒPL3 | 76.97±0.10 | 77.73±0.38 |
DL-ℒPL4 | 79.85±0.11 | 81.45±0.37 |
DL-ℒGAN–0.1 | 84.72±0.13 | 96.39±0.61 |
DL-ℒGAN–0.9 | 84.91±0.14 | 95.62±0.35 |
DL-ℒGAN–1.5 | 85.00±0.38 | 95.61±0.72 |
DL-ℒFID | 77.22±0.66 | 74.94±0.20 |
MB | 78.39 | 77.53 |
FD | 78.11 | 78.11 |
From Table 1, it is observable that the pixel-wise losses and the adversarial loss yielded the highest SNR, even higher than the SNR of the FD image, which would suggest the best image quality. However, a quick inspection of Fig. 10(d) shows that these loss functions produce some signal smearing/blurring, especially when compared with the FD image in Fig. 10(b). This emphasizes the need for a metric that is sensitive to signal smoothing and residual noise separately. To that end, we adopted the decomposition of the MNSE into ℬ² and 𝒩.
As our primary goal is to restore LD images to the quality of standard FD images, we seek a restoration method that yields an image with 𝒩 similar to the FD and ℬ² as low as possible. This intuition comes from the fact that the restored images should have noise properties similar to the FD images, while keeping the underlying signal characteristics as close as possible to the original image, as radiologists tend to dislike overly smoothed images.
To this end, the total MNSE was measured against the pseudo-GT and decomposed into ℬ² and 𝒩 for the FD and for both radiation dose reduction factors, considering all loss functions.
Here, we slightly modified the computation of the MNSE in Eq. (12) so that it takes into account the three folds of the cross-validation and generates enough samples for statistical testing. Instead of computing the mean over the pixels as a first step, as in Eq. (12), we now calculate the normalized quadratic error (NQE), compute the pixel-wise average over the K = 3 folds, and then take the mean over the pixels:
$$\mathrm{MNSE} = \frac{1}{I}\sum_{(i,j)\in\mathcal{I}}\left[\frac{1}{K}\sum_{k=1}^{K}\frac{1}{P}\sum_{p=1}^{P}\frac{\left([X'_{p,k}]_{ij} - [\bar{X}^{*}]_{ij}\right)^2}{\left([\bar{X}^{*}]_{ij}\right)^2}\right] - \phi_1 \tag{14}$$
For the paired t-tests, we obtained the samples after averaging over the K folds. Note that we performed this step for the MNSE and for its decomposition. Tables 2 and 3 present the mean MNSE results for the dose reduction factors of 75% and 50%, respectively.
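The paired t-test itself is standard; a sketch with placeholder samples is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder per-pixel NQE samples, averaged over the K folds, for two
# losses at the same pixel locations (paired samples); in the study these
# come from the inner sums of Eq. (14).
nqe_a = rng.normal(0.10, 0.01, size=10_000)
nqe_b = rng.normal(0.11, 0.01, size=10_000)
t_stat, p_value = stats.ttest_rel(nqe_a, nqe_b)
```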
Table 2:
Quantitative analysis of the total MNSE, decomposed into 𝒩 and ℬ², for the anthropomorphic breast phantom images acquired at the LD (dose reduction factor of 75%) and at the FD, along with the results for the restored images generated by the neural network with each loss function (MSE, MAE, SSIM, PL1 to PL4, adversarial loss with different λadv, and FID) and for the model-based (MB) restoration method. 95% confidence intervals are shown (α = 0.05).
 | Total MNSE (%) | 𝒩 (%) | ℬ² (%) |
---|---|---|---|
LD | 14.06 [14.04, 14.08] | 13.95 [13.93, 13.97] | 0.11 [0.10, 0.12] |
FD | 10.47 [10.46, 10.49] | 10.40 [10.38, 10.41] | 0.07 [0.07, 0.08] |
DL-ℒMSE | 9.46 [9.45, 9.48] | 9.13 [9.11, 9.14] | 0.33 [0.32, 0.34] |
DL-ℒMAE | 9.30 [9.29, 9.31] | 9.09 [9.07, 9.10] | 0.21 [0.20, 0.22] |
DL-ℒSSIM | 9.35 [9.33, 9.36] | 9.16 [9.15, 9.17] | 0.19 [0.18, 0.20] |
DL-ℒPL1 | 9.54 [9.52, 9.55] | 9.32 [9.31, 9.34] | 0.21 [0.20, 0.22] |
DL-ℒPL2 | 10.10 [10.08, 10.11] | 9.89 [9.87, 9.90] | 0.21 [0.20, 0.22] |
DL-ℒPL3 | 11.45 [11.43, 11.46] | 11.23 [11.21, 11.24] | 0.22 [0.21, 0.23] |
DL-ℒPL4 | 10.56 [10.55, 10.58] | 10.33 [10.32, 10.35] | 0.23 [0.22, 0.24] |
DL-ℒGAN–0.1 | 9.35 [9.34, 9.37] | 9.16 [9.15, 9.18] | 0.19 [0.18, 0.20] |
DL-ℒGAN–0.9 | 9.35 [9.34, 9.37] | 9.11 [9.10, 9.13] | 0.24 [0.23, 0.25] |
DL-ℒGAN–1.5 | 9.32 [9.31, 9.34] | 9.09 [9.08, 9.10] | 0.23 [0.22, 0.24] |
DL-ℒFID | 11.10 [11.09, 11.12] | 10.84 [10.83, 10.86] | 0.26 [0.25, 0.27] |
MB | 10.93 [10.92, 10.95] | 10.81 [10.80, 10.83] | 0.12 [0.11, 0.13] |
Table 3:
Quantitative analysis of the total MNSE, decomposed into 𝒩 and ℬ², for the anthropomorphic breast phantom images acquired at the LD (dose reduction factor of 50%) and at the FD, along with the results for the restored images generated by the neural network with each loss function (MSE, MAE, SSIM, PL1 to PL4, adversarial loss with different λadv, and FID) and for the model-based (MB) restoration method. 95% confidence intervals are shown (α = 0.05).
 | Total MNSE (%) | 𝒩 (%) | ℬ² (%) |
---|---|---|---|
LD | 21.79 [21.76, 21.82] | 21.64 [21.61, 21.67] | 0.15 [0.13, 0.16] |
FD | 10.47 [10.46, 10.49] | 10.40 [10.38, 10.41] | 0.07 [0.07, 0.08] |
DL-ℒMSE | 8.27 [8.25, 8.29] | 7.54 [7.53, 7.55] | 0.73 [0.71, 0.75] |
DL-ℒMAE | 8.16 [8.13, 8.18] | 7.48 [7.46, 7.49] | 0.68 [0.66, 0.70] |
DL-ℒSSIM | 8.16 [8.14, 8.18] | 7.49 [7.48, 7.50] | 0.67 [0.65, 0.69] |
DL-ℒPL1 | 8.58 [8.56, 8.60] | 7.93 [7.92, 7.95] | 0.65 [0.63, 0.67] |
DL-ℒPL2 | 9.79 [9.77, 9.81] | 9.16 [9.14, 9.17] | 0.63 [0.62, 0.65] |
DL-ℒPL3 | 11.73 [11.71, 11.75] | 11.31 [11.30, 11.33] | 0.42 [0.40, 0.43] |
DL-ℒPL4 | 10.94 [10.92, 10.96] | 10.39 [10.37, 10.40] | 0.55 [0.54, 0.57] |
DL-ℒGAN–0.1 | 8.05 [8.03, 8.07] | 7.38 [7.37, 7.39] | 0.67 [0.65, 0.68] |
DL-ℒGAN–0.9 | 8.20 [8.18, 8.22] | 7.50 [7.49, 7.51] | 0.70 [0.69, 0.72] |
DL-ℒGAN–1.5 | 8.27 [8.25, 8.28] | 7.50 [7.49, 7.51] | 0.77 [0.76, 0.78] |
DL-ℒFID | 12.28 [12.26, 12.30] | 11.77 [11.75, 11.79] | 0.51 [0.50, 0.52] |
MB | 11.88 [11.86, 11.89] | 11.60 [11.59, 11.62] | 0.28 [0.26, 0.29] |
Changing the loss function directly affects the behavior of the deep network in terms of signal preservation and noise suppression. As seen in Tables 2 and 3, the MSE loss tends to decrease 𝒩 below the standard FD value (the goal) at the cost of excessively blurring the image, thus increasing the ℬ² error. Although this loss function is used extensively in most applications, this blurring behavior is well known in the literature [46]. The MAE loss (ℓ1), compared with the MSE, preserves more image detail, as observed in the lower ℬ² values, but still has a strong denoising effect, noted in the low 𝒩 values. With the SSIM loss, the model performed better than with the MSE and MAE, reporting slightly lower ℬ² at similar 𝒩.
The PL brings an interesting case. There is a tendency toward increasing image detail preservation going from PL1 to PL4, where 𝒩 slightly increases while ℬ² decreases. This behavior is explained by the fact that the deeper layers of the network capture general image characteristics, whereas the initial layers respond to local fine details, i.e., primitive information such as edges [65–67]. When looking at the overall properties of the image, in the case of deep layers, the network tends to penalize errors on the underlying signal, which decreases ℬ². The opposite happens with the initial layers, which try to match the fine details and thus perform a more aggressive local denoising, decreasing 𝒩 and increasing the ℬ² error. As a consequence, the deeper in the network the loss is computed, the less aggressive the denoising and the more image detail is preserved.
The FID achieved results close to PL3 and PL4, with higher ℬ² in the 75% restoration case. The adversarial loss yields results very similar to the pixel-wise loss functions (MSE, MAE and SSIM), which is expected since this loss combines an image fidelity term with an image distribution term. Regarding the λadv parameter, increasing its value also increases the mean and the standard deviation of ℬ². This behavior can be explained by the total loss being weighted more heavily toward the adversarial term, which considers the image as a whole. In the extreme case where only the adversarial loss is used, i.e., λadv → ∞, the CNN is very hard to train and may hallucinate objects, as is known to happen in generative models [68]. Similar behavior might be expected with the PL and FID, since they also use DNNs to extract features and optimize distances between them. However, both the FID and the PL use networks pretrained on very large datasets of natural images. Although pretrained networks are very good feature extractors, the domain shift from natural to medical images may have negative impacts. Therefore, we suggest carefully checking the quality of restored images before clinical use.
It is important to note the great similarity, in terms of 𝒩, between the networks trained with PL3 and PL4 and the FD, both for the 75% and 50% restorations. We highlighted in bold the 𝒩 closest to the FD and the lowest ℬ². Although ℬ² is higher for the deep networks than for the mathematical model, the data-based approach benefits from not requiring any prior information about the equipment or the physics of the acquisition process.
Here, we also examine whether any two DL losses have statistically significant differences. The null hypothesis is that the two losses have identical averages. We performed paired t-tests between all pairs of DL losses and provide the p-values for the MNSE and its decomposition in Figs. 13 and 14, for the 75% and 50% dose levels, respectively. In Fig. 13, looking at the decomposition of the MNSE, the 𝒩 of PL4 is statistically different from all other losses and has the closest value to the FD; however, it is still statistically different from the FD (p-values < 0.01). For ℬ², there is no statistical difference between GAN-0.1 and SSIM, and they have the lowest values among all losses. We also notice that many losses are statistically equal in ℬ². Interestingly, losses that are statistically equal in one metric, e.g., 𝒩, may still differ in the other, e.g., ℬ², which highlights the importance of the decomposition; the only exception is GAN-0.1 and SSIM, which are equal in all metrics. In Fig. 14, the 𝒩 of PL4 is again statistically different from all other losses and has the closest value to the FD; in this case, there is no statistical difference from the FD (p-values > 0.31). For ℬ², PL3 is statistically different from all other losses and has the lowest bias among them.
Fig. 13:
p-value results among all DL losses, for the 75% dose reduction, by paired t-test in terms of (a) MNSE, (b) 𝒩, and (c) ℬ².
Fig. 14:
p-value results among all DL losses, for the 50% dose reduction, by paired t-test in terms of (a) MNSE, (b) 𝒩, and (c) ℬ².
Finally, Table 4 reports the average time spent by the neural network and by the MB method to restore a single raw clinical image of size 4096 × 3328 pixels. Although the deep networks take a long time to train, roughly 7 hours with the MSE and 32 hours with PL4 (respectively the minimum and maximum training times over all loss functions), the restoration itself has a processing time of the same order of magnitude as the MB method. Note that we do not intend to directly compare processing times, as the DL method runs on a GPU in Python while the MB method runs on a CPU in MATLAB. However, both methods have room for code optimization, and fast processing is especially important for clinical use.
Table 4:
Average processing time to run a full restoration of a single raw clinical mammogram.
Method | Time (s) |
---|---|
DL-based | 9.0 |
MB | 16.5 |
5. Conclusion
In this work, we investigated the impact of various loss functions on the quality of LD mammograms restored by deep networks. We used a standard CNN architecture to evaluate such loss functions.
In terms of loss functions, the MSE and MAE had strong denoising properties, yielding excessive smoothness in the restored image, while the PL functions preserved more image detail the deeper the features were taken from the VGG-16 network, i.e., PL3 and PL4 preserved more details than PL1. This behavior was observed both in the quantitative results and in the visual analysis. Furthermore, both quantitatively and visually, the PL3 and PL4 functions are notably similar to the MB method. The adversarial loss yielded results close to the pixel-wise losses, since this function contains one of them. It is worth exploring further combinations of λadv to achieve the desired properties, and also different training strategies, since GANs are known to be hard to train and susceptible to mode collapse. The FID loss proved to be a good alternative as well, achieving visual and objective results close to the PLs, the MB method and the FD images.
The fact that we validated the proposed methodology on a physical anthropomorphic breast phantom reinforces that the neural network is able to restore real LD mammography images. It also implies that the model did not overfit the training dataset and generalizes well to images with slightly different noise properties.
With this work, we showed the potential of DNNs for digital mammography image restoration and evaluated several well-known loss functions, presenting the strengths and weaknesses of each one so that the appropriate loss can be chosen for each task. Moreover, with the proposed training strategy, it is possible to use clinical images, letting the networks learn from and benefit from a great variability of data and radiographic factors, as illustrated in Fig. 2.
The limitations of this work are as follows. First, since we focused on the comparison among loss functions, a comparison of different network architectures was not considered; as argued in [24], the loss function is relatively more important than the network architecture, as it has a direct impact on the image quality of the restored images. Second, we did not consider combinations of different loss functions, beyond those used for the GAN. Although combining several loss functions could improve the results, it would introduce extra balancing hyper-parameters to be carefully tuned and exponentially more combinations; this difficulty is already apparent in the choice of λadv for the GAN. Third, the model was not tested on real clinical LD images, as no GT would be available to evaluate the performance of the different loss functions. Fourth, to achieve a meaningful significance test, we slightly modified the order of the mean operations in the MNSE equation to obtain many samples in terms of pixels; in the future, cross-validation with more folds should be performed to further analyze the significance tests for each dose level, if computational resources permit. Finally, we restricted the training to a dataset representing a local and specific population of women. For future work, real LD mammography images should be run through the model, and cancer detectability studies should be performed with radiologists to analyze the relevance of the proposed method in the clinical routine. It is also important to test the model with other datasets to check whether it generalizes to other populations, and to evaluate different network architectures and measure their impact on restoration performance.
Acknowledgments
This work was supported in part by National Natural Science Foundation of China (62101136), the São Paulo Research Foundation (FAPESP grant 2021/12673-6 and 2018/19888-5), the National Council for Scientific and Technological Development (CNPq), the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES finance code 001), and National Institute of Health (R01EB026646, R01CA233888, R01CA237267, and R01HL151561).
The authors would like to thank Dr. Andrew D. A. Maidment, from the University of Pennsylvania, for making the anthropomorphic breast phantom images available for this work. The authors also would like to thank the Barretos Cancer Hospital, in particular the medical physicist Renato F. Caron, for providing the clinical images.
Footnotes
² Available at www.github.com/AlexanderMath/FastFID
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- [1] Saadatmand S, Bretveld R, Siesling S, Tilanus-Linthorst MM, Influence of tumour stage at breast cancer detection on survival in modern times: population based study in 173 797 patients, BMJ 351 (2015) h4901.
- [2] WHO, World Health Organization - Breast cancer, https://www.who.int/cancer/prevention/diagnosis-screening/breast-cancer, accessed: 2020-08-07 (2020).
- [3] Michell M, Batohi B, Role of tomosynthesis in breast imaging going forward, Clinical Radiology 73 (4) (2018) 358–371.
- [4] Vedantham S, Karellas A, Vijayaraghavan GR, Kopans DB, Digital breast tomosynthesis: state of the art, Radiology 277 (3) (2015) 663–684.
- [5] IAEA, Radiation Protection and Safety in Medical Uses of Ionizing Radiation, no. SSG-46 in Specific Safety Guides, International Atomic Energy Agency, Vienna, 2018.
- [6] Haus AG, Yaffe MJ, Screen-film and digital mammography: image quality and radiation dose considerations, Radiologic Clinics of North America 38 (4) (2000) 871–898.
- [7] Huda W, Sajewicz AM, Ogden KM, Dance DR, Experimental investigation of the dose and image quality characteristics of a digital mammography imaging system, Medical Physics 30 (3) (2003) 442–448.
- [8] Saunders RS Jr, Baker JA, Delong DM, Johnson JP, Samei E, Does image quality matter? Impact of resolution and noise on mammographic task performance, Medical Physics 34 (10) (2007) 3971–3981.
- [9] Chan H-P, Helvie MA, Klein KA, McLaughlin C, Neal CH, Oudsema R, Rahman WT, Roubidoux MA, Hadjiiski LM, Zhou C, et al., Effect of dose level on radiologists' detection of microcalcifications in digital breast tomosynthesis: An observer study with breast phantoms, Academic Radiology (2020).
- [10] Kalra MK, Maher MM, Sahani DV, Blake MA, Hahn PF, Avinash GB, Toth TL, Halpern E, Saini S, Low-dose CT of the abdomen: evaluation of image improvement with use of noise reduction filters-pilot study, Radiology 228 (1) (2003) 251–256.
- [11] Manduca A, Yu L, Trzasko JD, Khaylova N, Kofler JM, McCollough CM, Fletcher JG, Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT, Medical Physics 36 (11) (2009) 4911–4919.
- [12] Li Z, Yu L, Trzasko JD, Lake DS, Blezek DJ, Fletcher JG, McCollough CH, Manduca A, Adaptive nonlocal means filtering based on local noise level for CT denoising, Medical Physics 41 (1) (2014) 011908.
- [13] Wu G, Mainprize JG, Yaffe MJ, Dose reduction for digital breast tomosynthesis by patch-based denoising in reconstruction, in: International Workshop on Digital Mammography, Springer, 2012, pp. 721–728.
- [14] Borges LR, Bakic PR, Foi A, Maidment AD, Vieira MA, Pipeline for effective denoising of digital mammography and digital breast tomosynthesis, in: Medical Imaging 2017: Physics of Medical Imaging, Vol. 10132, International Society for Optics and Photonics, 2017, p. 1013206.
- [15] Borges LR, Azzari L, Bakic PR, Maidment AD, Vieira MA, Foi A, Restoration of low-dose digital breast tomosynthesis, Measurement Science and Technology 29 (6) (2018) 064003.
- [16] Borges LR, Caron RF, Azevedo-Marques PM, Vieira MA, Effect of denoising on the localization of microcalcification clusters in digital mammography, in: 15th International Workshop on Breast Imaging (IWBI2020), Vol. 11513, International Society for Optics and Photonics, 2020, p. 115130K.
- [17] Wu D, Kim K, El Fakhri G, Li Q, Iterative low-dose CT reconstruction with priors trained by artificial neural network, IEEE Transactions on Medical Imaging 36 (12) (2017) 2479–2486.
- [18] Kang E, Min J, Ye JC, A deep convolutional neural network using directional wavelets for low-dose x-ray CT reconstruction, Medical Physics 44 (10) (2017) e360–e375.
- [19] Chen H, Zhang Y, Kalra MK, Lin F, Chen Y, Liao P, Zhou J, Wang G, Low-dose CT with a residual encoder-decoder convolutional neural network, IEEE Transactions on Medical Imaging 36 (12) (2017) 2524–2535.
- [20] Wolterink JM, Leiner T, Viergever MA, Išgum I, Generative adversarial networks for noise reduction in low-dose CT, IEEE Transactions on Medical Imaging 36 (12) (2017) 2536–2545.
- [21] Chen H, Zhang Y, Zhang W, Liao P, Li K, Zhou J, Wang G, Low-dose CT via convolutional neural network, Biomedical Optics Express 8 (2) (2017) 679–694.
- [22] Yang Q, Yan P, Zhang Y, Yu H, Shi Y, Mou X, Kalra MK, Zhang Y, Sun L, Wang G, Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss, IEEE Transactions on Medical Imaging 37 (6) (2018) 1348–1357.
- [23].Kang E, Chang W, Yoo J, Ye JC, Deep convolutional framelet denosing for low-dose CT via wavelet residual network, IEEE Transactions on Medical Imaging 37 (6) (2018) 1358–1369. [DOI] [PubMed] [Google Scholar]
- [24].Shan H, Zhang Y, Yang Q, Kruger U, Kalra MK, Sun L Cong W, Wang G, 3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network, IEEE Transactions on Medical Imaging 37 (6) (2018) 1522–1534. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Yang X, De Andrade V, Scullin W, Dyer EL, Kasthuri N De Carlo F, Gürsoy D, Low-dose x-ray tomography through a deep convolutional neural network, Scientific Reports 8 (1) (2018) 1–13 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Shan H, Padole A, Homayounieh F, Kruger U, Khera RD Nitiwarangkul C, Kalra MK, Wang G, Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction, Nature Machine Intelligence 1 (6) (2019) 269–276. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [27].Yin X, Zhao Q, Liu J, Yang W, Yang J, Quan G, Chen Y Shu H, Luo L, Coatrieux J-L, Domain progressive 3D residual convolution network to improve low-dose CT imaging, IEEE Transactions on Medical Imaging 38 (12) (2019) 2903–2913. [DOI] [PubMed] [Google Scholar]
- [28].Sun C, Shrivastava A, Singh S, Gupta A, Revisiting unreasonable effectiveness of data in deep learning era, in: Proceedings of the IEEE international conference on computer vision 2017, pp. 843–852 [Google Scholar]
- [29].Lee G, Fujita H, Deep learning in medical image analysis: challenges and applications, Vol. 1213, Springer, 2020. [Google Scholar]
- [30].Costa AC, Oliveira HC, Borges LR, Vieira MA, Transfer learning in deep convolutional neural networks for detection of architectural distortion in digital mammography, in: 15th International Workshop on Breast Imaging (IWBI2020), Vol. 11513, International Society for Optics and Photonics, 2020, p. 115130N. [Google Scholar]
- [31].McCollough CH, Bartley AC, Carter RE, Chen B, Drees TA, Edwards P, Holmes DR III, Huang AE, Khan F, Leng S, et al. , Low-dose CT for the detection and classification of metastatic liver lesions: Results of the 2016 Low Dose CT Grand Challenge, Medical Physics 44 (10) (2017) e339–e352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Liu J, Zarshenas A, Qadir A, Wei Z, Yang L, Fajardo L, Suzuki K, Radiation dose reduction in digital breast tomosynthesis (DBT) by means of deep-learning-based supervised image processing, in: Medical Imaging 2018: Image Processing, Vol. 105740, International Society for Optics and Photonics, 2018, p. 105740F. [Google Scholar]
- [33].Gao M, Fessler JA, Chan H-P, Deep convolutional neural network with adversarial training for denoising digital breast tomosynthesis images, IEEE Transactions on Medical Imaging (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Sahu P, Huang H, Zhao W, Qin H, Using Virtual Digital Breast Tomosynthesis for De-Noising of Low-Dose Projection Images, in: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp. 1647–1651. [Google Scholar]
- [35].Green M, Sklair-Levy M, Kiryati N, Konen E, Mayer A Neural Denoising of Ultra-low Dose Mammography, in: International Workshop on Machine Learning for Medical Image Reconstruction, Springer, 2019, pp. 215–225 [Google Scholar]
- [36].Ronneberger O, Fischer P, Brox T, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241. [Google Scholar]
- [37].Huang G, Liu Z, Van Der Maaten L, Weinberger KQ, Densely connected convolutional networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708 [Google Scholar]
- [38].Wang Z, Bovik AC, Sheikh HR, Simoncelli EP, Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13 (4) (2004) 600–612. [DOI] [PubMed] [Google Scholar]
- [39].Wang Z, Simoncelli EP, Translation insensitive image similarity in complex Wavelet domain, in: Proceedings.(ICASSP’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, Vol. 2, IEEE, 2005, pp. ii–573. [Google Scholar]
- [40].Sheikh HR, Bovik AC, De Veciana G, An information fidelity criterion for image quality assessment using natural scene statistics, IEEE Transactions on Image Processing 14 (12) (2005) 2117–2128. [DOI] [PubMed] [Google Scholar]
- [41].Sheikh HR, Bovik AC, Image information and visual quality IEEE Transactions on Image Processing 15 (2) (2006) 430–444 [DOI] [PubMed] [Google Scholar]
- [42].Johnson J, Alahi A, Fei-Fei L, Perceptual losses for real-time style transfer and super-resolution, in: European conference on computer vision, Springer, 2016, pp. 694–711. [Google Scholar]
- [43].Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y, Generative adversaria nets, in: Advances in neural information processing systems, 2014, pp.2672–2680 [Google Scholar]
- [44].Mathiasen A, Hvilshøj F, Backpropagating through fr∖‘echet inception distance, arXiv preprint arXiv:2009.14075 (2020). [Google Scholar]
- [45].Cheon M, Yoon S-J, Kang B, Lee J, Perceptual image quality assessment with transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 433–442. [Google Scholar]
- [46].Zhao H, Gallo O, Frosio I, Kautz J, Loss functions for image restoration with neural networks, IEEE Transactions on Computational Imaging 3 (1) (2016) 47–57. [Google Scholar]
- [47].Ding K, Ma K, Wang S, Simoncelli EP, Comparison of full-reference image quality models for optimization of image processing systems, International Journal of Computer Vision 129 (4) (2021) 1258–1281. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [48].Nagare M, Melnyk R, Rahman O, Sauer KD, Bouman CA A Bias-Reducing Loss Function for CT Image Denoising, in: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 1175–1179 [Google Scholar]
- [49].He K, Zhang X, Ren S, Sun J, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. [Google Scholar]
- [50].Borges LR, Oliveira H. C. d., Nunes PF, Bakic PR, Maidment AD, Vieira MA, Method for simulating dose reduction in digital mammography using the Anscombe transformation Medical Physics 43 (6Part1) (2016) 2704–2714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [51].Borges LR, Guerrero I, Bakic PR, Foi A, Maidment AD, Vieira MA, Method for simulating dose reduction in digital breast tomosynthesis, IEEE Transactions on Medical Imaging 36 (11) (2017) 2331–2342. [DOI] [PubMed] [Google Scholar]
- [52].Azzari L, Borges LR, Foi A, Chapter 1 - Modeling and Estimation of Signal-Dependent and Correlated Noise, in: Bertalmío M (Ed.), Denoising of Photographic Images and Video: Fundamentals, Open Challenges and New Trends, Springer, Switzerland, 2018, pp. 13–36. doi: 10.1007/978-3-319-96029-6 [DOI] [Google Scholar]
- [53].Starck J-L, Murtagh FD, Bijaoui A, Image processing and data analysis: the multiscale approach, Cambridge University Press, 1998. [Google Scholar]
- [54].Dabov K, Foi A, Katkovnik V, Egiazarian K, Image denoising by sparse 3-d transform-domain collaborative filtering, IEEE Transactions on image processing 16 (8) (2007) 2080–2095. [DOI] [PubMed] [Google Scholar]
- [55].Makitalo M, Foi A, Optimal inversion of the generalized anscombe transformation for poisson-gaussian noise, IEEE transactions on image processing 22 (1) (2012) 91–103 [DOI] [PubMed] [Google Scholar]
- [56].Silver D, Schrittwieser J, Simonyan K, Antonoglou I Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, et al. , Mastering the game of go without human knowledge, Nature 550 (7676) (2017) 354–359. [DOI] [PubMed] [Google Scholar]
- [57].Ioffe S, Szegedy C, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015). [Google Scholar]
- [58].Simonyan K, Zisserman A, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014). [Google Scholar]
- [59].Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S, Gans trained by a two time-scale update rule converge to a local nash equilibrium, Advances in neural information processing systems 30(2017) [Google Scholar]
- [60].Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D Erhan D, Vanhoucke V, Rabinovich A, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9. [Google Scholar]
- [61].Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC, Improved training of wasserstein gans, Advances in neural information processing systems 30 (2017). [Google Scholar]
- [62].Carton A-K, Bakic P, Ullberg C, Derand H, Maidment AD, Development of a physical 3D anthropomorphic breast phantom, Medical Physics 38 (2) (2011) 891–896. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [63].Kingma DP, Ba J, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014). [Google Scholar]
- [64].Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A, Automatic differentiation in pytorch (2017).
- [65].Erhan D, Bengio Y, Courville A, Vincent P, Visualizing higher-layer features of a deep network, University of Montreal; 1341(3)(2009)1. [Google Scholar]
- [66].Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J, et al. , Recent advances in convolutional neural networks, Pattern Recognition 77 (2018) 354–377. [Google Scholar]
- [67].Zhang Q, Nian Wu Y, Zhu S-C, Interpretable convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 88278836. [Google Scholar]
- [68].Saharia C, Ho J, Chan W, Salimans T, Fleet DJ, Norouzi M, Image super-resolution via iterative refinement, arXiv preprint arXiv:2104.07636 (2021). [DOI] [PubMed] [Google Scholar]