Radiology: Artificial Intelligence. 2021 Apr 14;3(4):e200097. doi: 10.1148/ryai.2021200097

A Deep Learning Approach to Re-create Raw Full-Field Digital Mammograms for Breast Density and Texture Analysis

Hai Shu 1, Tingyu Chiang 1, Peng Wei 1, Kim-Anh Do 1, Michele D Lesslie 1, Ethan O Cohen 1, Ashmitha Srinivasan 1, Tanya W Moseley 1, Lauren Q Chang Sen 1, Jessica W T Leung 1, Jennifer B Dennison 1, Sam M Hanash 1, Olena O Weaver 1
PMCID: PMC8328112  PMID: 34350403

Abstract

Purpose

To develop a computational approach to re-create rarely stored for-processing (raw) digital mammograms from routinely stored for-presentation (processed) mammograms.

Materials and Methods

In this retrospective study, pairs of raw and processed mammograms collected in 884 women (mean age, 57 years ± 10 [standard deviation]; 3713 mammograms) from October 5, 2017, to August 1, 2018, were examined. Mammograms were split 3088 for training and 625 for testing. A deep learning approach based on a U-Net convolutional network and kernel regression was developed to estimate the raw images. The estimated raw images were compared with the originals by four image error and similarity metrics, breast density calculations, and 29 widely used texture features.

Results

In the testing dataset, the estimated raw images had small normalized mean absolute error (0.022 ± 0.015), scaled mean absolute error (0.134 ± 0.078) and mean absolute percentage error (0.115 ± 0.059), and a high structural similarity index (0.986 ± 0.007) for the breast portion compared with the original raw images. The estimated and original raw images had a strong correlation in breast density percentage (Pearson r = 0.946) and a strong agreement in breast density grade (Cohen κ = 0.875). The estimated images had satisfactory correlations with the originals in 23 texture features (Pearson r ≥ 0.503 or Spearman ρ ≥ 0.705) and were well complemented by processed images for the other six features.

Conclusion

This deep learning approach performed well in re-creating raw mammograms with strong agreement in four image evaluation metrics, breast density, and the majority of 29 widely used texture features.

Keywords: Mammography, Breast, Supervised Learning, Convolutional Neural Network (CNN), Deep learning algorithms, Machine Learning Algorithms

See also the commentary by Chan in this issue.

Supplemental material is available for this article.

©RSNA, 2021



Summary

A deep learning approach was developed to re-create for-processing digital mammograms from for-presentation mammograms and showed strong agreement with the original images.

Key Points

  ■ The proposed deep learning approach used the U-Net convolutional network as a nonlinear regression model and generalized it from categorical image segmentation to continuous image estimation.

  ■ The proposed approach demonstrated good performance in re-creating for-processing mammograms, with small normalized mean absolute error (0.022 ± 0.015) and high structural similarity index (0.986 ± 0.007) relative to the original images.

  ■ The re-created and original for-processing mammograms had strong agreement in breast density (Pearson r = 0.946; Cohen κ = 0.875) and satisfactory correlations for 23 commonly used texture features (Pearson r ≥ 0.503 or Spearman ρ ≥ 0.705).

Introduction

Screening mammography plays a critical role in early breast cancer detection (1). Mammogram-based measures of breast density (2,3) and texture patterns (4,5) have been shown to be important risk factors for developing breast cancer. Breast density measures the amount of fibroglandular (ie, dense) tissue compared with fatty (ie, nondense) tissue in the breast, which can be evaluated by the percentage of dense tissue or graded using the four categories of the Breast Imaging Reporting and Data System (BI-RADS) (6). Mammographic texture features, such as those based on the gray-level co-occurrence matrix, are also strong independent risk factors and further improve predictive ability when combined with breast density (5,7,8).

Full-field digital mammography generates “for-processing” images (also referred to as “raw” data) (9) in which the grayscale is proportional to the x-ray attenuation through the breast. These data are then digitally manipulated to enhance some features, such as contrast and resolution, to produce “for-presentation” (also called “processed”) images that are optimized for visual cancer detection by radiologists. Raw images are more appropriate for quantitative analysis than processed images because they retain the original x-ray attenuation information and reflect the original physical properties of the breast (10). For example, the commonly used commercial software programs Volpara (Volpara Health) and Quantra (Hologic) use raw mammograms for volumetric breast density measurement and have excellent agreement with MRI (considered the “reference standard” for measuring breast density) (11). Significant differences in breast density and texture measures between raw and processed images have been reported in the literature (10,12). However, in clinical settings, raw images are rarely archived because of cost and storage constraints; only processed images are routinely stored. Moreover, mammography equipment manufacturers do not disclose their raw-to-processed image conversion steps, and inversion algorithms are not available. Hence, retrospective calculations of breast density and texture features using preferred raw image–based algorithms and software (5,11) are not applicable to most historical images stored only in the processed format. As the BI-RADS criteria (6) and software algorithms (5,11) for breast density and texture assessment are updated over time, such retrospective calculations would enable new research without collecting new data. This is particularly helpful for retrospective comparative research on breast density or texture dynamics and their effect on breast cancer risk, diagnosis, and prognosis.

These considerations motivated us to re-create raw images from processed images for breast density and texture analysis. We developed an image re-creation approach using a powerful deep learning technique. Deep learning excels in modern computer vision and image processing (13). In particular, the U-Net convolutional network (14) and its variants have exhibited state-of-the-art performance in various medical image segmentation tasks, such as breast cancer histologic segmentation (15), brain tumor MRI segmentation (16), and diabetic retinal lesion segmentation (17). We extended the use of U-Net to image estimation, in which the re-created image can be viewed as a continuous-valued segmentation map. Furthermore, we applied kernel regression (18) based on image acquisition parameters to facilitate image re-creation.

Materials and Methods

Image Dataset

We conducted a Health Insurance Portability and Accountability Act–compliant retrospective analysis of raw and processed mammograms acquired in the first 1000 consecutive women enrolled in an institutional review board–approved prospective breast cancer screening cohort (ClinicalTrials.gov identifier: NCT03408353). The images were obtained from October 5, 2017, to August 1, 2018. This retrospective analysis was approved by our institutional review board, and the requirement for informed consent was waived.

All raw and processed mammograms were generated using the Selenia Dimensions Mammography System (Hologic). The processed images were retrieved from the picture archiving and communication system (PACS), and the raw images were stored on a PACS-based research server. There were 4394 matched pairs of raw and processed images available for 964 patients. The mammograms were reviewed by radiologists, and images with visible evidence of breast surgery (n = 3), implants (n = 608), marker clips (n = 64), or implantable devices (n = 6) that considerably contaminated the breast region and could invalidate assessment of breast density and texture were excluded (Fig E1 [supplement]).

The final dataset contained 3713 pairs of raw and processed mammograms in 884 women (age range, 28–80 years; mean age, 57 years ± 10), including 1891 pairs in the mediolateral oblique (MLO) view and 1822 pairs in the craniocaudal (CC) view. In these data, 1032 MLO-view and 985 CC-view pairs were 3328 × 4096 pixels in size; the remainder were 2560 × 3328 pixels. The pixel size for all images was 0.065 × 0.065 mm². The dataset was randomly split into disjoint training and testing sets consisting of images from 737 and 147 women, respectively (approximately a 5:1 ratio), with 1569 and 322 pairs in the MLO view and 1519 and 303 pairs in the CC view. Both original raw and processed images were used in model training. In testing, the processed images were input into the trained model to generate the estimated raw images, and the original raw images were used only to evaluate image re-creation performance.
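
For illustration, the patient-level split can be expressed in code. This is a minimal sketch, not the authors' pipeline: it assumes a hypothetical table mammogram_pairs.csv with one row per image pair and a patient_id column, and uses scikit-learn's GroupShuffleSplit so that all images of a woman fall in the same partition.

```python
# Disjoint patient-level train/test split (~5:1 by women), so that no
# woman contributes images to both partitions.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

pairs = pd.read_csv("mammogram_pairs.csv")  # columns: patient_id, raw_path, proc_path

splitter = GroupShuffleSplit(n_splits=1, test_size=1 / 6, random_state=0)
train_idx, test_idx = next(splitter.split(pairs, groups=pairs["patient_id"]))
train_pairs, test_pairs = pairs.iloc[train_idx], pairs.iloc[test_idx]

# No patient appears in both partitions.
assert set(train_pairs["patient_id"]).isdisjoint(test_pairs["patient_id"])
```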

Image Preprocessing

Mammograms of right breasts were flipped horizontally to the left. Breast segmentation masks were generated from the processed images using a contour-based algorithm (19). The width of each image was cut to its effective width (range, 544–3328 pixels; mean, 1647.4 pixels ± 502.2), defined as the smallest multiple of 16 not less than the width of its breast mask. The original raw images were further negative log-transformed (10). For each image, pixels outside its breast mask were set to zero, and the intensity range within the breast mask was min-max normalized to a 0–255 scale. Intensity normalization was performed because it improves the prediction of deep neural networks by reducing data variability (20); it is essential for mammograms because original raw images have much more varied intensity ranges than processed images (Table E1 [supplement]). The estimated raw images from our deep neural network (ie, the modified U-Net) were later rescaled to their original intensity ranges, which were estimated using our kernel regression model with image acquisition parameters. Because of the 48-GB memory limit of our graphics processing units (GPUs), the input and reference images for our U-Net were resized to 256 × 512 pixels; the output images of the network, also 256 × 512 pixels, were resized back to the original image sizes of their corresponding processed mammograms. The size of 256 × 512 pixels was chosen because it did not induce overfitting, whereas our trials with larger image sizes did, probably because of the limited sample size (3088 image pairs for training) (21).
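
The steps above can be summarized in a short sketch. This is an illustrative approximation rather than the published code: breast_mask is assumed to be a Boolean array from the contour-based segmentation step, and OpenCV is used for resizing.

```python
# Preprocess one mammogram: flip right breasts, crop to the effective width
# (smallest multiple of 16 covering the breast mask), negative log transform
# raw images, zero the background, min-max normalize the breast to [0, 255],
# and downsize to 256 x 512 for the U-Net.
import numpy as np
import cv2

def preprocess(img, breast_mask, laterality, is_raw):
    if laterality == "R":                         # flip right breasts to the left
        img, breast_mask = img[:, ::-1], breast_mask[:, ::-1]
    width = int(breast_mask.any(axis=0).nonzero()[0].max()) + 1
    width = -(-width // 16) * 16                  # smallest multiple of 16 >= mask width
    img, breast_mask = img[:, :width].astype(np.float64), breast_mask[:, :width]
    if is_raw:
        img = -np.log(np.maximum(img, 1.0))       # negative log transform (raw only)
    img[~breast_mask] = 0.0                       # zero out background pixels
    lo, hi = img[breast_mask].min(), img[breast_mask].max()
    img[breast_mask] = 255.0 * (img[breast_mask] - lo) / (hi - lo)
    return cv2.resize(img, (256, 512)), (lo, hi)  # keep (lo, hi) to invert later
```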

Deep Learning Approach

Our deep learning approach included two nonlinear model components: a modified U-Net for initial image estimation and kernel regression for estimating original intensity ranges. Figure 1 illustrates this approach.

Figure 1:

Flowchart of the study's deep learning approach for re-creating raw mammograms. The solid arrows connect steps in both model training and testing, and the dashed arrows connect steps in model training only. DICOM = Digital Imaging and Communications in Medicine.

The U-Net was originally developed for image segmentation, in which a discrete-valued label map is generated as the output (14). In contrast, for this study, it was modified to estimate the continuous-valued raw image. To this end, the training loss function of the U-Net was changed from the cross entropy of classification to the mean square error $\frac{1}{np}\sum_{i=1}^{n}\sum_{j=1}^{p}(Y_{ij} - \hat{Y}_{ij})^2$, where $Y_{ij}$ and $\hat{Y}_{ij}$ are the jth pixels of the ith preprocessed and initially estimated raw images $Y_i$ and $\hat{Y}_i$, respectively, $n$ is the number of training images, and $p$ is the number of pixels per image. The final activation function was changed from the softmax function with categorical probabilities to the rectified linear unit, ReLU(x) = max(0, x), to output a continuous $\hat{Y}_i$. Dropout (22) was applied between convolutional layers with a rate of 0.2 to avoid overfitting. Intrinsically, we used the U-Net architecture (Fig 2) to fit a nonlinear regression function $\hat{Y}_i = F_w(X_i)$ from the processed image $X_i$, minimizing the above mean square error with respect to the parameter vector $w$.
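
To make the modification concrete, the sketch below expresses it in Keras (the libraries named under Model Training). It is a compact illustration, not the exact published architecture: the depth and filter counts are placeholders, while the two described changes appear as stated, a ReLU final activation for continuous output and an MSE loss in place of cross entropy.

```python
# Illustrative U-Net-style encoder-decoder with the modifications described
# above; depth and filter counts are placeholders, not the paper's values.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two conv layers with batch normalization, ReLU, and dropout (rate 0.2)
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same",
                          kernel_initializer="he_normal")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Dropout(0.2)(x)
    return x

inputs = layers.Input(shape=(512, 256, 1))     # preprocessed processed-image input
c1 = conv_block(inputs, 32)
p1 = layers.MaxPooling2D()(c1)                 # contracting path
c2 = conv_block(p1, 64)
u1 = layers.UpSampling2D()(c2)                 # expanding path
u1 = layers.Concatenate()([u1, c1])            # skip connection
c3 = conv_block(u1, 32)
# Final 1x1 convolution with ReLU: continuous-valued initial raw estimate
outputs = layers.Conv2D(1, 1, activation="relu")(c3)
unet = Model(inputs, outputs)                  # compiled with MSE loss at training time
```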

Figure 2:

The modified U-Net for initial image re-creation. Yij and Ŷij are the jth pixels of the ith preprocessed and initially estimated raw images. BatchNorm = batch normalization, Conv = convolutional operation, ReLU = rectified linear unit.

The initial estimated raw image from our modified U-Net was transformed back to its original image size and intensity range by inverting the preprocessing. The original maximum and minimum breast values were required to invert the min-max normalization, and the original median background value was needed to assign to all background pixels outside the breast mask. Multivariate kernel regression (18), a nonlinear modeling technique, was used to estimate these three values. The predictor variables in the kernel regression were nine acquisition parameters of the original raw mammogram, which are also stored in the Digital Imaging and Communications in Medicine (DICOM) header of its processed image: kVp, exposure time, x-ray tube current, exposure, exposure in μAs, body part thickness, compression force, relative x-ray exposure, and organ dose. The Gaussian kernel and local linear estimation were used for the kernel regression.
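
For illustration, the nine predictors can be read from the DICOM header with pydicom. This sketch assumes the standard DICOM keyword names for these tags and omits handling of missing values.

```python
# Read the nine acquisition parameters used as kernel-regression predictors
# from a processed image's DICOM header (pixel data are not needed).
import pydicom

KEYWORDS = ["KVP", "ExposureTime", "XRayTubeCurrent", "Exposure",
            "ExposureInuAs", "BodyPartThickness", "CompressionForce",
            "RelativeXRayExposure", "OrganDose"]

def acquisition_features(dicom_path):
    ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)
    return [float(getattr(ds, kw)) for kw in KEYWORDS]
```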

Model Training

To train our modified U-Net, the training dataset was randomly split into two parts: 2467 pairs of processed and raw images in either the MLO or CC view from 589 patients were used for direct training, and 621 pairs from 148 patients were used as validation data to monitor training. The network was trained for 300 epochs using the Adam optimizer (23) to minimize the mean square error. The network weights were initialized using the He normal initializer (24), and the network biases were initialized to zero. The optimization used a mini-batch size of 16, with the learning rate set to 0.01, 0.005, and 0.001 for the three consecutive sets of 100 epochs, respectively. The U-Net model was implemented with the Keras and TensorFlow libraries using Python software (version 3.5.2; www.python.org). Training was run on four NVIDIA Titan Xp GPUs (total memory, 48 GB).
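
Assuming the unet model from the earlier sketch and in-memory arrays x_train, y_train, x_val, and y_val (placeholder names), the training configuration described above might be expressed as follows.

```python
# Training sketch: Adam, MSE loss, mini-batch 16, and a piecewise-constant
# learning rate (0.01 / 0.005 / 0.001 over three 100-epoch stages).
import tensorflow as tf

def lr_schedule(epoch, lr):
    return 0.01 if epoch < 100 else (0.005 if epoch < 200 else 0.001)

unet.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss="mse")
unet.fit(x_train, y_train,                      # processed inputs, raw targets
         validation_data=(x_val, y_val),        # held-out pairs monitor training
         batch_size=16, epochs=300,
         callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```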

Three kernel-regression models were trained to fit the maximum and minimum breast values and median background value, respectively, in the original raw image. All 3088 image pairs in the 737 training patients were used for training kernel regression. The kernel bandwidth was selected using the leave-one-out least squares cross-validation method (18). Kernel regression was implemented with the “np” package using R software (version 3.6.0; www.r-project.org).
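
Although the kernel regression here was fit with the R "np" package, an analogous local linear estimator with least-squares cross-validated bandwidths is available in Python through statsmodels; the following is a sketch under that substitution, with X (an n × 9 matrix of acquisition parameters) and y (one of the three target values) as placeholder names.

```python
# One kernel-regression model per target value (breast maximum, breast
# minimum, background median), using local linear estimation and
# least-squares cross-validation for the bandwidths.
from statsmodels.nonparametric.kernel_regression import KernelReg

def fit_kernel_regression(X, y):
    return KernelReg(endog=y, exog=X,
                     var_type="c" * X.shape[1],  # all-continuous predictors
                     reg_type="ll",              # local linear estimation
                     bw="cv_ls")                 # least-squares cross-validation

# model = fit_kernel_regression(X_train, y_breast_max)
# y_hat, _ = model.fit(X_test)  # predicted means (and marginal effects)
```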

Statistical Analysis

The image re-creation performance of our proposed approach was evaluated on the testing dataset by four image error and similarity metrics, breast density calculations, and 29 widely used texture features.

We adopted the four image metrics widely used in machine learning (25,26): normalized mean absolute error (nMAE), scaled mean absolute error (sMAE), mean absolute percentage error (MAPE), and structural similarity index (SSIM).

$$\text{nMAE} = \frac{\sum_{s \in S} |I_s - \hat{I}_s|}{|S| \left( \max_{s \in S} I_s - \min_{s \in S} I_s \right)}, \qquad \text{sMAE} = \frac{\sum_{s \in S} |I_s - \hat{I}_s|}{\sum_{s \in S} |I_s - \mu_1|},$$

$$\text{MAPE} = \frac{1}{|S|} \sum_{s \in S} \frac{|I_s - \hat{I}_s|}{I_s}, \qquad \text{SSIM} = \frac{(2\mu_1\mu_2 + C_1)(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)(\sigma_1^2 + \sigma_2^2 + C_2)},$$

where $I_s$ and $\hat{I}_s$ are the pixels $s \in S$ of the original and estimated raw images, respectively; $S$ is an index set; $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\sigma_{12}$ are the means, variances, and covariance of $\{I_s\}_{s \in S}$ and $\{\hat{I}_s\}_{s \in S}$; and $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ with $L = \max_{s \in S}(I_s, \hat{I}_s) - \min_{s \in S}(I_s, \hat{I}_s)$. The four metrics were computed for both the breast area and the whole image using Python 3.5.2.
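
As a reference implementation, the sketch below computes the four metrics for a pixel index set given as a Boolean mask (the breast mask, or an all-true mask for the whole image). The nMAE and sMAE normalizations follow the forms reconstructed above and should be treated as assumptions.

```python
# Compute nMAE, sMAE, MAPE, and SSIM for original and estimated raw images
# over the pixel set selected by `mask`.
import numpy as np

def image_metrics(orig, est, mask):
    x, y = orig[mask].astype(float), est[mask].astype(float)
    abs_err = np.abs(x - y)
    nmae = abs_err.mean() / (x.max() - x.min())        # normalized MAE
    smae = abs_err.sum() / np.abs(x - x.mean()).sum()  # scaled MAE
    mape = (abs_err / x).mean()                        # requires positive intensities
    L = max(x.max(), y.max()) - min(x.min(), y.min())
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    cov = np.cov(x, y, ddof=0)                         # variances and covariance
    ssim = ((2 * x.mean() * y.mean() + c1) * (2 * cov[0, 1] + c2)) / (
        (x.mean() ** 2 + y.mean() ** 2 + c1) * (cov[0, 0] + cov[1, 1] + c2))
    return nmae, smae, mape, ssim
```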

The original and estimated raw images were also input into Volpara software (version 1.5.4.0) to obtain breast density percentages and grades. Volpara is approved by the U.S. Food and Drug Administration for computing volumetric breast density from raw mammograms. Breast density percentages from the original and estimated raw images were compared using linear regression, Bland-Altman plots (27), and Pearson correlation. The agreement between density grades was determined using the quadratic-weighted Cohen κ value (28). The comparison was analyzed at the per-image and per-breast levels. We did not obtain breast density values from processed mammograms because Volpara does not accept them.
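
As an illustrative sketch of the per-image comparison (using plain estimators; the cluster-adjusted versions described below would replace them in the actual analysis), with orig_vdp, est_vdp, orig_vdg, and est_vdg as placeholder arrays:

```python
# Pearson correlation of density percentages, quadratic-weighted kappa of
# density grades, and Bland-Altman limits for log-transformed percentages.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

r, _ = pearsonr(orig_vdp, est_vdp)
kappa = cohen_kappa_score(orig_vdg, est_vdg, weights="quadratic")

diff = np.log(est_vdp) - np.log(orig_vdp)      # log VDP differences
loa = (diff.mean() - 1.96 * diff.std(ddof=1),  # 95% limits of agreement
       diff.mean() + 1.96 * diff.std(ddof=1))
```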

We applied 29 commonly used mammographic texture descriptors (5,10), including 12 gray-level histogram features, eight co-occurrence features, seven run-length features, and two structural features, to compare the original raw images with the estimated raw images and with the original processed images. We followed the texture analysis pipeline of Gastounioti et al (10), computing each texture feature by averaging its values over adjacent 6.3 × 6.3 mm² local windows covering the entire breast region. All texture features were calculated using MATLAB software (version R2019a; MathWorks). Pearson correlation and Spearman rank correlation were both computed for feature comparison at the per-image level; Pearson correlation measures only the linear association between two variables, whereas Spearman rank correlation assesses their monotonic association, whether linear or nonlinear (29). Linear regression and Bland-Altman plot analyses were also conducted.
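
The windowed computation can be illustrated for a single co-occurrence feature. The study's pipeline is the MATLAB implementation of Gastounioti et al (10); the sketch below is only a rough Python analogue using scikit-image, with a 97-pixel window approximating 6.3 mm at 0.065 mm per pixel and "contrast" as a stand-in feature.

```python
# Average a gray-level co-occurrence feature over adjacent local windows
# that lie fully inside the breast mask.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def mean_glcm_feature(img_u8, mask, prop="contrast", win=97):
    values = []
    for r in range(0, img_u8.shape[0] - win + 1, win):
        for c in range(0, img_u8.shape[1] - win + 1, win):
            if mask[r:r + win, c:c + win].all():  # window fully in breast
                glcm = graycomatrix(img_u8[r:r + win, c:c + win],
                                    distances=[1], angles=[0],
                                    levels=256, symmetric=True, normed=True)
                values.append(graycoprops(glcm, prop)[0, 0])
    return float(np.mean(values))
```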

In the above comparisons, we treated the breast density and texture feature data as clustered data, with each woman as a cluster, because the mammogram images of the same woman may be correlated. We therefore applied the within-cluster resampling approach of Hoffman et al (30) with 5000 resamples to fit linear regression models and to approximate the standard deviation values for the agreement limits of Bland-Altman plots, the cluster-weighted approach of Lorenz et al (29) to compute Pearson and Spearman correlations, and the nonparametric method of Yang and Zhou (28) to estimate the quadratic-weighted Cohen κ value. As recommended by Krouwer (31), we showed the reference measurement, the measurement using the original raw images, on the x-axis of the Bland-Altman plot. All above comparisons were performed using R 3.6.0.
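
A minimal sketch of within-cluster resampling for one statistic follows, assuming a pandas DataFrame df with columns patient_id, orig, and est (placeholder names): each resample draws one image per woman, so the resampled observations are independent, and the statistic is averaged over resamples.

```python
# Within-cluster resampling (Hoffman et al) for a Pearson correlation.
import numpy as np
from scipy.stats import pearsonr

def wcr_pearson(df, n_resamples=5000, seed=0):
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_resamples):
        one = df.groupby("patient_id").sample(      # one image per woman
            n=1, random_state=int(rng.integers(2**31 - 1)))
        stats.append(pearsonr(one["orig"], one["est"])[0])
    return float(np.mean(stats))                    # average over resamples
```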

Results

Patient Characteristics

The characteristics of the patients and their original raw mammograms are summarized in Table 1. The training and testing datasets consisted of 737 and 147 patients, respectively, with comparable ages (56.6 years ± 9.8 and 57.0 years ± 10.0; P = .71). The paired original raw and processed images had the same aforementioned acquisition parameters in the DICOM header, and the training and testing datasets did not differ significantly in these parameters (Table 1). All 3713 pairs of original raw and estimated raw images were put into Volpara for breast density calculation. Only 3067 training and 618 testing pairs passed the Volpara evaluation; the remaining 28 pairs failed due to errors generated for their original raw images. For the original raw images, the Volpara density percentages (VDPs) for the training and testing sets did not differ significantly (Table 1).

Table 1:

Characteristics of Patients and Raw Mammograms


Although our estimated raw images were resized to the image sizes of their original raw mammograms, it is notable that the re-created spatial resolutions were approximately one-seventh (0.140 ± 0.027) of the original spatial resolutions due to the image downsizing used for the U-Net. The ratio of spatial resolutions was computed as $\left[(256/\text{original effective width}) \times (512/\text{original height})\right]^{1/2}$.

Comparison by Image Error and Similarity Metrics

Table 2 reports the nMAE, sMAE, MAPE, and SSIM calculated for the estimated raw mammograms in the testing dataset. The nMAE had small means (≤ 0.0221) and standard deviations (≤ 0.0164). The means ± standard deviations of the more stringent metrics, sMAE and MAPE, were 0.1340 ± 0.0782 and 0.1152 ± 0.0588, respectively, for the breast portion and 0.0504 ± 0.0278 and 0.0692 ± 0.0356, respectively, for the whole image. The SSIM values showed strong structural similarity between original raw and estimated raw images, with large means (≥ 0.9813) and small standard deviations (≤ 0.0102).

Table 2:

Metric Values of Estimated Raw Mammograms in Testing Dataset


To visually evaluate image re-creation, we randomly selected four breasts, each from a different woman in the testing dataset, with different Volpara density grades (VDGs). Their estimated raw and original raw mammograms are shown in Figure 3. The grayscale levels were negative log-transformed to improve image readability, and the associated metric values are provided for the estimated images. Visually, the estimated raw images closely approximated the original raw mammograms in both the MLO and CC views.

Figure 3:

Comparison of original and estimated raw images for four randomly selected patients with different true Volpara breast density grades. The images were negative log-transformed to improve readability. B = breast portion, MAPE = mean absolute percentage error, nMAE = normalized mean absolute error, sMAE = scaled mean absolute error, SSIM = structural similarity index, W = whole image.

Comparison of Breast Density

We further evaluated our deep learning approach on the testing dataset by comparing the VDPs and VDGs for the estimated and original raw images. Figure 4 shows scatterplots (with regression lines) and Bland-Altman plots for the estimated and original log-transformed VDPs; the VDPs were log-transformed to meet the Gaussian assumption of linear regression (30). The regression lines at the per-image and per-breast levels almost overlapped the unity lines, with slopes of 1.048 and 1.053 and standard errors of 0.013 and 0.015, respectively. The Bland-Altman plots showed good agreement between the estimated and original log-transformed VDPs, with small mean differences (–0.06 and –0.05) and acceptable 95% agreement limits. The 95% agreement band was moderately narrower at the per-breast level than at the per-image level ([–0.32, 0.21] vs [–0.41, 0.29]). Table 3 shows strong Pearson correlations between the estimated and original VDPs; here, the VDPs were not log-transformed because Gaussianity is not required for the correlation analysis (29). The Pearson correlation was larger at the per-breast level (r = 0.972; 95% CI: 0.962, 0.983) than at the per-image level (r = 0.946; 95% CI: 0.926, 0.966).

Figure 4:

Scatterplots and Bland-Altman plots of log-transformed Volpara breast density percentages obtained from the original and estimated raw images (618 image pairs for 293 breasts in the testing dataset). (A, C) In each scatterplot, the red solid and black dashed lines represent the regression and unity lines, respectively; the standard error of the regression slope was 0.013 at the per-image level and 0.015 at the per-breast level. (B, D) In each Bland-Altman plot, the red solid line represents the mean difference, the red dashed lines represent the 95% limits of agreement, and the black dashed line represents the zero mean difference. EstVDP = estimated Volpara breast density percentage, OriVDP = original Volpara breast density percentage, SD = standard deviation.

Table 3:

Comparison of Volpara Density Percentages and Grades for Original and Estimated Raw Mammograms in Testing Dataset


Table 3 also compares the VDGs for the estimated raw and original raw images. Cohen κ was 0.913 (95% CI: 0.882, 0.943) at the per-breast level and 0.875 (95% CI: 0.845, 0.905) at the per-image level. The estimated and original VDGs thus achieved “almost perfect” agreement as defined by Landis and Koch (33) (κ ∈ [0.81, 1]).

The above results demonstrated strong consistency of the estimated and original VDPs and VDGs, indicating good performance of our deep learning approach. The better breast density results at the per-breast level than at the per-image level were attributed to the use of both MLO and CC views. The use of two views is standard in radiology practice and increases the accuracy of volumetric breast density estimation.

Comparison of Texture Features

We applied 29 widely used texture features to compare the original raw images with the estimated raw and original processed images in the testing dataset. Table 4 reports the Pearson correlation (r) and Spearman rank correlation (ρ) for the feature comparison. The original raw images were highly linearly correlated with the estimated raw images in 19 texture features, including 11 of the 12 gray-level histogram features (r ∈ [0.806, 0.986]), six of the eight co-occurrence features (r ∈ [0.762, 0.915]), and two of the seven run-length features (r = 0.951 and 0.834, respectively), whereas the original raw images were significantly less linearly correlated with the processed images in each of these 19 features (P < .05), by large margins (Δr > 0.4) in 16 of them. In addition, the original raw and estimated raw images were moderately linearly correlated in local binary pattern and fractal dimension (r = 0.503 and 0.668, respectively) and strongly monotonically correlated in kurtosis and energy (ρ = 0.719 and 0.705, respectively). For the remaining six features, although the estimated raw images were poorly correlated (r ≤ 0.268; ρ ≤ 0.293) with the original raw images, the processed images were strongly linearly correlated (r ∈ [0.832, 0.846]) in run-length nonuniformity, run percentage, long-run emphasis, and short-run emphasis, and moderately correlated in low gray-level run emphasis (r = 0.462; ρ = 0.553) and inverse difference moment (r = 0.650; ρ = 0.633). More results are shown in Appendix E1 and Figure E2 (supplement). Hence, from our estimated raw mammograms, complemented with the routinely available processed images, one may find substitutes that are satisfactorily correlated with all 29 texture features of raw images.

Table 4:

Correlations Between Texture Features from Different Mammogram Types in Testing Dataset


We selected sum average and fractal dimension (top-ranked predictors for breast cancer risk in the literature [5,34]), as well as kurtosis (r = 0.062 but ρ = 0.719 for original raw vs estimated raw), for linear regression and Bland-Altman plot analyses. Figure 5 shows the results. In sum average, which characterizes the dispersed patterns of breast density, our estimated raw images agreed with the original raw images markedly better than the processed images did (no-intercept regression slope, 1.026 vs 0.706; 95% agreement limits of the Bland-Altman plots, [–0.90, 7.33] vs [–44.87, –28.68]; raw feature range, [118.5, 138.5]), consistent with the correlation results (r = 0.864 vs 0.240; ρ = 0.825 vs 0.162). Fractal dimension had a negative bias of approximately −0.3 in the estimated raw images but was satisfactorily approximated using the processed images (95% agreement limits of the Bland-Altman plots, [–0.36, –0.24] vs [–0.00, 0.06]; raw feature range, [2.00, 2.29]; r = 0.668 vs 0.934); one may instead use the routinely stored processed images to better approximate the fractal dimension of original raw images. Kurtosis was less than 3.5 for 98.4% (615 of 625) of raw images and 97.3% (608 of 625) of estimated raw images but larger than 3.5 for 88.2% (551 of 625) of processed images, as is clearly seen in the scatterplots and Bland-Altman plots. After removing 27 (4.32%; 27 of 625) outliers (those with kurtosis > 3.5 for original raw or estimated raw images), we obtained improved agreement between the original raw and estimated raw images, with an increased Pearson correlation (r = 0.686 vs 0.062 before removal), a narrower 95% agreement band ([–0.36, 0.09] vs [–1.88, 1.81]), and an alleviated linear trend in the Bland-Altman plot (Fig 5J). Because fractal dimension and kurtosis measure surface roughness and histogram tailedness in the breast region, respectively, our deviations in estimating these two features, in particular the undesirable negative linear trends in the Bland-Altman plots (Fig 5F, 5J), may be caused by some loss of spatial resolution from resizing the images in our deep learning approach (Fig 1).

Figure 5:

Scatterplots and Bland-Altman plots of sum average, fractal dimension, and kurtosis for comparisons of original raw images with estimated raw images and with processed images (625 image pairs in the testing dataset). (A, C, E, G, I, K) In each scatterplot, the black dashed line represents the unity line, and the red and green solid lines represent the regression lines fitted with nonzero and zero intercepts, respectively. (B, D, F, H, J, L) In each Bland-Altman plot, the red solid line shows the mean difference, the red dashed lines represent the 95% limits of agreement, and the black dashed line shows the zero mean difference. J, The enlarged portion shows the Bland-Altman plot after removing 27 outliers (those with kurtosis > 3.5 for original raw or estimated raw images).

To assess the re-created spatial resolution, we provide examples of microcalcifications and spiculated masses in Figures E3 and E4 (supplement). Our estimated raw images were unable to show the small microcalcifications in Figure E3 (supplement). In Figure E4 (supplement), our estimated raw images captured the major patterns of the spiculated masses, though not as sharply as the original raw images did. These results indicate that our estimated raw images lost certain high-resolution texture information of the original raw images. Notably, radiologists in practice view microcalcifications and spiculated masses on the processed (enhanced) images, not the raw images. In any case, the texture information lost for these structures in our estimated raw images can be complemented by the routinely available processed images. Nonetheless, caution is needed when using our estimated raw images to compute texture features that require high resolution.

Discussion

In this study, we developed a deep learning approach for re-creating raw digital mammograms from processed mammograms. The project was undertaken to enable retrospective comparative quantitative analysis of the mammographic density and texture features in a large research cohort of patients in our institution. Because raw images are preferred over processed images for feature extraction, but are not routinely stored, the re-created images can aid in temporal breast density and texture analysis by creating a readily comparable dataset (5,10–12). Our re-created raw images were strongly correlated with the original images in four image error and similarity metrics, in breast density percentages and grades, and in the majority of 29 widely used breast texture features.

The success of our deep learning approach was mainly attributed to our extension of the U-Net from image segmentation to image estimation. The classic U-Net (14) and its variants excel in image segmentation (1517). They use a special U-shaped network architecture in which a contracting path (the left half of the U) extracts the global salient image features, and an expanding path (the right half of the U) recovers local image details via the skip connections from the contracting path. Mathematically, the U-Net architecture builds a powerful nonlinear regression function to fit the target, the discrete-valued segmentation map. For our purpose of image estimation, we used this nonlinear regression, but we replaced the target with a continuous-valued image by simple yet effective modifications. Specifically, we changed the loss function to the mean square error from the cross entropy of classification and the final activation function to the continuous ReLU function from the categorical softmax function.

Our study had some limitations. First, our deep learning approach considered only digital mammograms generated by a single manufacturer (Hologic), so it may not generalize to mammograms from other manufacturers. The raw-to-processed image conversion algorithms likely differ among manufacturers (9), but from the perspective of transfer learning (35), our trained model may be a good starting point to save training time or reduce the required sample size for mammograms from a different manufacturer. Second, we excluded a large number of mammograms with visible evidence of breast surgery, implants, marker clips, or implantable devices that could invalidate feature assessments, because a good automated algorithm for removing these objects was lacking. Third, we did not train the proposed model separately for the MLO and CC views because of the limited sample size for deep learning (1891 and 1822 pairs for the MLO and CC views, respectively) (21). If a larger sample were available, the model could be trained separately for the two views, which may have different texture patterns (36), to improve image re-creation. Finally, the re-created raw images had lower spatial resolution (approximately one-seventh) than the original raw images, mainly because the images were downsized for training the U-Net under limited GPU memory and sample size. Caution is needed when using our estimated raw images to compute texture features that require high resolution. However, our proposed deep learning framework is general and would likely re-create the original spatial resolution given adequate resources.

In conclusion, our proposed deep learning approach performed well in inverting digital mammograms from the routinely stored for-presentation (processed) format to the rarely archived for-processing (raw) format. We expect the approach can be improved with larger sample sizes, more computational resources, and separate training for the MLO and CC views, and can be generalized to other manufacturers and to more challenging cases with foreign objects.

Acknowledgments

We are grateful to Scientific Publications, Research Medical Library, The University of Texas MD Anderson Cancer Center, for editing assistance.

Supported in part by grants from National Institutes of Health/National Cancer Institute Cancer Center Support Grant (P30 CA016672), Little Green Book Foundation, Center for Global Early Detection at MD Anderson, and McCombs Institute at MD Anderson.

Disclosures of Conflicts of Interest: H.S. disclosed no relevant relationships. T.C. Activities related to the present article: institution received National Institutes of Health (NIH)/National Cancer Institute (NCI) Cancer Center Support Grant (P30 CA016672); institution supported by Little Green Book Foundation, Center for Global Early Detection at MD Anderson, and McCombs Institute at MD Anderson. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. P.W. Activities related to the present article: institution received NIH/NCI Cancer Center Support Grant (P30 CA016672). Activities not related to the present article: disclosed no relevant relationships. K.A.D. disclosed no relevant relationships. M.D.L. disclosed no relevant relationships. E.O.C. disclosed no relevant relationships. A.S. disclosed no relevant relationships. T.W.M. Activities related to the present article: institution received NIH/NCI Cancer Center Support Grant (P30 CA016672). Activities not related to the present article: author is paid medical consultant for Hologic and Merit Medical. Other relationships: disclosed no relevant relationships. L.C.S. Activities related to the present article: institution received NIH/NCI Cancer Center Support Grant (P30 CA016672); institution received support from Little Green Book Foundation, Center for Global Early Detection at MD Anderson, and McCombs Institute at MD Anderson. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. J.W.T.L. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: author paid by Fujifilm and GE Healthcare for lectures; author has stock/stock options in Subtle Medical (start-up company, stock/stock options have no monetary value at this time). Other relationships: disclosed no relevant relationships. J.B.D. Activities related to the present article: institution supported by Little Green Book Foundation (has supported breast cancer mammography clinical research for the early detection, the MERIT program). Activities not related to the present article: employed by MD Anderson. Other relationships: disclosed no relevant relationships. S.M.H. disclosed no relevant relationships. O.O.W. Activities related to the present article: institution received NIH/NCI Cancer Center Support Grant (P30 CA016672); institution received support from Little Green Book Foundation (sponsor of the patient cohort retrospectively used in the study). Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships.

Abbreviations:

BI-RADS
Breast Imaging Reporting and Data System
CC
craniocaudal
DICOM
Digital Imaging and Communications in Medicine
MAPE
mean absolute percentage error
MLO
mediolateral oblique
nMAE
normalized mean absolute error
PACS
picture archiving and communication system
sMAE
scaled mean absolute error
SSIM
structural similarity index
VDG
Volpara density grade
VDP
Volpara density percentage

References

1. Tabar L, Yen MF, Vitak B, Chen HHT, Smith RA, Duffy SW. Mammography service screening and mortality in breast cancer patients: 20-year follow-up before and after introduction of screening. Lancet 2003;361(9367):1405–1410.
2. Boyd NF, Guo H, Martin LJ, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med 2007;356(3):227–236.
3. Wang AT, Vachon CM, Brandt KR, Ghosh K. Breast density and breast cancer risk: a practical review. Mayo Clin Proc 2014;89(4):548–557.
4. Gastounioti A, Conant EF, Kontos D. Beyond breast density: a review on the advancing role of parenchymal texture analysis in breast cancer risk assessment. Breast Cancer Res 2016;18(1):91.
5. Wang C, Brentnall AR, Cuzick J, Harkness EF, Evans DG, Astley S. A novel and fully automated mammographic texture analysis for risk prediction: results from two case-control studies. Breast Cancer Res 2017;19(1):114.
6. Sickles EA, D'Orsi CJ, Bassett LW. ACR BI-RADS Mammography. In: ACR BI-RADS Atlas, Breast Imaging Reporting and Data System. 5th ed. Reston, Va: American College of Radiology, 2013; 134–136.
7. Keller BM, Chen J, Conant EF, Kontos D. Breast density and parenchymal texture measures as potential risk factors for estrogen-receptor positive breast cancer. In: Aylward S, Hadjiiski LM, eds. Medical Imaging 2014: Computer-Aided Diagnosis. Proceedings of SPIE, vol 9035. Bellingham, Wash: International Society for Optics and Photonics, 2014; 90351D.
8. Nielsen M, Vachon CM, Scott CG, et al. Mammographic texture resemblance generalizes as an independent risk factor for breast cancer. Breast Cancer Res 2014;16(2):R37.
9. International Atomic Energy Agency. Quality Assurance Programme for Digital Mammography. Vienna, Austria: International Atomic Energy Agency, 2011.
10. Gastounioti A, Oustimov A, Keller BM, et al. Breast parenchymal patterns in processed versus raw digital mammograms: a large population study toward assessing differences in quantitative measures across image representations. Med Phys 2016;43(11):5862–5877.
11. Wang J, Azziz A, Fan B, et al. Agreement of mammographic measures of volumetric breast density to MRI. PLoS One 2013;8(12):e81653.
12. Burton A, Byrnes G, Stone J, et al. Mammographic density assessed on paired raw and processed digital images and on paired screen-film and digital images across three mammography systems. Breast Cancer Res 2016;18(1):130.
13. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, Mass: MIT Press, 2016.
14. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham, Switzerland: Springer, 2015; 234–241.
15. Aresta G, Araújo T, Kwok S, et al. BACH: grand challenge on breast cancer histology images. Med Image Anal 2019;56:122–139.
16. Bakas S, Reyes M, Jakab A, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv:1811.02629 [preprint]. https://arxiv.org/abs/1811.02629. Posted November 5, 2018. Accessed May 10, 2020.
17. Porwal P, Pachade S, Kokare M, et al. IDRiD: diabetic retinopathy - segmentation and grading challenge. Med Image Anal 2020;59:101561.
18. Li Q, Racine JS. Nonparametric Econometrics: Theory and Practice. Princeton, NJ: Princeton University Press, 2007.
19. Shen L, Margolies LR, Rothstein JH, Fluder E, McBride R, Sieh W. Deep learning to improve breast cancer detection on screening mammography. Sci Rep 2019;9(1):12495.
20. Jacobsen N, Deistung A, Timmann D, Goericke SL, Reichenbach JR, Güllmar D. Analysis of intensity normalization for optimal segmentation performance of a fully convolutional neural network. Z Med Phys 2019;29(2):128–138.
21. Lehman CD, Yala A, Schuster T, et al. Mammographic breast density assessment using deep learning: clinical implementation. Radiology 2019;290(1):52–58.
22. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(56):1929–1958. https://jmlr.org/papers/v15/srivastava14a.html.
23. Kingma DP, Ba JL. Adam: a method for stochastic optimization. In: International Conference on Learning Representations, San Diego, Calif. arXiv:1412.6980 [preprint]. https://arxiv.org/abs/1412.6980. Posted December 22, 2014. Accessed May 10, 2020.
24. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 7–13, 2015. Piscataway, NJ: IEEE, 2015; 1026–1034.
25. Cortez P. Data mining with neural networks and support vector machines using the R/rminer tool. In: Perner P, ed. Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2010. Lecture Notes in Computer Science, vol 6171. Berlin, Germany: Springer, 2010; 572–583.
26. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13(4):600–612.
27. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307–310.
28. Yang Z, Zhou M. Weighted kappa statistic for clustered matched-pair ordinal data. Comput Stat Data Anal 2015;82:1–18.
29. Lorenz DJ, Levy S, Datta S. Inferring marginal association with paired and unpaired clustered data. Stat Methods Med Res 2018;27(6):1806–1817.
30. Hoffman EB, Sen PK, Weinberg CR. Within-cluster resampling. Biometrika 2001;88(4):1121–1134.
31. Krouwer JS. Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Stat Med 2008;27(5):778–780.
32. Follmann D, Proschan M, Leifer E. Multiple outputation: inference for complex clustered data by averaging analyses from independent data. Biometrics 2003;59(2):420–429.
33. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33(1):159–174.
34. Zheng Y, Keller BM, Ray S, et al. Parenchymal texture analysis in digital mammography: a fully automated pipeline for breast cancer risk assessment. Med Phys 2015;42(7):4149–4160.
35. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010;22(10):1345–1359.
36. Gupta S, Markey MK. Correspondence in texture features between two mammographic views. Med Phys 2005;32(6):1598–1606.
