Abstract
Architectural distortion (AD) is one of the most common findings on mammograms, and it may represent not only cancer but also a lesion such as a radial scar that may have an associated cancer. AD accounts for 18–45% of missed cancers, and the positive predictive value of AD is approximately 74.5%. Early detection of AD leads to early diagnosis and treatment of the cancer and improves the overall prognosis. However, detection of AD is a challenging task. In this work, we propose a new approach for detecting architectural distortion in mammography images by combining preprocessing methods and a novel structure fusion attention model. The proposed structure-focused weighted orientation preprocessing method produces a map composed of the original image, the architecture enhancement map, and the weighted orientation map, highlighting suspicious AD locations. The proposed structure fusion attention model captures the information from different channels and outperforms other models in terms of false positives and top sensitivity, which refers to the maximum sensitivity that a model can achieve when the highest number of false positives is accepted, reaching a top sensitivity of 0.92 with only 0.6590 false positives per image. The findings suggest that the combination of preprocessing methods and a novel network architecture can lead to more accurate and reliable AD detection. Overall, the proposed approach offers a novel perspective on detecting ADs, and we believe that our method can be applied to clinical settings in the future, assisting radiologists in the early detection of ADs from mammography and ultimately leading to early treatment of breast cancer patients.
Keywords: Architectural distortion, Computer-aided detection, Mammography, Architecture enhancement, Convergence map, Deep learning, Attention mechanism, Structure fusion attention model
Introduction
Breast cancer is the most common cause of cancer death for women worldwide, and early-stage cancer detection is always critical for a better prognosis [1]. According to the BI-RADS, four common abnormalities may present in breast cancer: masses, microcalcifications, architectural distortions (ADs), and asymmetries [2]. Architectural distortion ranks as the third most frequent mammographic finding, following irregular mass and calcification, which are the most prevalent findings for non-palpable breast cancer on mammograms [3]. AD is a finding in digital mammography that is difficult to detect, appearing as an abnormal arrangement of tissue strands, often in a radial or spiculated pattern [3]. Examples of AD and non-AD mammograms are shown in Fig. 1. Figure 1a shows a typical AD with a radiating pattern, while Fig. 1b is a normal mammogram. From Fig. 1a and b, it can be observed that the features of AD and non-AD are almost indistinguishable. Even though a typical AD is known to have radiating features, it is still difficult for a junior radiologist to identify them, and they are sometimes missed even by a senior radiologist. In fact, architectural distortion is often overlooked in clinical settings because it resembles background parenchymal regions and vascular structures [4], and it is considered the most commonly missed anomaly in mammograms with breast cancer, accounting for 18–45% of missed cancers [5, 6]. Therefore, AD is not only an important finding indicating the presence of an underlying malignancy but also a challenging and often ambiguous imaging feature that requires careful interpretation.
Fig. 1.
Examples of AD and non-AD. (a) Typical AD in mammography. Patient age, 53; BI-RADS, 4; subsequent pathology, malignant. (b) Normal mammogram. Patient age, 43. Normal cases are from screening exams for a patient with no evidence of malignancy on follow-up mammography over a 4-year period. The AD in (a) manifests as a spiculated pattern radiating from a point. The normal mammogram in (b) appears almost identical, but its texture is not as pronounced
Architectural distortion is often detected on mammography as an abnormality that may indicate the presence of breast cancer. Mammography has long been used as a standard screening tool for breast cancer detection and has been shown to reduce rates of advanced breast cancers [7]. Architectural distortion in mammography can be caused by both benign and malignant lesions. Benign causes include radial scars, complex sclerosing lesions, sclerosing adenosis, fat necrosis, postprocedural changes, and rare spiculated benign lesions like breast fibromatosis and granular cell tumors [3]. On the other hand, the major cancer types, invasive ductal carcinoma (IDC) and ductal carcinoma in situ (DCIS), can present with architectural distortion in a star-shaped pattern on mammography [3]. Though AD has various types of causes, it is considered a highly suspicious mammographic finding and is often associated with a high risk of malignancy. Studies have shown that the positive predictive value of AD for cancer is approximately 74.5% [7]. Therefore, early detection of architectural distortion contributes to the early diagnosis of breast cancer, which can significantly improve the chances of successful treatment and reduce the mortality rate.
Detecting architectural distortion on mammography using computer-aided detection (CAD) has long been a challenging task [8]. The traditional way to approach the challenging task includes thresholding and image cropping. Global thresholding is a useful method as it eliminates most of the background noise, which can assist in lesion detection in mammography [9, 10]. After thresholding, image cropping based on preprocessing methods such as mean curvature [11], shape index [12], and Gabor filters [13] is applied to locate regions of interest (ROIs). The motivation behind cropping the ROIs before classification is that AD can sometimes be subtle and difficult to identify when viewed in the context of the entire mammogram [14]. Among the image preprocessing methods, Gabor filters are one of the most important methods as they are designed to respond to specific frequency and orientation patterns in the image, allowing for the extraction of texture information from images. Once ROIs have been extracted, concentration index [11, 15], convergence measure [16], and phase portrait [13] are applied to identify the typical AD pattern. These traditional methods are designed to detect specific typical AD features, but since not every AD has typical features, these methods cannot achieve high sensitivity. Additionally, these methods can be interfered with by the noise inherent in mammograms. As a consequence, the use of conventional machine learning algorithms in all of the studies has resulted in many false positives [14], and the result achieved was 5.3 to 10.3 false positives per image at 80% sensitivity [17–19]. Overall, the low sensitivity and high false positive rates of traditional methods are problems that must be addressed.
Owing to its ability to identify complex patterns and relationships in mammograms, deep learning has been applied to AD detection in recent years. For instance, Cai et al. [20] utilized an SE-DenseNet combining DenseNet and the Squeeze-and-Excitation (SE) block and took advantage of transfer learning. The results showed improved classification of architectural distortion in mammography images, reaching an accuracy of 0.982. The best results to date in detecting AD were achieved by Ben-Ari et al. [21]. By using a separately trained classifier and regressor with the aid of transfer learning, they reached a sensitivity of 0.83 with only 0.88 false positives (FPs) per image. Deep learning methods have been shown to be much more effective than traditional methods in the detection of AD [14]. While this result represented a substantial improvement over previous studies in detecting architectural distortion, the sensitivity was still insufficient for clinicians to rely on the results in clinical settings. Furthermore, overfitting, in which a model fits the training dataset so closely that it performs poorly on a test dataset, is also a major problem for deep learning methods. Though numerous methods such as transfer learning and data augmentation have been applied to address the problem of insufficient data, high false positive rates and overfitting persist as the main challenges [22]. Therefore, innovative techniques and algorithms that can effectively tackle these issues and improve the performance of machine learning models are pivotal.
Among past studies of architectural distortion, some focused on classifying ROI images as AD or non-AD, while others focused on detecting AD locations in full mammograms. Classification only provides a binary answer of AD or non-AD and does not provide information about the lesion’s location, while detection can identify both the location and the type of the abnormalities, providing valuable information for clinicians. This gives clinicians a clear basis for diagnosis, rather than just a probability value obtained from classification. Moreover, detection results allow the confidence score threshold to be adjusted: lowering the threshold raises sensitivity and therefore reduces false negatives. Although false positives also increase at higher sensitivity, clinicians are more concerned about false negatives in clinical practice, since underdiagnosis accompanied by delayed treatment is much worse than overdiagnosis with cautious follow-ups. Therefore, the adjustment of the score threshold provides additional information to clinicians. In contrast, classification cannot provide the same kind of information and thus is less helpful in clinical settings. Therefore, the motivation for this study is to develop an AD detection method that can help clinicians identify architectural distortion accurately.
To address the abovementioned problems and challenges in traditional methods and deep learning methods, we developed a deep learning–based AD detection model combined with a structure-focused weighted convergence preprocessing. The textures of mammograms are first extracted as architecture-enhanced magnitude and orientation maps, and the weighted orientation map is obtained based on the architecture enhancement map. The original image, architecture enhancement map, and weighted orientation map are combined into a structure-focused map and fed into the deep learning model, providing not only textures but also suspicious locations of ADs. This addresses the problems of high false positive rates and underdetection of atypical ADs, that is, ADs that do not have the characteristic spiculated pattern and require global information from the mammogram to be identified [23]. This method performs better than previous works, with fewer false positives per image and higher sensitivity, and may help clinicians detect breast cancer early.
This paper is divided into five main sections: “Related Works,” “Methods and Materials,” “Results,” “Discussion,” and “Conclusion”. “Related Works” discusses previous research related to this paper, including traditional and deep learning methods. “Methods and Materials” elucidates the datasets used, preprocessing, training model architecture, postprocessing, and evaluation methods. “Results” and “Discussion” present experimental results, ablation studies, and comparisons with previous research. “Conclusion” summarizes the research and its limitations.
Related Works
The early works on detecting ADs in mammography were based on traditional preprocessing methods. The most widely used traditional method is the Gabor filter. Because a typical AD exhibits a radial pattern of textures, the capability of the Gabor filter to extract texture patterns allows more accurate detection of ADs. Combined with methods such as curvilinear structure extraction, phase portrait analysis, and angular dispersion, an early attempt achieved a sensitivity of 0.84 at 7.8 FPs per image [13]. Similarly, Gabor filters combined with curvilinear structure extraction, phase portrait analysis, and texture analysis achieved an AUC of 0.76 with an artificial neural network (ANN) based on radial basis functions (RBFs), with 10.5 false positives per image at 80% sensitivity [24]. However, having over 10 false positives on each image can be quite troublesome for clinical applications, as it can easily interfere with the radiologist’s judgment of the ground truth. Further optimization of this method with techniques such as the fractal dimension, the entropy of the angular spread of power, Laws’ measures, and Haralick’s features reduced the FPs per image at 80% sensitivity to 5.8 [25]. Another implementation of the Gabor filter in detecting ADs was presented by Jasionowska et al. [26], which involved detecting ROIs with potential AD using Gabor filter analysis and recognizing ADs through the 2D Fourier transform. They evaluated their approach on 33 mammograms containing ADs from the Digital Database for Screening Mammography (DDSM) and achieved a sensitivity of 68% with 0.86 FP per image. The use of hand-crafted feature extraction in these studies demonstrated the effectiveness of the Gabor filter in detecting architectural distortions in mammograms.
Though these methods were able to detect the characteristic radiating pattern of ADs, they still perform poorly overall because most ADs have atypical patterns, leading to low sensitivity and high false positive rates.
Apart from the use of Gabor techniques combined with phase portrait analysis, other studies using traditional methods focused on the concentration index that can indicate the possibility of the presence of AD on mammograms [12, 27]. The concept of concentration index originates from Hasegawa et al. [28], initially used to detect the convergent fold pattern of the stomach wall in the early stage of cancer. Subsequently, combined with the mean curvature and shape index for extracting the mammary gland region, the concentration index was adapted into the detection of AD and the results showed sensitivity of 0.56 at 1.5 FP per image [12]. Further optimization of this method with direction analysis led to a much improved sensitivity of 0.81 at 2.6 FPs per image [29]. In Yamazaki et al. [27], the Gabor filter and Iris filter were applied before calculating the concentration index, and the result showed 4.83 FPs per image (FPI) at 85.7% sensitivity. The adaptive Gabor filters composed of different Gabor parameters were used to better detect the mammary gland before the calculation of the concentration index, leading to a much better result of 1.06 FP per image at 82.45% sensitivity [30]. Similar to the concentration index, the convergence measure proposed by Palma et al. [16] was applied to digital breast tomosynthesis (DBT) where the 2D detection was obtained by calculating the convergence measure at each pixel of a slice and was aggregated into 3D bounding boxes afterwards. The calculation time of the convergence maps was further reduced by implementing the faster algorithm, and the detection results showed 1.68 FP per case at sensitivity of 80% on private DBT datasets [31]. Overall, the traditional methods based on Gabor filter combined with phase portrait, convergence measures, or convergence index have been widely used in previous studies and have been shown to achieve some improvement in sensitivity and false positives. 
However, those traditional methods are designed to identify the textures of typical AD patterns and have difficulty recognizing atypical patterns. Conventional methods are limited in their sensitivity for detecting ADs on mammograms, typically to around 80%, because atypical ADs go unidentified. This sensitivity level is roughly equivalent to that of the average radiologist [32], but with many more FPs per image. The inability to detect atypical ADs necessitates more advanced methods.
Recently, with the emergence of deep learning techniques, there has been a shift towards end-to-end learning approaches that learn features automatically from data without the need for manual feature engineering. The performance of deep learning models has surpassed almost all the traditional methods [33]. For instance, Cai et al. [20] utilized an SE-DenseNet combining DenseNet and the SE block, pretrained the model on ImageNet, and then fine-tuned it on a breast mass dataset labeled benign or malignant, reaching an accuracy of 0.982 in AD classification. Additionally, Rehman et al. [34] proposed a depth-wise convolutional neural network to classify AD ROIs into malignant and benign classes, achieving 0.98 accuracy. Moreover, Ben-Ari et al. [21] proposed a novel way to detect ADs in mammography. The main concept was that a region proposal based on a sparse sampling from the parenchymal region is applied before training. A classifier to determine the AD ROIs and a regressor similar to the faster region-based convolutional neural network (R-CNN) were separately trained, and the results showed significantly fewer FPs, reaching a sensitivity of 0.83 with only 0.88 FP per image. Despite the significant progress in adapting deep learning methods to the classification of ADs, there is still room for improvement in the field of AD detection. Even with 80% sensitivity, some ADs may still be missed, necessitating the development of novel methods to increase the sensitivity and capture those overlooked ADs.
Although deep learning methods outperform traditional approaches in many areas, traditional methods still have an advantage in detecting typical AD lesions; thus, combining traditional methods with deep learning approaches appears to yield even better results. The concept of convergence maps in combination with deep learning methods was implemented by Li et al. [35], who applied a five-channel image composed of the original image, the Gabor magnitude map, the x- and y-components of the Gabor angle field, and the convergence map as the model input. The model was also trained and tested on private DBT datasets and obtained 1.66 FP per DBT volume at 80% sensitivity. Overall, from the early works of data preprocessing with the aid of thresholding, histogram equalization, Gabor filters, and convergence calculation to the recent state-of-the-art deep learning methods, most of the studies have shown effectiveness in their respective areas, yet the detection of AD in mammograms remains a challenging task. The performance and detailed description of previous studies are summarized in Table 1.
Table 1.
The summary of the techniques and results in the detection of ADs
| Category | Year | Author | Method | Sensitivity | FPI | Datasets |
|---|---|---|---|---|---|---|
| Traditional | 2005 | Matsubara et al. [12] | Shape index, curvedness, and concentration index | 0.56 | 1.5 | Private |
| | 2009 | Banik et al. [24] | Gabor filters with phase portrait analysis and ANN | 0.8 | 10.5 | Private |
| | 2009 | Palma et al. [31] | Faster implementation of convergence map | 0.8 | 1.68 | Private DBT |
| | 2010 | Banik et al. [25] | Fractal dimension, the entropy of the angular spread of power, Laws’ measures, and Haralick’s features | 0.8 | 5.8 | Private |
| | 2010 | Jasionowska et al. [26] | Gabor filters and Fourier transform | 0.68 | 0.86 | DDSM |
| | 2014 | Yoshikawa et al. [30] | Adaptive Gabor filter | 0.83 | 1.0 | DDSM |
| | 2015 | Matsubara et al. [29] | Direction analysis of linear structure | 0.81 | 2.6 | Private |
| | 2016 | Yamazaki et al. [27] | Gabor filter, Iris filter, and concentration index | 0.857 | 4.83 | Private |
| Deep learning | 2017 | Ben-Ari et al. [21] | Domain-specific R-CNN | 0.83 | 0.88 | DDSM |
| | 2022 | Li et al. [35] | Convergence map with faster R-CNN | 0.8 | 1.66 | Private DBT |
The methods are arranged in chronological order
FPI, false positives per image; DBT, digital breast tomosynthesis; ANN, artificial neural network; R-CNN, region-based convolutional neural network; DDSM, Digital Database for Screening Mammography
Recently, transformer-based models have become commonly used vision backbones in image classification and object detection, and they have been shown to be highly efficient and effective in extracting local and global dependencies in spatial dimensions [36]. Moreover, the spatial attention module in the convolutional block attention module (CBAM) proposed by Woo et al. [37] also serves as an effective way to extract attention across spatial dimensions. Building on these, we developed a structure fusion attention model (SFAM) combining the transformer and the spatial attention mechanism, together with the proposed structure-focused weighted convergence preprocessing, to extract textures and suspicious AD locations from the original mammograms. We define the top sensitivity as the maximum sensitivity that a model can achieve when the highest number of false positives is accepted. Our method outperforms previous methods in top sensitivity and false positives, which suggests that it is a better way to solve the problem of detecting ADs. We discuss our method in the following section.
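The definition of top sensitivity can be illustrated with a small sketch of a free-response ROC (FROC) computation. This is only an illustration under stated assumptions: the function names `froc_curve` and `top_sensitivity` are hypothetical, and each detection is assumed to have already been matched either to a ground-truth lesion index or to -1 (false positive); the matching criterion itself is not specified here.

```python
import numpy as np

def froc_curve(scores, lesion_ids, n_lesions, n_images):
    """Return (threshold, sensitivity, FPs per image) from strict to lenient.

    lesion_ids[i] is the index of the lesion hit by detection i,
    or -1 if detection i is a false positive (assumed pre-matched).
    """
    scores = np.asarray(scores, dtype=float)
    lesion_ids = np.asarray(lesion_ids)
    points = []
    for t in np.unique(scores)[::-1]:           # descending thresholds
        kept = lesion_ids[scores >= t]
        found = set(kept[kept >= 0].tolist())   # each lesion counted once
        sensitivity = len(found) / n_lesions
        fpi = float(np.sum(kept < 0)) / n_images
        points.append((float(t), sensitivity, fpi))
    return points

def top_sensitivity(points):
    """Sensitivity and FPs per image at the most lenient threshold,
    i.e. when every detection (and hence every false positive) is accepted."""
    _, sensitivity, fpi = points[-1]
    return sensitivity, fpi

# Toy example: 5 detections over 2 images containing 3 lesions in total.
points = froc_curve([0.9, 0.8, 0.6, 0.4, 0.3], [0, -1, 1, -1, 0],
                    n_lesions=3, n_images=2)
sens, fpi = top_sensitivity(points)
```

In the toy example, lesion 2 is never detected, so the top sensitivity stays at 2/3 no matter how far the threshold is lowered, while the false positives per image rise to 1.0.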
Methods and Materials
The overview of the proposed computer-aided detection model is shown in Fig. 2. The method includes several steps of preprocessing, detection, and combination. The following are the procedures that outline our approach:
Structure-focused weighted convergence preprocessing: To manifest suspicious AD locations, we proposed the structure-focused weighted convergence preprocessing consisting of two components: architecture enhancement and weighted orientation augmentation. After applying architecture enhancement to the original image, a magnitude map, manifesting structures in the mammography, can be obtained; on the other hand, an orientation map, related to the direction of the architecture, can also be derived. The weighted orientation augmentation we proposed is computed based on the orientation map, indicating suspicious locations of ADs. Ultimately, the original image, architecture enhancement map, and weighted orientation augmentation are combined into a structure-focused map. This three-channel map takes advantage of both the architecture enhancement and the weighted orientation augmentation, serving as a map that highlights the possible locations of ADs from different aspects.
Structure fusion attention model: Both the original image and the structure-focused map are used as the model input. The proposed structure fusion attention model combines the state-of-the-art Swin Transformer and spatial attention block to achieve better feature extraction performance. Combined with the object detection architecture based on the faster R-CNN, the proposed structure fusion attention model can fuse the features from different channels and learn the features across the spatial dimension, detecting ADs with high sensitivity and low false positives.
Dual inclusive non-maximum suppression (NMS) combination: The structure fusion attention model can identify abnormalities from the original image and further detect subtle structural distortions from the structure-focused map. Therefore, the output bounding boxes of images from both the original image and structure-focused map are pruned first by the soft non-maximum suppression (Soft-NMS) to retain more possibilities and higher achievable sensitivity, then are combined through weighted box fusion to merge the boxes, further reducing false positives. This is similar to what radiologists do in practical scenarios. When determining the presence of a disease like AD, radiologists rely not solely on the image itself but consider other relevant information from different image modalities as well.
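As a rough illustration of the pruning step in the last procedure, the following is a minimal sketch of Gaussian Soft-NMS, which decays the scores of overlapping boxes instead of discarding them. It is not the authors' implementation: the Gaussian decay variant, the values of `sigma` and `score_thresh`, and the function names are assumptions, and the subsequent weighted box fusion step is not shown.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: returns kept indices (in selection order)
    and their decayed scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size:
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        overlaps = iou_one_to_many(boxes[best], boxes[remaining])
        # Decay, rather than remove, boxes that overlap the selected one.
        scores[remaining] *= np.exp(-(overlaps ** 2) / sigma)
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
keep, new_scores = soft_nms(boxes, np.array([0.9, 0.8, 0.7]))
```

Because overlapping boxes are only down-weighted, a second box near a true AD survives with a lower score, which is consistent with the stated goal of retaining more candidates and a higher achievable sensitivity before fusion.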
Fig. 2.
Overview of the proposed computer-aided detection model for detecting ADs
Datasets
In this study, the DDSM and Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) are used to train and test our proposed detection framework.
The DDSM dataset is a publicly available dataset of mammography images, which includes digital mammograms as well as associated patient data and radiologist interpretations [38]. The images in the DDSM dataset are annotated with information about abnormalities, such as masses and microcalcifications. The dataset includes 10,480 original images from 2620 patients, containing images categorized into benign, malignant, and malignant without callback. The 10,480 images in mediolateral-oblique (MLO) and craniocaudal (CC) views contain 230 images of AD from 130 patients.
The CBIS-DDSM dataset is a collection of mammographic images and associated clinical information for use in the development and evaluation of CAD and diagnosis systems for breast cancer [39, 40]. It contains a standardized version of cases from the DDSM, which is also a publicly available dataset of mammograms. The dataset includes 3568 original images from 1645 patients, containing images with masses, calcifications, and AD. The 1592 images with masses from 892 patients in MLO and CC views contain 150 images of AD from 95 patients. These images labeled with AD are used in this study. Of the 150 AD images, 12 mammograms have 2 ADs per image, while the others have 1 AD per image. Due to the absence of normal images in CBIS-DDSM, normal images from DDSM are added to the test set for evaluation.
The detailed descriptions of the CBIS-DDSM and DDSM datasets are shown in Table 2, and the datasets used for experiments are described in Table 3. The datasets are split into training, validation, and testing sets in approximately a 4:1:1 ratio. The DDSM dataset has a larger amount of data, but the mammograms in the dataset often contain noise and ambiguously located bounding boxes [41]. This may significantly influence the training process and lead to instability of the prediction outcomes. On the other hand, CBIS-DDSM has much more accurate bounding boxes, since it is relabeled and optimized from DDSM. Therefore, the CBIS-DDSM dataset is used to test the performance of our model, while the DDSM dataset is used for comparison with previous works.
Table 2.
The detailed descriptions of CBIS-DDSM and DDSM datasets
| Datasets | No. of patients | No. of images | No. of ADs | Descriptions |
|---|---|---|---|---|
| CBIS-DDSM | 1645 | 3568 | 150 | DICOM format; images are associated with cropped images and masks for ROIs |
| DDSM | 2620 | 10,480 | 230 | LJPEG format; images are marked with rough contours of ROIs |
Table 3.
The descriptions of datasets used for experiments
| Datasets | Train | Valid | Test | Descriptions |
|---|---|---|---|---|
| CBIS-DDSM | 99 | 25 | 26 (+ 32 DDSM normal images) | The dataset only contains AD images; normal images from DDSM are added to the test set for evaluation |
| DDSM | 244 | 61 | 64 | Both AD and normal images are used in training, validation, and testing |
Structure-Focused Weighted Convergence Preprocessing
Due to the difficulty in detecting AD, in this work, we design the structure-focused weighted convergence preprocessing to highlight suspicious AD locations during preprocessing. The preprocessing consists of two stages: architecture enhancement and weighted orientation augmentation. Since the mammograms in CBIS-DDSM and DDSM datasets often have low contrast, masses with obscured margin, and non-enhancing lesions, the architecture enhancement is necessary to improve image clarity and details, manifesting the structure elements in mammograms. The weighted orientation augmentation aims to identify potential radiating structures in typical ADs. The two methods are described thoroughly in the following sections.
Architecture Enhancement
Due to the presence of considerable noise in both CBIS-DDSM and DDSM dataset mammography images, it is beneficial to extract textures from the images through preprocessing before conducting further AD detection. The breast structure consists of various components such as glands, adipose tissue, vessels, and fibrous tissue. We aim to extract structures in mammograms, highlighting the abovementioned structures and simultaneously reducing the background values, assisting the subsequent AD detection process.
There are various methods available for extracting features in image preprocessing, including Gabor filtering, Gaussian filtering, histogram equalization, and contrast limited adaptive histogram equalization (CLAHE). The effectiveness of these four methods was tested, and the Gabor method was selected due to its superior performance in our experiments. Moreover, we chose the Gabor method because it not only enhances the image but also provides orientation information. This orientation information can further assist in finding AD through the weighted orientation augmentation, which will be discussed in the next section. The enhancement formula is as follows:
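$$g(x, y) = \exp\left[-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2} + \frac{y'^2}{\sigma_y^2}\right)\right]\cos\left(\frac{2\pi x'}{\lambda}\right) \tag{1}$$
$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta \tag{2}$$
where x and y are the pixel coordinates, σx and σy are the standard deviations of the function, λ represents the wavelength of the Gabor transformation, and θ is the orientation.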
Given a specific θ, the architecture enhancement method identifies line features in that direction. In previous research [13, 30, 42], a combination of filters in different orientations was used to extract the radiating pattern that a typical AD often has. The orientation ranges from 0 to 179° in increments of 1°. After applying the architecture enhancement to an image, two outputs are generated: the architecture enhancement map and the orientation map. Letting M(p,q) and O(p,q) represent the magnitude and orientation at coordinate (p,q), the architecture enhancement map and orientation map can be obtained by
$$M(p, q) = \max_{\theta}\left[(I * g_{\theta})(p, q)\right] \tag{3}$$
$$O(p, q) = \underset{\theta}{\arg\max}\left[(I * g_{\theta})(p, q)\right] \tag{4}$$
where I is the input image, gθ represents the Gabor filter at a specific orientation (θ), and * denotes convolution. The equations represent the convolution of the Gabor filters of different orientations with the image, followed by max and argmax operations to obtain the filtering results. The architecture enhancement map is obtained by selecting, for each pixel, the maximum value among the filtering results at different orientations, and the orientation map is determined by identifying the orientation that provides the maximum magnitude for each pixel.
The parameters used in this study were selected largely empirically, but with several rationales. First, the Gabor hyperparameters λ and σ are interdependent. Previous studies chose σ/λ to be 0.56 based on the bandwidth to extract textures [43]. This parameter selection is feasible for extracting textures of a specific orientation. However, for the detection of an object such as AD, this ratio is not suitable. Since the architecture enhancement process involves multiple convolutions with Gabor filters of different orientations, the parameter setting was more complicated. We selected λ and σ to maximize the contrast between the textures of ADs and the background. This process involves holding σ fixed while varying λ, and vice versa. The aspect ratio between σx and σy was set to the default value 0.5. Eventually, σx and σy were set to 9.385 and 18.75, respectively; the wavelength (λ) was set to 20. In addition, the size of the kernel depends mainly on the size of the object we want to detect. A smaller kernel tends to pick out fine, specific features, whereas a larger kernel is effective in handling larger objects. We selected the Gabor filter kernel size to fit the size of the AD textures relative to the entire image. Empirically, the average AD size in the CBIS-DDSM dataset is about 400 × 400, and the average length of the radiating textures is around 50. We experimented with different kernel sizes and found that a kernel size of 51 extracted the textures within ADs best; thus, the kernel size of the Gabor transformation was set to 51. Finally, we set the phase offset to its default value 0.
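The enhancement step described above (a bank of oriented Gabor filters followed by per-pixel max and argmax) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' code: the function names are hypothetical, the kernel uses the standard real Gabor form with σy = σx / aspect ratio, the FFT-based zero-padded convolution is an implementation choice, and the recorded angle is the θ of the filter that responds most strongly (under this kernel convention, a vertical line responds most at θ = 0). The default values follow the parameters reported in the text.

```python
import numpy as np

def gabor_kernel(theta, sigma_x=9.385, gamma=0.5, lam=20.0, ksize=51, psi=0.0):
    """Real Gabor kernel at orientation theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    sigma_y = sigma_x / gamma  # aspect ratio 0.5 -> sigma_y is about 18.77
    envelope = np.exp(-0.5 * (x_r**2 / sigma_x**2 + y_r**2 / sigma_y**2))
    return envelope * np.cos(2.0 * np.pi * x_r / lam + psi)

def conv2_same(image, kernel):
    """Zero-padded 2D convolution via FFT, cropped to the image size."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    spec = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    full = np.fft.irfft2(spec, s=shape)
    return full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def architecture_enhancement(image, step_deg=1):
    """Return (magnitude map, orientation map in degrees)."""
    magnitude = np.full(image.shape, -np.inf)
    orientation = np.zeros(image.shape)
    for deg in range(0, 180, step_deg):
        response = conv2_same(image, gabor_kernel(np.deg2rad(deg)))
        better = response > magnitude      # running max avoids a 180-layer stack
        magnitude[better] = response[better]
        orientation[better] = deg          # running argmax
    return magnitude, orientation

# Toy example: a single vertical line on an empty background.
img = np.zeros((64, 64))
img[:, 32] = 1.0
M, O = architecture_enhancement(img, step_deg=15)
```

Keeping a running maximum instead of stacking all 180 responses keeps memory use at a few image-sized arrays, which matters for full-resolution mammograms.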
Weighted Orientation Augmentation
The architectural enhancement map provides structure information of the original image and manifests the lines in different directions. Since it only captures lines and textures and the extracted textures are susceptible to various tissue interferences, we introduced weighted orientation augmentation to extract ADs from normal tissue. The concept of weighted orientation augmentation is based on the fact that a typical AD shows a radial stellate pattern in mammography. Before introducing the proposed weighted orientation augmentation method, the original convergence map proposed by Palma et al. [16] is worth mentioning. The convergence at a specific pixel is calculated by the proportion of pixels in an annular disk that points to the center. The illustration of the calculation of convergence is shown in Fig. 3.
Fig. 3.

Nested circles used for the calculation of convergence. Only points in the annular region are considered. The outer circle has radius r, while the inner circle has radius αr with α < 1. The arrow indicates the orientation at the specific point. In this example, the point q converges while the point q′ does not. The outer radius r was set to 200, and the inner radius was set to 100 in this study. These values were chosen empirically from the observation of typical AD patterns in CBIS-DDSM mammograms.
Before calculating convergence, we first define the random variable K, which represents whether a point q points towards the center c or not, as follows:
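$$K(q) = \begin{cases} 1, & \lVert \overline{qc} \rVert \, \lvert \sin\theta \rvert < \alpha r \\ 0, & \text{otherwise} \end{cases} \tag{5a}$$
where θ denotes the angle between the orientation at point q and the segment $\overline{qc}$, and αr denotes the inner radius of the annular disk. By definition, for point q in Fig. 3, the distance $\lVert \overline{qc} \rVert \lvert \sin\theta \rvert$, shown as a dashed line, is clearly less than αr; thus, K is equal to 1 for point q. On the other hand, for point q′, K is equal to 0.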
Based on the definition of K, we define S indicating the extent of convergence at a specific point c as follows:
$$S(c) = \frac{1}{\lvert \Omega_c \rvert} \sum_{q \in \Omega_c} K(q) \tag{6}$$
where $\Omega_c$ denotes the domain of pixels between the inner and outer radius centered on point c. The convergence is computed for every pixel on the orientation map to obtain the convergence map. In previous studies, convergence maps were applied to private datasets to validate their ability to detect ADs [31, 35], effectively providing potential locations of ADs. However, no studies related to convergence maps have been done on the DDSM and CBIS-DDSM datasets. Since the images in the DDSM and CBIS-DDSM datasets contain large amounts of noise originating from glandular, fibrous, and vascular tissue, the resulting convergence map is corrupted by these structures, necessitating preprocessing to reduce the noise. Moreover, although the convergence map can effectively provide potential locations for ADs, it cannot quantify the degree to which a point is directed toward the center. Therefore, a method that takes the degree of deviation into account, called the concentration index, appears to be applicable [30]. Inspired by the concentration index, the proposed weighted orientation augmentation method takes advantage of both methods, functioning as a novel way to extract the converging area while preserving the angle information.
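As a concrete illustration of the convergence measure, the following brute-force sketch computes, for every center pixel, the fraction of annulus pixels whose orientation line passes within the inner radius of that center. It is a sketch under stated assumptions, not the implementation of Palma et al.: the orientation map is assumed to store line directions in radians in image coordinates (x to the right, y downward), annulus pixels falling outside the image are simply ignored, and the function name is hypothetical.

```python
import numpy as np

def convergence_map(orientation, r_outer, alpha=0.5):
    """Brute-force convergence: proportion of annulus pixels pointing at each center."""
    h, w = orientation.shape
    r_inner = alpha * r_outer
    # Offsets (dy, dx) from a center c to every pixel q of the annulus.
    rad = int(np.ceil(r_outer))
    dy, dx = np.mgrid[-rad:rad + 1, -rad:rad + 1]
    dist = np.hypot(dy, dx)
    ring = (dist > r_inner) & (dist <= r_outer)
    dy, dx = dy[ring], dx[ring]

    conv = np.zeros((h, w))
    for cy in range(h):
        for cx in range(w):
            qy, qx = cy + dy, cx + dx
            inside = (qy >= 0) & (qy < h) & (qx >= 0) & (qx < w)
            if not inside.any():
                continue
            theta = orientation[qy[inside], qx[inside]]
            # Perpendicular distance from c to the line through q with
            # direction (cos theta, sin theta); the vector q->c is (-dx, -dy).
            perp = np.abs(-dx[inside] * np.sin(theta) + dy[inside] * np.cos(theta))
            # A pixel converges when its line crosses the inner disk.
            conv[cy, cx] = np.mean(perp < r_inner)
    return conv
```

The double loop over centers and annulus pixels makes the cost grow with the number of pixels times the annulus area, which is why a faster implementation and down-sampling matter in practice.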
The proposed weighted orientation augmentation method is based on the original convergence map. Let K′ represent the modified version of K in the original definition of convergence map, which is defined as
| 5b |
| 7 |
where denotes the angle between the orientation at point q and , and denotes the architecture enhancement magnitude at point q. The major difference between this definition and the original one is that the value of K′ defined here takes the angle and the magnitude into account. Based on the definition of K′, we define C indicating the weighted orientation at a specific point c as follows:
| 8 |
The summation domain used in Eq. (8) is the same annular domain as in the definition of S. The weighted orientation C is computed for every pixel on the orientation map to obtain the weighted orientation map. Since a typical AD usually has a radiating pattern and the radiating lines have higher pixel intensity in the architecture enhancement map, weighting the convergence with the magnitude appears to be reasonable. Therefore, the weighted orientation C can be beneficial in locating ADs more accurately than the original convergence S.
The straightforward pixel-by-pixel calculation of the original convergence map has a complexity of O(NR²), as described in Palma et al. [16], where N denotes the number of pixels in an image and R denotes the outer radius of the annular ring. In practice, this requires over 1 h per image, which is impractical for a large number of images. The faster implementation of the convergence map proposed by Palma et al. [31] was thus applied. Though the definition of the random variable K has been changed, the complexity of the faster algorithm remains unchanged. In this study, the architecture enhancement map was down-sampled by a factor of 30, and the calculation time was further reduced to approximately 1 s per image.
To prevent the weighted orientation augmentation from being corrupted by noise, we perform preprocessing to remove the background before calculating the weighted orientation map. The preprocessing consists of two stages. In the first stage, Otsu’s method [44] is used to segment the breast region with a threshold, restricting the subsequent calculation to that region. In the second stage, histogram equalization is applied before a threshold is chosen to further extract suspicious AD regions. It is noteworthy that straightforward thresholding at a predetermined pixel value may lead to unstable results due to the variation of pixel value distributions among different images. Thus, histogram equalization was implemented to preserve useful information, ensuring a more stable result in the subsequent thresholding process. The 90th percentile of the equalized pixel value histogram was used as the threshold, which was determined empirically. Overall, the thresholding method removes areas with lower pixel values in the background, helping subsequent procedures to focus on suspicious AD regions.
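The two-stage background removal can be sketched as follows. This is a minimal NumPy sketch assuming an 8-bit grayscale input; which map is equalized, and whether equalization is restricted to the breast region, are not specified above, so those choices and all function names are assumptions of this example.

```python
import numpy as np

def otsu_threshold(img_u8):
    """Otsu's threshold for an 8-bit image (maximizes between-class variance)."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # probability of the lower class
    mu = np.cumsum(p * np.arange(256))        # cumulative mean of the lower class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def equalize(img_u8, mask):
    """Histogram equalization whose lookup table is built from pixels inside mask."""
    hist = np.bincount(img_u8[mask], minlength=256).astype(float)
    cdf = np.cumsum(hist) / hist.sum()
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img_u8]

def suspicious_region_mask(img_u8, percentile=90):
    """Stage 1: Otsu breast segmentation. Stage 2: percentile threshold after equalization."""
    breast = img_u8 > otsu_threshold(img_u8)
    eq = equalize(img_u8, breast)
    thr = np.percentile(eq[breast], percentile)
    return breast & (eq > thr)
```

Because the threshold is a percentile of the equalized values rather than a fixed gray level, roughly the brightest tenth of the breast pixels is kept regardless of the overall exposure of the image.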
After applying the abovementioned methods including Otsu’s method, histogram equalization, threshold determination, and weighted orientation calculation, we finally produce the weighted orientation map. An illustration of the effect of weighted and unweighted maps is shown in Fig. 4, which highlights the effectiveness of the proposed weighted orientation map with thresholding in extracting potential AD locations.
Fig. 4.
Illustrations of the difference between (a) the original convergence map, (b) the weighted orientation map, and (c) the proposed weighted orientation map with thresholding. The red bounding boxes indicate the AD ground truths in each image. Compared to the original method, our proposed weighted orientation map effectively reduces background noise, leading to a more efficient identification of potential AD locations.
The overall flowchart of structure-focused weighted convergence preprocessing is illustrated in Fig. 5. The original image is first processed through architecture enhancement to generate a magnitude map and an orientation field. Subsequently, our proposed weighted orientation augmentation is applied to the orientation field and magnitude map in the architecture enhancement map, producing a map that indicates possible locations of ADs. It is evident that the locations of AD ground truths are effectively emphasized in the weighted orientation map from Fig. 5. Finally, the original image, the architecture enhancement map, and the weighted orientation map are combined into a three-channel structure-focused weighted convergence map.
Fig. 5.
Examples of an original image, an architecture enhancement map, and a weighted orientation map. The texture surrounding the AD area in the architecture enhancement map shows radiating pattern. The weighted orientation map provides an accurate result in detecting ADs showing a high response in the center of the AD and low response in the background
Structure Fusion Attention Model
The structure-focused weighted orientation map, based on architecture enhancement and weighted orientation map, can effectively extract typical AD features by detection of spiculated lesion. However, ADs without radiating pattern were not encompassed by the weighted orientation framework we designed. In order to handle these atypical patterns and reduce false positives, we proposed the SFAM to further enhance the overall sensitivity of AD detection. In this approach, the output of the structure-focused weighted convergence preprocessing serves as the input for the model. This method helps highlight suspicious locations extracted locally in the first stage. Subsequently, the local information from the first stage and the global information of the original image are integrated into the final prediction. The details of the structure fusion attention model will be elaborated in this section.
Swin Transformer Backbone
In order to detect ADs, we chose Swin Transformer as the backbone of our structure fusion attention model. The Swin Transformer proposed by Liu et al. [36] brought the concept of shifted-window attention into the Vision Transformer, aiming to address limitations of the traditional transformer architecture when applied to vision tasks. The architecture in our model consists of four consecutive structure fusion attention modules incorporating the Swin Transformer blocks. The Swin Transformer block is based on the self-attention computation, which can be described as
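$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V \tag{9}$$
$$Q = XW_q, \quad K = XW_k, \quad V = XW_v \tag{10}$$
where Q, K, and V refer respectively to the “query,” “key,” and “value” components obtained from the input sequence X of the transformer; B is the relative position bias; Wq, Wk, and Wv are the weight parameters; and d is the dimension of Q, K, and V.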
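For readers who prefer code, the self-attention with a relative position bias can be sketched for a single head and a single window as below. The function and argument names are illustrative assumptions; real Swin Transformer implementations add multiple heads, learned bias tables, and masking for shifted windows.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, w_q, w_k, w_v, bias):
    """Single-head attention over one window.

    x:    (n_tokens, dim) tokens of one window
    w_*:  (dim, d) projection matrices
    bias: (n_tokens, n_tokens) relative position bias B
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + bias
    return softmax(scores) @ v
```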
The Swin Transformer block consists of multi-head self-attention (MSA) with a shifted-window mechanism, followed by a two-layer multilayer perceptron (MLP). LayerNorm (LN) is applied before each MSA and MLP module, and a residual connection is applied around each module. Assuming $\hat{z}^{m}$ and $z^{m}$ to be the output features of the multi-head self-attention module and the MLP module for block m, the Swin Transformer blocks are computed as follows:
$$\hat{z}^{m} = \text{W-MSA}(\mathrm{LN}(z^{m-1})) + z^{m-1} \tag{11}$$
$$z^{m} = \mathrm{MLP}(\mathrm{LN}(\hat{z}^{m})) + \hat{z}^{m} \tag{12}$$
$$\hat{z}^{m+1} = \text{SW-MSA}(\mathrm{LN}(z^{m})) + z^{m} \tag{13}$$
$$z^{m+1} = \mathrm{MLP}(\mathrm{LN}(\hat{z}^{m+1})) + \hat{z}^{m+1} \tag{14}$$
where W-MSA and SW-MSA represent the regular and shifted-window configurations of the multi-head self-attention.
The shifted-window partitioning method is designed to establish inter-window relations while preserving the efficient computation of non-overlapping windows. It alternates between two configurations: the first module uses a regular window partitioning strategy, and the next module uses a windowing configuration shifted from that of the preceding layer. This method connects patches to other patches that were originally in different windows, thus extracting cross-window spatial relationships while reducing the computational cost compared to the standard self-attention mechanism. Moreover, by hierarchically processing the image at different scales, the Swin Transformer can effectively capture both local and global spatial dependencies.
Structure Fusion Attention Mechanism
While Swin Transformer achieves good performance, there is still room for improvement in our task to detect ADs. In our preprocessed structure-focused weighted orientation map, the input of the three channels is composed of different components, representing the original image, the architecture enhancement map, and the weighted orientation map, respectively. In contrast, the three channels in the original design of the Swin Transformer do not correspond to these three features. Therefore, we proposed the structure fusion attention model incorporating the spatial attention into Swin Transformer to fuse the features from different channels. The spatial attention mechanism that we adopted was inspired by Woo et al. [37], where they proposed the spatial attention block to fuse channels and capture the interdependence of spatial locations within feature maps, incorporating the attention mechanism into convolutional neural network (CNN). We combined this spatial attention block with the Swin Transformer block in parallel, using the spatial attention block to fuse and capture the relative relationship information from the features of the original image, the architecture enhancement map, and the weighted orientation map. The original image with mammary gland and duct features, the architecture enhancement map with textures and contrasts accentuated, and the weighted orientation map with suspicious AD lesions manifested are intermingled into the spatial attention map. The fused features from the spatial attention block are then combined with the features extracted from the Swin Transformer block to obtain the final output of the structure fusion attention module, which represents the features extracted from the preprocessed images in the three channels. The overall architecture of the structure fusion attention module is illustrated in Fig. 6.
Fig. 6.
Overall architecture of the proposed structure fusion attention module. The structure fusion attention module consists of global pooling, convolution, and sigmoid function. The 2D attention map is multiplied back to the feature map. The structure fusion attention feature is then added back to the feature map obtained from the Swin Transformer block to obtain the output feature
The spatial attention block applies a max pooling and an average pooling operation across the channel axis of the feature maps, reducing the input feature map F from C × H × W to two 2D maps, Fmax and Favg, each with a dimension of 1 × H × W. After concatenation, the resulting features are passed through a convolution layer and a sigmoid function, generating a 2D spatial attention map. In brief, the spatial attention field $M_s(\cdot)$ is computed as
$$M_s(F) = \sigma\big(f([\mathrm{Avg}(F); \mathrm{Max}(F)])\big) = \sigma\big(f([F_{avg}; F_{max}])\big) \tag{15}$$
where σ(·) represents the sigmoid function defined in a study of Han and Moraga [45], f denotes the convolution operation, and Avg and Max denote the average-pooling and max-pooling operation, respectively. After the computation of the spatial attention field, we can obtain a map that fuses information from all three channels. By selectively emphasizing informative spatial locations and suppressing irrelevant ones, it can help the model focus on more AD-related features and further improve the performance of the Swin Transformer model.
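A spatial attention block of this kind, in the style of Woo et al. [37], can be sketched in NumPy as follows. The 7 × 7 kernel size is the default of that earlier work and is an assumption here, since the kernel size is not stated above; the helper names are also hypothetical, and the convolution weights would be learned in practice.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Zero-padded 2D convolution (cross-correlation, as in deep-learning layers).

    x: (C_in, H, W), kernel: (C_in, k, k) -> output of shape (H, W).
    """
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out

def spatial_attention(feat, kernel):
    """Return an (H, W) attention map in (0, 1) for a (C, H, W) feature map."""
    avg = feat.mean(axis=0, keepdims=True)       # pool across channels
    mx = feat.max(axis=0, keepdims=True)
    pooled = np.concatenate([avg, mx], axis=0)   # (2, H, W)
    return 1.0 / (1.0 + np.exp(-conv2d_same(pooled, kernel)))

# Usage: reweight every channel by the same 2D map.
feat = np.random.default_rng(0).normal(size=(3, 8, 8))
kernel = np.zeros((2, 7, 7))                     # stand-in for learned weights
reweighted = feat * spatial_attention(feat, kernel)[None]
```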
Given an input feature $F^{n}$ for stage n, the overall attention mechanism encompassing the Swin Transformer block and the spatial attention block for each stage can be described as follows:
$$F_{SA}^{n} = M_s(F^{n}) \otimes F^{n} \tag{16}$$
$$F_{Swin}^{n} = \mathrm{Swin}(F^{n}) \tag{17}$$
$$F^{n+1} = F_{SA}^{n} \oplus F_{Swin}^{n} \tag{18}$$
where $F_{SA}^{n}$ and $F_{Swin}^{n}$ denote the feature map computed by the spatial attention block and the Swin Transformer block, respectively; $M_s(\cdot)$ is the spatial attention field of Eq. (15); Swin represents a Swin Transformer stage with [2, 6] blocks in each stage; ⊗ denotes element-wise multiplication; and ⊕ represents element-wise addition. Considering that Swin Transformer has been shown to outperform the CNN model in several different tasks and that Swin Transformer also intends to capture the local and global dependencies as does the spatial attention module, combining the spatial attention with shifted-window attention appears to be a novel possibility. In each stage, a reweighted feature map obtained from the spatial attention block is added back to the feature map with an element-wise sum so that the cross-channel structure fusion features are preserved and passed to the next stage. The features are then fed into the feature pyramid network of the Faster R-CNN architecture. Overall, the spatial attention blocks allow the model to intermingle the features from different channels and learn the features across the spatial dimension, further boosting the model performance.
Dual Inclusive NMS Combination
The structure fusion attention model can detect abnormalities based on the original image, where the whole image with mammary glands and ducts are preserved; the model can also identify subtle structural distortions in the structure-focused map, where the architectures and orientations are manifested. To combine valuable information from both images, the output bounding boxes from both the original image and structure-focused map are combined by the dual inclusive NMS combination. This process is analogous to how clinicians work in practical scenarios, where clinicians do not solely rely on the image itself but also consider other relevant information, such as adjusting brightness, contrast, window width, and window level, when detecting ADs. The dual inclusive NMS combination consists of two stages: The predictions from both images are pruned first by the Soft-NMS separately to preserve more potential detections, and then merged by the weighted boxes fusion to provide more accurate predictions. The details of the two stages are described in this section.
Soft-NMS [46] with an intersection over union (IoU) threshold of 0.1 is used to prune the predicted bounding boxes in the first stage. Soft-NMS is used instead of NMS because NMS directly discards less probable prediction bounding boxes by setting their confidence scores to zero. In contrast, Soft-NMS retains these boxes by reducing their confidence scores, which preserves all boxes without overlooking any suspicious predictions, much as a mammography expert would. The procedure of Soft-NMS is briefly described as follows. For a list of bounding boxes (B) and scores (S) in decreasing order, Soft-NMS starts by selecting the detection with the maximum score (M), appends it to the final detection (D), and rescores each nearby bounding box (bi) with the formula
$$s_i = s_i \, e^{-\frac{\mathrm{IoU}(M, b_i)^{2}}{\sigma}}, \quad \forall b_i \notin D \tag{19}$$
where si represents the score of the nearby bounding box (bi), and σ represents the hyperparameter of the Gaussian penalty function. Since the confidence scores are set based on a continuous Gaussian function, no bounding boxes are discarded. By preserving more predictions in both the original image and the structure-focused image in the first stage by applying this method, our dual inclusive NMS combination is able to maximize the achievable sensitivity and enhance the overall performance.
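The Gaussian rescoring loop can be sketched in plain Python as follows. This is a minimal sketch: boxes are assumed to be `(x1, y1, x2, y2)` tuples, `sigma=0.5` is an assumed default rather than a value reported above, and the IoU threshold of 0.1 mentioned earlier is not modelled because this purely Gaussian form penalizes every overlapping box continuously.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5):
    """Gaussian Soft-NMS: returns every box with its decayed score, best first."""
    boxes, scores = list(boxes), list(scores)
    kept = []
    while boxes:
        m = max(range(len(scores)), key=scores.__getitem__)
        box_m, score_m = boxes.pop(m), scores.pop(m)
        kept.append((box_m, score_m))
        # Decay the remaining scores according to their overlap with box_m.
        scores = [
            s * math.exp(-iou(box_m, b) ** 2 / sigma)
            for b, s in zip(boxes, scores)
        ]
    return kept
```

No box is removed; a heavily overlapping box is merely pushed down the ranking, which is what allows a low confidence threshold to recover it later.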
In the second stage, the predictions from both the original images and the structure-focused map are aggregated. The aggregation of the predictions is based on the weighted boxes fusion [47]. Weighted boxes fusion (WBF) takes into account the confidence score as an additional weight factor for each bounding box [47]. The idea is to give more weight to boxes that are more likely to be correct detections and less weight to those that are less reliable, and to fuse the bounding boxes with coordinates based on those weights, thus obtaining the final prediction boxes without losing information. The weighted boxes fusion starts by selecting the detection with the maximum score (M), appends it to the final detection (D), and incorporates N nearby bounding boxes (bi) with an IoU over a threshold (Nt) with the formula
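$$s = \frac{1}{N} \sum_{i=1}^{N} s_i \tag{20}$$
$$x_{1,2} = \frac{\sum_{i=1}^{N} s_i \, x_{1,2}^{(i)}}{\sum_{i=1}^{N} s_i} \tag{21}$$
$$y_{1,2} = \frac{\sum_{i=1}^{N} s_i \, y_{1,2}^{(i)}}{\sum_{i=1}^{N} s_i} \tag{22}$$
where x1,2 and y1,2 represent the coordinates, and s denotes the confidence score of the fused bounding box; the superscript (i) and the score si refer to the i-th box in the cluster. Because the fused coordinates are weighted sums over the box cluster, the information of the individual boxes is merged into the fused box even though the boxes themselves are discarded. Moreover, this allows the predictions from the original image and the structure-focused image to be combined while reducing false positives.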
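The fusion step can be illustrated with a simplified, single-pass variant of weighted boxes fusion. This sketch is not the reference implementation of [47]: it averages the confidence scores within a cluster and takes score-weighted coordinates, but omits the rescaling of the fused confidence by the number of contributing models. The threshold `iou_thr=0.5` and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def fuse_cluster(cluster):
    """Score-weighted mean of the coordinates, plain mean of the scores."""
    total = sum(s for _, s in cluster)
    coords = tuple(sum(s * b[k] for b, s in cluster) / total for k in range(4))
    return coords, total / len(cluster)

def weighted_boxes_fusion(boxes, scores, iou_thr=0.5):
    """Greedy clustering in decreasing score order; returns fused (box, score) pairs."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters, fused = [], []
    for i in order:
        for c, (fused_box, _) in enumerate(fused):
            if iou(boxes[i], fused_box) > iou_thr:
                clusters[c].append((boxes[i], scores[i]))
                fused[c] = fuse_cluster(clusters[c])
                break
        else:
            # No existing cluster overlaps enough: start a new one.
            clusters.append([(boxes[i], scores[i])])
            fused.append(fuse_cluster(clusters[-1]))
    return fused
```

In the setting described above, `boxes` would hold the Soft-NMS outputs from both the original image and the structure-focused map, concatenated into one list.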
Overall, the dual inclusive NMS combination includes Soft-NMS, which retains more possible locations of ADs in the first stage, and weighted boxes fusion, which fuses the predictions in the second stage. This approach maximizes the preservation of suspicious AD locations and makes a final decision across different image types, as radiologists do, which helps improve the sensitivity of AD detection. Moreover, the results are merged in the second stage to reduce false positives. Therefore, our dual inclusive NMS combination can improve the overall model performance.
Results
Training Environment and Experimentation Setting
All preprocessing and postprocessing were implemented in Python 3.10.8. The deep learning model was implemented with PyTorch 1.13.1 under CUDA 11.8. The development environment was the Linux Ubuntu 22.04 LTS operating system with an Intel® Core™ i9-9820X CPU and 64 GB RAM. All programs were executed on an NVIDIA GeForce RTX 2070 SUPER with 8 GB VRAM. The training process runs entirely within the MMDetection framework. MMDetection is an open-source object detection toolbox built upon PyTorch that supports experiments with various object detection algorithms. We chose MMDetection for training because it provides standardized modules of object detection algorithms, such as backbones, neck structures, and different detectors, enabling fair and unbiased comparisons.
We trained the deep learning model with the AdamW optimizer (Table 4). The batch size was set to 1. The initial learning rate was set to 0.0001. The learning rate scheduler reduces the learning rate at epoch 8 and epoch 11 with a decay rate of 0.1. The AD samples are scarce compared to the background in CBIS-DDSM mammograms, and thus, the foreground–background imbalance problem needs to be addressed. A random sampler sampling 256 per epoch with a positive fraction of 0.5 is used during the training stage in order to handle the imbalanced classes of ADs and background. The training process stopped at epoch 50, and the model with the best performance on the validation dataset was selected and evaluated on the test set. The details of the experimentation settings are shown in Table 4. For all stages of training, we employed identical training hyperparameters. To address overfitting, we applied data augmentations such as resizing, center cropping, and normalization. All inputs are resized to 1024 × 1024 pixels.
Table 4.
List of hyperparameters used in training
| Stage | Hyperparameter | Value |
|---|---|---|
| AdamW optimizer | Initial learning rate | 0.0001 |
| | Weight decay | 0.05 |
| Step learning rate scheduler | Decay step | 8, 11 |
| | Decay rate | 0.1 |
| | Warm-up iterations | 500 |
| | Warm-up ratio | 0.001 |
| Training | Batch size | 1 |
| | Number of epochs | 50 |
Evaluation Method
The testing dataset includes 26 AD images, which were previously split from the 150 AD images of the CBIS-DDSM dataset in an approximately 4:1:1 ratio, combined with 32 normal images randomly selected from the DDSM dataset.
The traditional evaluation method was usually based on IoU, which is a strict standard for detection. However, in a practical scenario of detection of ADs, accurate locations of ADs are often of little interest as stated in a study of Ben-Ari et al. [21]. Radiologists care more about whether the ground truth AD is detected or not. Though IoU is useful to evaluate the performance on natural images, the symmetric overlap ratio proposed by Ben-Ari et al. [21] is more appropriate to evaluate the model performance on medical images. The symmetric overlap ratio described in a study of Ben-Ari et al. [21] is defined as
$$R_s(g, p) = \max\left(\frac{\lvert g \cap p \rvert}{\lvert g \rvert}, \frac{\lvert g \cap p \rvert}{\lvert p \rvert}\right) \tag{23}$$
where g denotes the ground truth annotation, and p denotes the prediction bounding box. Taking the maximum of the intersection over either the ground truth or the prediction is better suited to clinical settings. From a medical perspective, especially in detecting small lesions like ADs, a prediction can be considered a successful detection as long as it is in contact with the location of a lesion. Based on the definition of $R_s$, we can determine whether a prediction is a true positive (TP), true negative (TN), or an FP, aiding in further evaluating the model performance. Since we add normal images into testing on CBIS-DDSM datasets, the false positives per image can be categorized into FPD and FPT, which are defined as
$$\mathrm{FPD} = \frac{\text{number of FPs in AD images}}{\text{number of AD images}} \tag{24}$$
$$\mathrm{FPT} = \frac{\text{number of FPs in AD and normal images}}{\text{number of AD and normal images}} \tag{25}$$
which are also defined in a study of Ben-Ari et al. [21]. A predicted lesion whose symmetric overlap ratio falls below the selected threshold is counted as an FP. The FPD defined here is the average number of false positives per image in AD images only, while the FPT is the average number of false positives per image in both AD and normal images, which is commonly used in other works on detecting ADs.
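A short sketch makes the difference between the symmetric overlap ratio and IoU concrete, and shows how FPD and FPT follow from per-image false positive counts. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the function names are hypothetical.

```python
def symmetric_overlap(gt, pred):
    """Max of (intersection / ground-truth area) and (intersection / prediction area)."""
    iw = max(0.0, min(gt[2], pred[2]) - max(gt[0], pred[0]))
    ih = max(0.0, min(gt[3], pred[3]) - max(gt[1], pred[1]))
    inter = iw * ih
    if inter == 0:
        return 0.0
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    return max(inter / area_g, inter / area_p)

def false_positives_per_image(fp_counts, is_ad_image):
    """Return (FPD, FPT) from per-image FP counts and per-image AD flags."""
    ad_fps = [n for n, ad in zip(fp_counts, is_ad_image) if ad]
    fpd = sum(ad_fps) / len(ad_fps)          # AD images only
    fpt = sum(fp_counts) / len(fp_counts)    # AD and normal images
    return fpd, fpt

# A small prediction fully inside a large ground truth has an IoU of only 0.01,
# yet its symmetric overlap ratio is 1.0, so it counts as a detection at 0.5.
ratio = symmetric_overlap((0, 0, 100, 100), (10, 10, 20, 20))
```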
In addition to false positives, we also use precision and recall to evaluate the model performance. The precision and recall are defined as
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{26}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{27}$$
where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives on all images including AD and normal images. Based on the definition of precision and recall, we can derive the average precision (AP) and average recall (AR) as a more accurate and appropriate measure to evaluate the performance. The AP is defined as
$$\mathrm{AP} = \sum_{n} (r_{n+1} - r_{n}) \, p_{\mathrm{interp}}(r_{n+1}) \tag{28}$$
$$p_{\mathrm{interp}}(r_{n+1}) = \max_{\tilde{r} \ge r_{n+1}} p(\tilde{r}) \tag{29}$$
To obtain AP, the precision-recall curve is sampled at every distinct recall value (r1, r2, …) at which the precision value decreases. The interpolated precision $p_{\mathrm{interp}}(r_{n+1})$ is defined as the maximum precision achieved for any recall value greater than or equal to $r_{n+1}$. While the average precision is defined on the precision-recall curve, the average recall is the recall averaged over the recall-IoU curve. By shifting the IoU threshold to different values, the recall will change accordingly, generating a recall-IoU curve. The AR is thereby defined in a study of Hosang et al. [48] as
$$\mathrm{AR} = 2 \int_{0.5}^{1} \mathrm{Recall}(o) \, do \tag{30}$$
where o represents the IoU threshold and Recall(o) is the Recall at the corresponding IoU threshold for IoU ∈ [0.5, 1.0].
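The average precision with interpolated precision can be computed as in the following sketch. It assumes each detection has already been labeled TP or FP (for example, with the overlap criterion above) and uses all-point interpolation; the function name and argument layout are assumptions of this example rather than the evaluation code used in the study.

```python
def average_precision(scores, is_tp, n_gt):
    """Area under the interpolated precision-recall curve.

    scores: confidence of each detection
    is_tp:  True if the detection matches a ground truth, else False
    n_gt:   total number of ground-truth lesions
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Interpolation: precision at recall r is the max precision at any recall >= r.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```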
In our study, we select 0.5 as the threshold of the symmetric overlap ratio. Based on this cutoff, we define AP0.5, AR0.5, and Recall0.5 as the AP, AR, and recall of the model at a symmetric overlap ratio threshold of 0.5, respectively. Additionally, we use SensitivityTop to represent the top sensitivity, which refers to the highest sensitivity that the model can achieve as the confidence score threshold is lowered. Conversely, when the confidence score threshold is raised, the number of false positives falls and the sensitivity also decreases. We report FPD and FPT either at a sensitivity of 80% or at the top sensitivity that the model can achieve to evaluate the performance of our model.
Weighted Orientation Augmentation
Our proposed structure-focused weighted convergence preprocessing highlights suspicious AD locations during preprocessing. To evaluate the performance of the proposed structure-focused weighted convergence preprocessing, our weighted orientation method is compared with the original convergence map, and the experiment is based on the native Swin Transformer. Table 5 shows the comparison among different approaches, which is further divided into two subtables, comparing the overall performance of original convergence map and weighted orientation map without and with thresholding in the upper and the lower part of the table, respectively.
Table 5.
Performance of convergence map with or without weighted orientation and thresholding
| Preprocessing method | AP0.5↑ | AR0.5↑ | Recall0.5↑ | FPD80%↓ | FPT80%↓ | FPDTop↓ | FPTTop↓ | SensitivityTop↑ |
|---|---|---|---|---|---|---|---|---|
| Convergence map | 0.5068 | 0.4467 | 0.6538 | 1.4615 | 0.9138 | 1.6538 | 1.0517 | 0.8077 |
| Weighted orientation map | 0.6308* | 0.6402* | 0.8077* | 0.3462* | 0.2069* | 0.9231 | 0.5862 | 0.8462 |
| Convergence map + thresholding | 0.6162 | 0.5518 | 0.7692 | 0.4231 | 0.3276 | 0.8077 | 0.6034 | 0.8462 |
| Weighted orientation map + thresholding (proposed method) | 0.4716 | 0.5264 | 0.7308 | 1.1154 | 0.7414 | 1.6154 | 1.1552 | 0.8846* |
FPD80% and FPT80% indicate the number of FPD and FPT at 80% sensitivity, respectively, while FPDTop and FPTTop indicate the number of FPD and FPT at top sensitivity. Italic numbers indicate the best in each group, while numbers with an asterisk on the right indicate the best among the four methods
To begin with, we first compared the original convergence map and the proposed weighted orientation map without thresholding in the upper part of Table 5. Compared to the original convergence map, the weighted orientation map has only 0.3462 FPD and 0.2069 FPT at 0.8 sensitivity, and the top sensitivity reaches 0.8462 with 0.9231 FPD and 0.5862 FPT, which is a significant improvement in lowering false positives and enhancing top sensitivity. Moreover, the AP, the AR, and the Recall0.5 of the weighted orientation map all perform better than those of the original convergence map. The result indicates that our proposed weighted orientation map performs better than the original convergence map in almost all evaluation criteria.
Due to the presence of noise in the background, we incorporated thresholding into our weighted orientation method and compared it with the original convergence map with thresholding; the results are shown in the lower part of Table 5. Combining thresholding and the weighted orientation method leads to more FPs than applying the convergence map with thresholding, with 1.1154 FPD and 0.7414 FPT at 0.8 sensitivity for the weighted orientation map compared to 0.4231 FPD and 0.3276 FPT at 0.8 sensitivity for the convergence map. Moreover, the AP, the AR, and the Recall0.5 of the weighted orientation map with thresholding are all lower than those of the convergence map with thresholding. However, the weighted orientation map with thresholding reaches a higher top sensitivity than the convergence map with thresholding. Similar results can be seen in the comparison between the weighted orientation map with and without thresholding. The weighted orientation map with thresholding yields more FPs and lower AP0.5, AR0.5, and Recall0.5 than the weighted orientation map alone. Though it performs worse on these indicators, it reaches a higher top sensitivity than its counterpart without thresholding. Overall, the results indicate that our proposed weighted orientation map with thresholding performs better in top sensitivity but worse in the other criteria than applying the weighted orientation map or thresholding alone.
Overall, we draw two brief conclusions. First, the weighted orientation method alone not only achieves a higher top sensitivity than the original convergence map but also reduces the false positives. Second, though combining thresholding and the weighted orientation method leads to more FPs than applying these methods alone, it can reach a higher top sensitivity. The free-response receiver operating characteristic (FROC) curve in Fig. 7 demonstrates these observations. Both the weighted orientation map (green line) and the convergence map with thresholding (yellow line) have fewer FPs than the combination of the two methods (red line), while the combination leads to a rise in top sensitivity. This may be because thresholding and weighting both help the model to focus on some suspicious regions. Though these regions may be non-overlapping at times, leading to more FPs, combining the two methods compensates for the missed regions, thereby boosting the top sensitivity. Additionally, the elevated false positives could potentially be attributed to the model itself rather than the preprocessing method. If we adjust our model, the issue of elevated false positives could be resolved. We demonstrate the result in the section “Structure Fusion Attention Model.” Moreover, in clinical practice, clinicians are more concerned about false negatives than false positives, meaning that we place more emphasis on sensitivity. Therefore, the proposed weighted orientation method with thresholding, which reaches a top sensitivity of 0.8846, higher than the other methods, is more beneficial and valuable to clinicians and is thus selected as our proposed preprocessing method.
Fig. 7.

The FROC curve of different preprocessing methods. The FROC curve of the weighted orientation with thresholding, which is in red, reaches the highest top sensitivity. FPT represents the calculation of the average false positives per image in both AD and normal images
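The FROC quantities used throughout this section (FPs per image at a given sensitivity, and top sensitivity) can be illustrated with a minimal sketch. The function names, the toy detections, and the tie-breaking rule (taking the fewest FPs at which a sensitivity level is reached) are assumptions for illustration, not the evaluation code used in this study. Passing only AD images would give an FPD-style value, assuming FPD counts FPs per AD image; passing all images gives FPT as defined in the caption of Fig. 7.

```python
from typing import List, Optional, Tuple


def froc_points(
    detections: List[Tuple[float, bool]],
    n_lesions: int,
    n_images: int,
) -> List[Tuple[float, float]]:
    """Return (false positives per image, sensitivity) pairs, one per score threshold.

    detections: (confidence score, is_true_positive) for every predicted box,
    pooled over the test set. Each lesion is assumed to be matched by at most
    one true-positive box.
    """
    points = []
    tp = fp = 0
    # Lowering the threshold admits detections in descending score order.
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_images, tp / n_lesions))
    return points


def top_sensitivity(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Highest sensitivity on the curve and the fewest FPs per image at which it is reached."""
    best = max(s for _, s in points)
    return best, min(f for f, s in points if s == best)


def fps_at_sensitivity(points: List[Tuple[float, float]], target: float) -> Optional[float]:
    """Fewest FPs per image at which sensitivity reaches `target`, or None if it never does."""
    reached = [f for f, s in points if s >= target]
    return min(reached) if reached else None


# Toy example (hypothetical data): 4 lesions across 5 images.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, False), (0.4, True)]
curve = froc_points(toy, n_lesions=4, n_images=5)
print(top_sensitivity(curve))          # (0.75, 0.6)
print(fps_at_sensitivity(curve, 0.8))  # None: 80% sensitivity is never reached
```

A `None` result corresponds to the "–" entries reported when a method does not reach 80% sensitivity.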
Structure Fusion Attention Model
In the section “Weighted Orientation Augmentation,” the experiments demonstrated the effectiveness of different preprocessing maps, and we selected the map that achieves the highest sensitivity, namely the weighted orientation map with thresholding. However, the native Swin Transformer cannot effectively address the issue of excessive false positives in AD detection. Therefore, we propose a novel model backbone, the structure fusion attention model (SFAM). Our proposed model outperforms other model backbones at locating ADs within the weighted orientation map with thresholding. Our study includes several popular models, VGG19 [49], ResNet [50], ResNeSt [51], and MobileNetV2 [52], to compare them with our proposed structure fusion attention model. Moreover, Transformer-based models, such as the Pyramid ViT [53] and the Swin Transformer, also serve as suitable models for comparison, since our proposed SFAM incorporates the attention mechanism as those Transformer-based models do. All of these are well-known models that have been widely used in medical image object detection [54, 55].
We trained the model on the structure-focused images, which are enhanced with the weighting and thresholding method. After the structure-focused weighted convergence preprocessing, the structure-focused map and the original image are both fed into the models. The results of the comparison between our proposed SFAM and other models are shown in Table 6. We observed that the MobileNet model has the lowest top sensitivity of all, with many more false positives than most other models. We also observed that the VGG19 model reaches the highest top sensitivity; however, it also produces a large number of false positives. Its FPTTop reaches 13.5862, meaning that every image contains about 13 FPs in addition to the ground truth detection, which makes it confusing for clinicians to determine possible locations of ADs. As shown in Fig. 8, the black line represents the FROC curve of the VGG19 model, which exhibits far more FPT than other models. Therefore, even though it achieves high sensitivity, the excessive false positives diminish its practical utility.
Table 6.
Comparison of the performance of different detection models
Columns AP0.5 to Recall0.5 report precision/recall; columns FPD80% to Sens.Top report false positives/top sensitivity.

| Model | Parameters | GFLOP | AP0.5↑ | AR0.5↑ | Recall0.5↑ | FPD80%↓ | FPT80%↓ | FPDTop↓ | FPTTop↓ | Sens.Top↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| VGG19 | 37.424 M | 533.955 | 0.0804 | 0.3609 | 0.5385 | 9.1538 | 9.4138 | 13.3846 | 13.5862 | 0.9231 |
| MobileNetV2 | 19.20 M | 358.028 | 0.0812 | 0.2297 | 0.3462 | – | – | 17.3077 | 12.9483 | 0.5385 |
| ResNet50 | 41.348 M | 211.281 | 0.4782 | 0.5087 | 0.6538 | – | – | 1.2692 | 1.1724 | 0.6538 |
| ResNet101 | 60.34 M | 289.179 | 0.4996 | 0.6231 | 0.8077 | 3.0000 | 1.6552 | 3.0000 | 1.6552 | 0.8077 |
| ResNeSt101 | 68.559 M | 454.332 | 0.5509 | 0.5514 | 0.7692 | 1.1154 | 0.7414 | 1.5000 | 1.2241 | 0.8462 |
| Pyramid ViT | 29.833 M | 151.62 | 0.3105 | 0.5191 | 0.7692 | 1.6154 | 1.2241 | 2.1923 | 1.6724 | 0.8846 |
| Swin Transformer | 44.746 M | 215.528 | 0.4716 | 0.5264 | 0.7308 | 1.1154 | 0.7414 | 1.6154 | 1.1552 | 0.8846 |
| SFAM (proposed method) | 44.747 M | 215.545 | 0.5998 | 0.7137 | 0.8846 | 0.7692 | 0.5690 | 1.4231 | 1.0862 | 0.9231 |
Italic numbers indicate the best in each group
FPD80% and FPT80% indicate the number of FPD and FPT at 80% sensitivity, respectively, while FPDTop and FPTTop indicate the number of FPD and FPT at top sensitivity. The Swin Transformer and Pyramid ViT employ tiny patches, resulting in a number of parameters comparable to that of the other models. Note that the FPD80% and FPT80% of MobileNetV2 and ResNet50 do not exist because the top sensitivity of these models does not reach 80%
Fig. 8.

The FROC curve of different model backbones. To keep the graph readable, the x-axis has been truncated on the right. As a result, the portions of the curves that extend beyond the chart area, which belong to VGG19, MobileNet, ResNet101, and Pyramid ViT, are not displayed, and the curves that extend beyond the chart and reach higher sensitivity are marked by an asterisk on the right. Although the top sensitivity of VGG19 reaches 0.9231, its FPT surpasses that of the other models at the same sensitivity level, as shown by the black line. The proposed SFAM, which is also a transformer-based model, performs the best among all, reaching 0.9231 top sensitivity while preserving the lowest FPs
We also observed that the Transformer-based models, namely the Pyramid ViT and the Swin Transformer models, have better performance in FPs and top sensitivity. However, these models still provide too many false positives, and this may be troublesome in a practical scenario. The best result is obtained by our proposed structure fusion attention model. Our method achieves top sensitivity of 0.9231 with 1.4231 FPD and 1.0862 FPT, and the FPD and FPT at sensitivity of 0.8 are also significantly reduced to 0.7692 and 0.5690, respectively. The results also indicate that the proposed structure fusion attention network performs better than other novel models not only in FPs and top sensitivity, but also in AP0.5, AR0.5, and Recall0.5. Therefore, we believe that our method has the best ability to extract possible AD locations from the structure-focused maps.
A more evident comparison of the performance of different models is shown in Fig. 8 as the FROC curve demonstrating the relative relationship of the sensitivity and false positives. We hypothesize that our proposed SFAM network performs better than other models due to the fact that the structure fusion attention model incorporates the spatial attention mechanism into the transformer-based architecture, allowing the model to capture and intermingle the relative relationship information from different channels, leading to a more comprehensive extraction of features from the structure-focused map. High sensitivity is particularly applicable in clinical practice, as our model can help clinicians to avoid missing any potential AD lesions. Additionally, the low number of false positives reduces the burden on clinicians by minimizing interference from non-ADs in their judgments. Therefore, our method is expected to be highly beneficial to clinical settings.
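To make the idea of a spatial attention mechanism concrete, the following is a deliberately simplified, generic sketch in the style of CBAM-like spatial attention. It is not the SFAM architecture: the learned convolution that such modules normally apply is replaced here by a fixed sum of the channel-wise mean and maximum, and the function name is hypothetical.

```python
import numpy as np


def spatial_attention(x: np.ndarray):
    """Reweight a C x H x W feature tensor with a single H x W attention map.

    Simplified illustration only: a real module would learn how to combine
    the pooled maps (for example with a convolution) instead of summing them.
    """
    avg_pool = x.mean(axis=0)        # H x W, average response across channels
    max_pool = x.max(axis=0)         # H x W, strongest response across channels
    logits = avg_pool + max_pool     # fixed combination (assumption)
    attn = 1.0 / (1.0 + np.exp(-logits))   # sigmoid, values in (0, 1)
    # Broadcast the same spatial weights over every channel.
    return x * attn[None, :, :], attn


features = np.random.default_rng(0).normal(size=(3, 4, 4))
weighted, attention_map = spatial_attention(features)
print(weighted.shape, attention_map.shape)  # (3, 4, 4) (4, 4)
```

Because every channel is scaled by the same spatial map, locations that respond strongly in any channel are emphasized in all of them, which is one way information from different channels can be mixed.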
Ablation Study
We have demonstrated the performance of our proposed method, which combines preprocessing based on the weighted convergence map with thresholding and the structure fusion attention model. To determine whether each part of the proposed method contributes, we conducted an ablation study on the structure-focused convergence preprocessing and the proposed structure fusion attention model. The results of the ablation study are shown in Table 7, while the FROC curves with different components added are shown in Fig. 9. The detection method based on the Swin Transformer model was selected as the baseline. Different training pipelines were compared within the ablation study. The four elements, namely architecture enhancement, convergence map, weighted orientation, and the SFAM, were introduced sequentially. Initially, architecture enhancement maps were added to the training process alongside the original images. Subsequently, we added convergence maps to the training process. We trained the model with a map that has three channels. When appending convergence maps, we placed the original image in the first channel, the architecture-enhanced image in the second channel, and the convergence map in the third channel. It is important to note that when adding the weighted orientation method, we directly replaced the convergence map with the weighted orientation map in the third channel, since our proposed weighted orientation is based on the original convergence method.
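The three-channel layout described above can be sketched as follows. The function name is hypothetical, and the per-channel min-max scaling is an assumption added so that the three maps share a common range; the text does not state how the channels were normalized.

```python
import numpy as np


def stack_structure_focused(original, enhanced, orientation) -> np.ndarray:
    """Stack three single-channel maps into one H x W x 3 array.

    Channel order follows the text: original image, architecture-enhanced
    image, then the convergence (or weighted orientation) map.
    """
    maps = [np.asarray(m, dtype=np.float32) for m in (original, enhanced, orientation)]
    if maps[0].ndim != 2 or len({m.shape for m in maps}) != 1:
        raise ValueError("all three maps must be 2-D arrays of the same shape")

    def rescale(m: np.ndarray) -> np.ndarray:
        # Min-max scaling to [0, 1]; an assumption, not taken from the paper.
        lo, hi = float(m.min()), float(m.max())
        return np.zeros_like(m) if hi == lo else (m - lo) / (hi - lo)

    return np.stack([rescale(m) for m in maps], axis=-1)


# Toy 2 x 3 inputs (hypothetical values).
image = np.arange(6).reshape(2, 3)
enhanced = np.ones((2, 3))
orientation = np.array([[0, 5, 0], [0, 0, 10]])
print(stack_structure_focused(image, enhanced, orientation).shape)  # (2, 3, 3)
```

Swapping the convergence map for the weighted orientation map then only changes the third argument, which matches the replacement step of the ablation.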
Table 7.
The results of the ablation study on our proposed methods
The structure-focused map comprises the architecture enhancement, convergence, and weighted orientation columns; SFAM is the model component; the remaining columns are evaluation metrics.

| Original image | Architecture enhancement | Convergence | Weighted orientation | SFAM | FPD80%↓ | FPT80%↓ | FPDTop↓ | FPTTop↓ | SensitivityTop↑ |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | ✘ | ✘ | ✘ | ✘ | – | – | 1 | 0.6034 | 0.7308 |
| ✓ | ✓ | ✘ | ✘ | ✘ | 1.6923 | 0.7586 | 1.6923 | 0.7586 | 0.8077 |
| ✓ | ✓ | ✓ | ✘ | ✘ | 1.4615 | 0.9138 | 1.6538 | 1.0517 | 0.8077 |
| ✓ | ✓ | ✓ | ✓ | ✘ | 1.1154 | 0.7414 | 1.6154 | 1.1552 | 0.8846 |
| ✓ | ✓ | ✓ | ✓ | ✓ | 0.7692 | 0.5690 | 1.4231 | 1.0862 | 0.9231 |
Italic numbers indicate the best in each group
The four components, architecture enhancement, convergence map, weighted orientation, and the SFAM, are added successively
Fig. 9.

FROC curve of all methods included in the ablation study. The results demonstrate that the addition of the four components, architecture enhancement, convergence map, weighted orientation, and the SFAM, incrementally elevates the model performance. The FROC curve of the proposed method, which incorporates all components, clearly has the best performance in top sensitivity and false positives per image
In the first part of the comparison, we add the architecture enhancement map to training. The results show that the model trained on both the original image and the architecture enhancement map reaches higher top sensitivity and, unlike the baseline, attains 0.8 sensitivity, so that FPD and FPT at 0.8 sensitivity can be reported. Although FPD and FPT at top sensitivity appear to be higher, a direct comparison of these indicators among different methods is not meaningful because the methods reach different top sensitivities. Moreover, adding the pre-computed maps, comprising the architecture enhancement maps and the convergence maps, reduces FPD at 0.8 sensitivity compared with the model trained on the architecture enhancement maps, although FPT at 0.8 sensitivity rises and top sensitivity is unchanged. A benefit is expected since the convergence preprocessing method is capable of accentuating radiating patterns. Furthermore, the weighted orientation map, replacing the original convergence map, and the proposed structure fusion attention model are added successively, further reducing the number of FPs at 0.8 sensitivity and increasing the top sensitivity compared to the baseline method. When they were used together, the FPT at 0.8 sensitivity was reduced to 0.5690 and the top sensitivity was elevated to 0.9231, which was a significant improvement over the baseline method. Overall, the ablation study clearly demonstrates that removing the components of the structure-focused map and SFAM leads to more FPs and lower top sensitivity, while introducing both methods can significantly boost the model performance. Therefore, we conclude that both the structure-focused map and SFAM components contribute to the elevation in model performance.
Comparison with Previous Works
To compare our model with models proposed in previous works, we trained another model on the DDSM dataset with the same image preprocessing and postprocessing methods. We applied the structure-focused weighted convergence preprocessing as well as the SFAM network to train our model and tested its performance on the DDSM dataset. To compare fairly with the previous studies, we conducted an experiment that included normal images in the training process. The results and the comparison with previous works are shown in Table 8. The best result in terms of FPT was recorded by Ben-Ari et al. [21], with 0.88 FPD and 0.46 FPT at 0.83 sensitivity, while our results on the DDSM dataset reach the best sensitivity of 0.85 with 0.83 FPD and 0.92 FPT. Although the exact data distribution may differ between our experiment and theirs, it is still noteworthy that our model performs better in terms of top sensitivity and FPD, while its FPT is higher. Since the sensitivity differs between our results and those of Ben-Ari et al. [21], a direct comparison of false positives is not meaningful. Nevertheless, there are some noteworthy observations. First of all, our results have higher FPT than FPD while the results from Ben-Ari et al. [21] show the opposite, indicating that our model detects ADs on images with ADs better than their model but performs slightly worse on normal images. We attribute this result to the difference in training method between our method and that of Ben-Ari et al. Our method intends to detect as many ADs as possible, applying the weighted convergence map to locate the region of interest initially. 
While this may increase the FPT because normal images become harder to differentiate from AD images, our model achieves higher detection sensitivity on AD images, which can be ascribed to the proposed structure-focused weighted orientation preprocessing highlighting suspected AD lesions. Higher sensitivity is much more important for clinicians, as they do not want to miss any potential breast cancers. Therefore, although our method increases FPT, it is still a more clinically applicable way to enhance top sensitivity.
Table 8.
Comparison of our proposed methods with previous studies
| Study | Method | Sensitivity↑ | FPD↓ | FPT↓ | Test sets | Sources |
|---|---|---|---|---|---|---|
| Proposed method | Structure-focused map + native Swin Transformer | 0.88 | 1.62 | 1.16 | 26 AD & 32 N | CBIS-DDSM |
| Proposed method | Structure-focused map + SFAM | 0.92* | 1.42 | 1.08 | 26 AD & 32 N | CBIS-DDSM |
| Proposed method | Structure-focused map + SFAM | 0.85 | 0.83 | 0.92 | 40 AD & 32 N | DDSM |
| Yoshikawa et al. [30] | Adaptive Gabor filter | 0.83 | – | 1.0 | 40 AD & 160 N | DDSM |
| Ben-Ari et al. [21] | Domain-specific R-CNN | 0.83 | 0.88 | 0.46 | 52 AD & 84 N | DDSM |
| Matsubara [29] | Direction analysis of linear structure | 0.81 | – | 2.6 | 168 AD & 580 N | Private |
| Rangayyan et al. [19] | Gabor filter and orientation analysis | 0.8 | – | 3.7 | 106 AD & 52 N | Private |
Our proposed method reaches the highest sensitivity and the lowest FPD. Best results in the DDSM dataset are in italic. The number with an asterisk on the right indicates the best among all
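The observation that our DDSM result has FPT above FPD while the result of Ben-Ari et al. shows the opposite can be made concrete with a short calculation. The sketch below assumes that FPD is the mean number of FPs per AD image and FPT the mean over AD and normal images combined; this is consistent with the CBIS-DDSM values in Table 6 (for example, 1.4231 × 26 ≈ 37 and 1.0862 × 58 ≈ 63) but may not match the exact definitions used in other studies. The inputs are the rounded values from Table 8, so the outputs are approximate, and the function name is hypothetical.

```python
def fps_per_normal_image(fpd: float, fpt: float, n_ad: int, n_normal: int) -> float:
    """Back out the mean number of FPs per normal image from FPD and FPT.

    Assumes FPD = (FPs on AD images) / n_ad and
            FPT = (FPs on all images) / (n_ad + n_normal).
    """
    total_fps = fpt * (n_ad + n_normal)   # FPs over the whole test set
    ad_fps = fpd * n_ad                   # FPs that fall on AD images
    return (total_fps - ad_fps) / n_normal


ours = fps_per_normal_image(fpd=0.83, fpt=0.92, n_ad=40, n_normal=32)
ben_ari = fps_per_normal_image(fpd=0.88, fpt=0.46, n_ad=52, n_normal=84)
print(round(ours, 2), round(ben_ari, 2))  # 1.03 0.2
```

Under these assumptions, our model produces about one FP per normal image, compared with about 0.2 for Ben-Ari et al., which illustrates why the two methods differ mainly on normal images rather than on AD images.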
Discussion
When visualizing the results of our model, we found that our proposed method can detect ADs in mammograms with higher sensitivity, especially in those with abundant fibroglandular tissue. An illustration is shown in Fig. 10. The baseline method is the Swin Transformer model trained only on the original images, while the proposed method is the structure fusion attention model trained on both the structure-focused maps and the original images. The comparison between the detection results of the two methods clearly demonstrates that our proposed method can detect ADs that the baseline method has difficulty detecting. The baseline method cannot detect the ADs in examples 3 and 4 in Fig. 10, while our proposed method detects the ground truth AD with only one FP in each image. Moreover, example 4 in Fig. 10 has abundant fibroglandular tissue. Detecting ADs in it can be a challenging task for the baseline method and even for an experienced radiologist. In contrast, our proposed method can still detect AD in backgrounds filled with fibroglandular tissue.
Fig. 10.
Four examples of AD that were not identified by the baseline model but were successfully detected by our proposed model are illustrated. The blue boxes denote the regions marked as ground truth by a radiologist, while the red boxes represent the predictions by our proposed model. The subtlety rating, assigned by a mammographer interpreting the CBIS-DDSM dataset, ranges from 0 to 5, where 1 is subtle and 5 means obvious. The subtlety of example 1 and example 2 is 4 and 2, respectively. The subtlety of both examples 3 and 4 is 1, indicating the difficulty of differentiating the AD from the background. Note that the density of example 4 is extremely high, making detection particularly challenging. Our proposed model can detect AD not only on mammograms where the background and lesions are easily distinguishable, such as examples 1 and 2, but also on mammograms that are more blurred and difficult to differentiate, such as example 3 and even example 4
In young women, the population that most needs AD detection on mammography, breast tissue is denser and contains more glandular and fibrous tissue, which appears white on mammograms. This density can make it harder for radiologists to detect lesions, as they may blend with the surrounding tissue. Even an experienced radiologist may overlook abnormal lesions at times, leading to underdiagnosis of breast cancer in clinical practice. Since our model can detect AD in dense tissue, it may benefit clinical settings by proposing suspected AD locations, reducing the fatigue and burden of radiologists, and helping radiologists discover possible breast cancer at an early stage.
We have discussed the structure-focused weighted orientation preprocessing in the section “Structure Fusion Attention Model”. After the preprocessing, the three-channel enhanced images are fed into the SFAM. The proposed method has demonstrated improved sensitivity in detecting ADs. Preliminarily, we attribute this improvement to the structure-focused map, which highlights suspicious AD locations, and the SFAM, which fuses the features from three channels and captures spatial interdependence between different locations. By visualizing the predictions from both the original images and the structure-focused maps, we observe that the preprocessing method helps the model discern ADs that are not detected in the original images. An illustration is shown in Fig. 11, where the AD is not detected on the original image but is detected on the structure-focused map. For instance, in example 1, the model did not detect ADs on the original image, but the weighted orientation preprocessing proposes potential AD locations so that the model can correctly identify the positions using the proposed suspicious locations on the weighted orientation image. In example 2, the model also fails to detect ADs on the original image, but when applied to the weighted orientation image, which highlights two to three possible locations, our model can find the correct AD location among these suspicious positions. Note that the bounding box proposed in example 2 is much larger than the ground truth, which may be due to the high density of the surrounding fibroglandular tissue. In example 3, the predictions from the original image are misaligned with the ground truth, but our model can accurately identify the ground truth location of ADs on the weighted orientation map. Most intriguingly, our model did not directly target the area with the highest intensity on the weighted orientation map. 
Instead, the proposed model comprehensively evaluates information from different feature maps to make the final predictions.
Fig. 11.
The effect of structure-focused weighted orientation preprocessing on the detection of ADs. The upper row includes the predictions on the original image, while the lower row includes the predictions on the structure-focused map. The blue boxes denote the regions marked as ground truth by a radiologist, while the red boxes represent the predictions by our proposed model. By enhancing the architecture and applying the weighted orientation augmentation, the possible locations of ADs are manifested, aiding the model to detect ADs more accurately
We believe that both the original image, with mammary glands and ducts preserved, and the structure-focused map, with the architectures and orientations manifested, have valuable information, which allows the model to be more flexible to detect ADs. Training the model with input from both original images and structure-focused maps reaches higher sensitivity, which can be beneficial and more valuable in clinical situations. By combining the prediction bounding boxes from both the original image and the structure-focused map, the model can detect more ADs, achieving better sensitivity; at the same time, the predictions from both inputs are combined through dual inclusive NMS combination, preserving low FPs. Therefore, the proposed detection framework takes advantage of the preprocessing method and the postprocessing method, leading to better performance than previous methods.
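The idea of pooling predictions from the two inputs and removing duplicates can be sketched with plain greedy non-maximum suppression (NMS). This is a swapped-in, generic technique shown only for illustration: the dual inclusive NMS combination used in this work is not reproduced here, and the function names, box format, and IoU threshold of 0.5 are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_predictions(from_original: List[Box], from_map: List[Box],
                      iou_threshold: float = 0.5) -> List[Box]:
    """Pool boxes from both inputs, then keep the highest-scoring box of each overlapping group."""
    pooled = sorted(from_original + from_map, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in pooled:
        # Keep a box only if it does not overlap strongly with a better-scoring kept box.
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept


# Toy example (hypothetical boxes): one shared finding plus one found only on the map.
on_original = [(10, 10, 50, 50, 0.9)]
on_map = [(12, 12, 52, 52, 0.8), (100, 100, 140, 140, 0.7)]
print(len(merge_predictions(on_original, on_map)))  # 2
```

In this toy case the duplicated finding is reported once, while the finding that appears only on the structure-focused map is retained, which is the behavior that lets the combined output gain sensitivity without doubling the FPs.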
Most ADs have the characteristic radiating patterns. However, ADs may instead display focal retraction or distortion at the edge of the parenchyma, showing fewer of the typical characteristics. Detecting this type of AD becomes challenging when relying solely on local information. Typically, radiologists seek more comprehensive global information from an image to identify these atypical ADs. Based on this prior knowledge, our model comprises two parts. The first part focuses on detecting features from local information using traditional algorithms, aiming to identify all ADs with radiating patterns. However, not all ADs exhibit such typical patterns, and atypical ADs often cannot be detected at this stage. Therefore, we employ deep learning in the subsequent stage to assist in detecting atypical ADs without radiating patterns. In this approach, the output of the first stage serves as the input for the second stage. The two stages are designed to compensate for each other in detecting typical and atypical ADs, aiding the integration of local and global features to make the final prediction. As a result, the ground truth AD can be detected in the final stage.
To discover how our proposed model focuses on suspicious AD locations, we visualize the feature map of our proposed model and of some popular models for comparison. The feature maps of the penultimate layer are selected and converted into a heatmap for visualization. The results are shown in Fig. 12. We select one mammogram with more fatty tissue and one with more fibrous vascular tissue as representatives, which are example 1 and example 2 in Fig. 12, respectively. We find that our model demonstrates a high level of attention towards the AD location, whereas other models are less able to focus on the ground truth AD locations. Moreover, our model focuses on the breast tissue area without distractions, unlike other models that pay attention to the edges of the breast, the nipples, or the labels. As shown in the feature maps on the full mammogram in Fig. 12, the feature maps of the ResNet50, Pyramid ViT, and Swin Transformer models show conspicuous red areas on minor structures that are of little importance to radiologists. In contrast, our model puts little emphasis on labels, which implies that our model is capable of ignoring artifacts and capturing useful information. Furthermore, for the ROI in example 1, we observe that the AD deformation from top left to bottom right is more pronounced on the left side. Accordingly, our model also highlights the left side of the region. As for the ROI in example 2, we notice that the AD generally has textures in all directions, while the textures are most prominent in the vertical and horizontal directions, so that our model’s feature map extracts an L-shaped feature. In contrast to other models, which have limited ability to highlight AD ROIs, our proposed model is capable of extracting characteristic features of ADs from mammograms and thus outperforms other models in detecting ADs.
Fig. 12.
The feature maps obtained from the last layer of different models. Full images with ground truth annotations in blue bounding boxes are shown in the upper row, while the ROIs of the ground truth ADs are shown in the lower row. Example 1 represents an AD in the mammogram with abundant fatty tissue; example 2 represents another AD in the mammogram with denser fibrovascular tissue. The proposed method focuses on the ground truth location the best, accounting for the best model performance
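Converting a feature map into a heatmap, as described above, can be sketched as follows. Averaging over channels, min-max normalization, and nearest-neighbour upsampling are assumptions made for illustration; the text does not specify how the feature maps were reduced or resized, and the function name is hypothetical.

```python
import numpy as np


def feature_map_to_heatmap(features, out_shape) -> np.ndarray:
    """Reduce a C x h x w feature tensor to an H x W heatmap with values in [0, 1]."""
    fmap = np.asarray(features, dtype=np.float32).mean(axis=0)  # average over channels
    fmap = fmap - fmap.min()
    peak = fmap.max()
    if peak > 0:
        fmap = fmap / peak                                      # scale to [0, 1]
    # Nearest-neighbour upsampling: map each output pixel to its source cell.
    height, width = out_shape
    rows = np.arange(height) * fmap.shape[0] // height
    cols = np.arange(width) * fmap.shape[1] // width
    return fmap[np.ix_(rows, cols)]


# Toy example: a 2-channel 2 x 2 feature map with one strong cell (top right).
toy = np.zeros((2, 2, 2))
toy[:, 0, 1] = 4.0
heat = feature_map_to_heatmap(toy, (4, 4))
print(heat.shape, heat[0, 3], heat[3, 0])  # (4, 4) 1.0 0.0
```

The resulting array can then be rendered with any colormap and overlaid on the mammogram to inspect where the response is strongest.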
Conclusion
In conclusion, our study proposed a new approach for AD detection on mammography using a combination of structure-focused weighted convergence preprocessing methods and a novel structure fusion attention model. The results show that the proposed structure-focused map applying the weighted orientation map with thresholding method can significantly reduce false positives without sacrificing top sensitivity. The proposed SFAM fuses the features of the original image, the architecture enhancement map, and the weighted orientation map, outperforming other models in terms of FPD and FPT and reaching the best top sensitivity of 0.9231. The sensitivity of an average radiologist in detecting ADs is typically around 80% [32]. In comparison, our model’s sensitivity in detecting AD is superior to the average level of radiologists, with a low number of false positives per image. These findings suggest that the combination of the proposed preprocessing methods and model architecture can lead to more accurate and reliable AD detection. In comparison to the previous works, our model achieves a sensitivity of 0.85 with a lower FPD than previous models, indicating improved performance in detecting abnormalities. The elevated detection sensitivity is possibly due to the inclusion of predictions from both the original images and the structure-focused maps, as discussed in the “Discussion” section. The visualization of the feature maps further provides evidence of our model’s ability to focus on characteristic AD patterns. Since the largest open datasets we can currently access are the DDSM and CBIS-DDSM datasets, further research can explore the generalizability of our findings to larger and more diverse datasets. Overall, since AD is an early-stage pattern that is difficult to find and also prone to misjudgment, it is always essential to develop better methods for detecting AD. 
We believe that our method has potential in clinical settings, assisting radiologists in the early identification of AD from mammography, relieving the burden on radiologists, and ultimately enabling the early detection and treatment of potential breast cancer patients.
Author Contribution
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Ting-Wei Ou and Tzu-Chieh Weng. The first draft of the manuscript was written by Ting-Wei Ou, and Tzu-Chieh Weng commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
This work was supported by the National Science and Technology Council (Grant numbers NSTC 110-2221-E-002-122-MY3, NSTC 110-2221-E-002-123-MY3, and NSTC 111-2634-F-002-021).
Data Availability
The data that support the findings of this study were derived from the following resources available in the public domain: DDSM: http://www.eng.usf.edu/cvprg/mammography/database.html, and CBIS-DDSM: https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY.
Declarations
Competing Interests
The authors declare no competing interests.
References
- 1.Azamjah, N., Y. Soltan-Zadeh and F. Zayeri: Global Trend of Breast Cancer Mortality Rate: A 25-Year Study. Asian Pacific journal of cancer prevention: APJCP 20(7):2015, 2019. 10.31557/APJCP.2019.20.7.2015. [DOI] [PMC free article] [PubMed]
- 2.D'Orsi, C.J., E.A. Sickles, E.B. Mendelson, et al., Acr Bi-Rads Atlas: Breast Imaging Reporting and Data System; Mammography, Ultrasound, Magnetic Resonance Imaging, Follow-up and Outcome Monitoring, Data Dictionary. ACR, American College of Radiology, 2013.
- 3.Gaur, S., V. Dialani, P.J. Slanetz, et al.: Architectural Distortion of the Breast. AJR Am J Roentgenol 201(5):W662-W670, 2013. 10.2214/AJR.12.10153. [DOI] [PubMed] [Google Scholar]
- 4.Suleiman, W., M. McEntee, S. Lewis, et al.: In the Digital Era, Architectural Distortion Remains a Challenging Radiological Task. Clin Radiol 71(1):e35-e40, 2016. 10.1016/j.crad.2015.10.009. [DOI] [PubMed] [Google Scholar]
- 5.Burrell, H.C., D.M. Sibbering, A. Wilson, et al.: Screening Interval Breast Cancers: Mammographic Features and Prognosis Factors. Radiology 199(3):811-817, 1996. 10.1148/radiology.199.3.8638010. [DOI] [PubMed] [Google Scholar]
- 6.Burrell, H.C., A.J. Evans, A.R.M. Wilson, et al.: False-Negative Breast Screening Assessment. What Lessons Can We Learn? Clin Radiol 56(5):385–388, 2001. 10.1053/crad.2001.0662. [DOI] [PubMed]
- 7.Bahl, M., J.A. Baker, E.N. Kinsey, et al.: Architectural Distortion on Mammography: Correlation with Pathologic Outcomes and Predictors of Malignancy. AJR Am J Roentgenol 205(6):1339-1345, 2015. 10.2214/AJR.15.14628. [DOI] [PubMed] [Google Scholar]
- 8.Kamra, A., V. Jain, S. Singh, et al.: Characterization of Architectural Distortion in Mammograms Based on Texture Analysis Using Support Vector Machine Classifier with Clinical Evaluation. J Digit Imaging 29:104-114, 2016. 10.1007/s10278-015-9807-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Ragab, D.A., M. Sharkas, S. Marshall, et al.: Breast Cancer Detection Using Deep Convolutional Neural Networks and Support Vector Machines. PeerJ 7:e6201, 2019. 10.7717/peerj.6201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Sheba, K. and S. Gladston Raj: An Approach for Automatic Lesion Detection in Mammograms. Cogent Eng 5(1):1444320, 2018. 10.1080/23311916.2018.1444320.
- 11.Matsubara, T., T. Ichikawa, T. Hara, et al.: Automated Detection Methods for Architectural Distortions around Skinline and within Mammary Gland on Mammograms. in International Congress Series1256: p.950-955, 2003. 10.1016/S0531-5131(03)00496-5. [Google Scholar]
- 12.Matsubara, T., D. Fukuoka, N. Yagi, et al.: Detection Method for Architectural Distortion Based on Analysis of Structure of Mammary Gland on Mammograms. in International Congress Series1281: p.1036-1040, 2005. 10.1016/j.ics.2005.03.324. [Google Scholar]
- 13.Rangayyan, R.M. and F.J. Ayres: Gabor Filters and Phase Portraits for the Detection of Architectural Distortion in Mammograms. Med Biol Eng Comput 44:883-894, 2006. 10.1007/s11517-006-0088-3. [DOI] [PubMed] [Google Scholar]
- 14.Oyelade, O.N. and A.E.-S. Ezugwu: A State-of-the-Art Survey on Deep Learning Methods for Detection of Architectural Distortion from Digital Mammography. IEEE access 8:148644-148676, 2020. 10.1109/ACCESS.2020.3016223. [Google Scholar]
- 15.Matsubara, T., T. Ichikawa, T. Hara, et al.: Novel Method for Detecting Mammographic Architectural Distortion Based on Concentration of Mammary Gland. in International Congress Series1268: p.867-871, 2004. 10.1016/j.ics.2004.03.103. [Google Scholar]
- 16.Palma, G., S. Muller, I. Bloch, et al.: Detection of Convergence Areas in Digital Breast Tomosynthesis Using a Contrario Modeling. in Medical Imaging 2009: Computer-Aided Diagnosis7260: p.217–224, 2009. 10.1117/12.807547.
- 17.Rangayyan, R.M., S. Banik and J.L. Desautels: Detection of Architectural Distortion in Prior Mammograms Via Analysis of Oriented Patterns. JoVE (Journal of Visualized Experiments) (78):e50341, 2013. 10.3791/50341. [DOI] [PMC free article] [PubMed]
- 18.Rangayyan, R.M., J. Chakraborty, S. Banik, et al.: Detection of Architectural Distortion Using Coherence in Relation to the Expected Orientation of Breast Tissue. in 2012 25th IEEE International Symposium on Computer-Based Medical Systems (CBMS): p.1–4, 2012. 10.1109/CBMS.2012.6266310.
- 19.Rangayyan, R.M., S. Banik, J. Chakraborty, et al.: Measures of Divergence of Oriented Patterns for the Detection of Architectural Distortion in Prior Mammograms. Int J Comput Assist Radiol Surg 8:527-545, 2013. 10.1007/s11548-012-0793-3.
- 20.Cai, Q., X. Liu and Z. Guo: Identifying Architectural Distortion in Mammogram Images Via a SE-DenseNet Model and Twice Transfer Learning. in 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI): p.1–6, 2018. 10.1109/CISP-BMEI.2018.8633197.
- 21.Ben-Ari, R., A. Akselrod-Ballin, L. Karlinsky, et al.: Domain Specific Convolutional Neural Nets for Detection of Architectural Distortion in Mammograms. in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017): p.552–556, 2017. 10.1109/ISBI.2017.7950581.
- 22.Cogswell, M., F. Ahmed, R. Girshick, et al.: Reducing Overfitting in Deep Networks by Decorrelating Representations. arXiv preprint arXiv:1511.06068, 2015. 10.48550/arXiv.1511.06068.
- 23.Li, Y., Z. He, J. Pan, et al.: Atypical Architectural Distortion Detection in Digital Breast Tomosynthesis: A Computer-Aided Detection Model with Adaptive Receptive Field. Phys Med Biol 68(4):045013, 2023.
- 24.Banik, S., R.M. Rangayyan and J.L. Desautels: Detection of Architectural Distortion in Prior Mammograms of Interval-Cancer Cases with Neural Networks. in 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society: p.6667–6670, 2009. 10.1109/IEMBS.2009.5334517.
- 25.Banik, S., R.M. Rangayyan and J.L. Desautels: Detection of Architectural Distortion in Prior Mammograms. IEEE Trans Med Imaging 30(2):279-294, 2010. 10.1109/TMI.2010.2076828.
- 26.Jasionowska, M., A. Przelaskowski, A. Rutczynska, et al.: A Two-Step Method for Detection of Architectural Distortions in Mammograms. in Information Technologies in Biomedicine: Volume 2: p.73-84, 2010. 10.1007/978-3-642-13105-9_8.
- 27.Yamazaki, M., A. Teramoto and H. Fujita: A Hybrid Detection Scheme of Architectural Distortion in Mammograms Using Iris Filter and Gabor Filter. in Breast Imaging: 13th International Workshop, IWDM 2016, Malmö, Sweden, June 19–22, 2016, Proceedings 13: p.174–182, 2016. 10.1007/978-3-319-41546-8_23.
- 28.Hasegawa, J.I., T. Tsutsui and J.I.T. Members: Automated Extraction of Cancer Lesions with Convergent Fold Patterns in Double Contrast X‐Ray Images of the Stomach. Systems and Computers in Japan 22(7):51-62, 1991. 10.1002/scj.4690220706.
- 29.Matsubara, T., A. Ito, A. Tsunomori, et al.: An Automated Method for Detecting Architectural Distortions on Mammograms Using Direction Analysis of Linear Structures. in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC): p.2661–2664, 2015. 10.1109/EMBC.2015.7318939.
- 30.Yoshikawa, R., A. Teramoto, T. Matsubara, et al.: Automated Detection of Architectural Distortion Using Improved Adaptive Gabor Filter. in Breast Imaging: 12th International Workshop, IWDM 2014, Gifu City, Japan, June 29–July 2, 2014. Proceedings 12: p.606–611, 2014. 10.1007/978-3-319-07887-8_84.
- 31.Palma, G., S. Muller, I. Bloch, et al.: Fast Detection of Convergence Areas in Digital Breast Tomosynthesis. in 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro: p.847–850, 2009. 10.1109/ISBI.2009.5193185.
- 32.Molins, E., F. Macià, F. Ferrer, et al.: Association between Radiologists’ Experience and Accuracy in Interpreting Screening Mammograms. BMC Health Serv Res 8(1):1-10, 2008. 10.1186/1472-6963-8-91.
- 33.Oyelade, O.N. and A.E. Ezugwu: A Deep Learning Model Using Data Augmentation for Detection of Architectural Distortion in Whole and Patches of Images. Biomed Signal Process Control 65:102366, 2021. 10.1016/j.bspc.2020.102366.
- 34.Rehman, K.u., J. Li, Y. Pei, et al.: Architectural Distortion-Based Digital Mammograms Classification Using Depth Wise Convolutional Neural Network. Biology 11(1):15, 2021. 10.3390/biology11010015.
- 35.Li, Y., Z. He, X. Ma, et al.: Architectural Distortion Detection Based on Superior–Inferior Directional Context and Anatomic Prior Knowledge in Digital Breast Tomosynthesis. Med Phys 49(6):3749-3768, 2022. 10.1002/mp.15631.
- 36.Liu, Z., Y. Lin, Y. Cao, et al.: Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. in Proceedings of the IEEE/CVF international conference on computer vision: p.10012–10022, 2021. 10.1109/ICCV48922.2021.00986.
- 37.Woo, S., J. Park, J.-Y. Lee, et al.: CBAM: Convolutional Block Attention Module. in Proceedings of the European conference on computer vision (ECCV): p.3–19, 2018. 10.1007/978-3-030-01234-2_1.
- 38.Heath, M., K. Bowyer, D. Kopans, et al.: Current Status of the Digital Database for Screening Mammography, in Digital Mammography: Nijmegen, 1998. p. 457–460 Dordrecht: Springer, 1998. 10.1007/978-94-011-5318-8_75.
- 39.Sawyer-Lee, R., F. Gimenez, A. Hoogi, et al., Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM) [Data Set], in The Cancer Imaging Archive. 2016. 10.7937/K9/TCIA.2016.7O02S9CY.
- 40.Clark, K., B. Vendt, K. Smith, et al.: The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. J Digit Imaging 26:1045-1057, 2013. 10.1007/s10278-013-9622-7.
- 41.Lee, R.S., F. Gimenez, A. Hoogi, et al.: A Curated Mammography Data Set for Use in Computer-Aided Detection and Diagnosis Research. Scientific Data 4(1):1-9, 2017. 10.1038/sdata.2017.177.
- 42.Rangayyan, R.M., S. Prajna, F.J. Ayres, et al.: Detection of Architectural Distortion in Prior Screening Mammograms Using Gabor Filters, Phase Portraits, Fractal Dimension, and Texture Analysis. Int J Comput Assist Radiol Surg 2:347-361, 2008. 10.1007/s11548-007-0143-z.
- 43.Kruizinga, P. and N. Petkov: Nonlinear Operator for Oriented Texture. IEEE Transactions on Image Processing 8(10):1395-1407, 1999. 10.1109/83.791965.
- 44.Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans Syst Man Cybern 9(1):62-66, 1979. 10.1109/TSMC.1979.4310076.
- 45.Han, J. and C. Moraga: The Influence of the Sigmoid Function Parameters on the Speed of Backpropagation Learning. in International workshop on artificial neural networks: p.195–201, 1995.
- 46.Bodla, N., B. Singh, R. Chellappa, et al.: Soft-NMS--Improving Object Detection with One Line of Code. in Proceedings of the IEEE international conference on computer vision: p.5561–5569, 2017. 10.1109/ICCV.2017.593.
- 47.Solovyev, R., W. Wang and T. Gabruseva: Weighted Boxes Fusion: Ensembling Boxes from Different Object Detection Models. Image Vis Comput 107:104117, 2021. 10.1016/j.imavis.2021.104117.
- 48.Hosang, J., R. Benenson, P. Dollár, et al.: What Makes for Effective Detection Proposals? IEEE Trans Pattern Anal Mach Intell 38(4):814-830, 2015. 10.1109/TPAMI.2015.2465908.
- 49.Simonyan, K. and A. Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014. 10.48550/arXiv.1409.1556.
- 50.He, K., X. Zhang, S. Ren, et al.: Deep Residual Learning for Image Recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition: p.770–778, 2016. 10.1109/CVPR.2016.90.
- 51.Zhang, H., C. Wu, Z. Zhang, et al.: ResNeSt: Split-Attention Networks. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition: p.2736–2746, 2022. 10.1109/CVPRW56347.2022.00309.
- 52.Sandler, M., A. Howard, M. Zhu, et al.: MobileNetV2: Inverted Residuals and Linear Bottlenecks. in Proceedings of the IEEE conference on computer vision and pattern recognition: p.4510–4520, 2018. 10.1109/CVPR.2018.00474.
- 53.Wang, W., E. Xie, X. Li, et al.: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. in Proceedings of the IEEE/CVF international conference on computer vision: p.568–578, 2021. 10.1109/ICCV48922.2021.00061.
- 54.He, K., C. Gan, Z. Li, et al.: Transformers in Medical Image Analysis. Intelligent Medicine 3(1):59-78, 2023. 10.1016/j.imed.2022.07.002.
- 55.Yang, R. and Y. Yu: Artificial Convolutional Neural Network in Object Detection and Semantic Segmentation for Medical Imaging Analysis. Front Oncol 11:638182, 2021. 10.3389/fonc.2021.638182.
Data Availability Statement
The data that support the findings of this study were derived from the following publicly available resources: DDSM (http://www.eng.usf.edu/cvprg/mammography/database.html) and CBIS-DDSM (https://doi.org/10.7937/K9/TCIA.2016.7O02S9CY).








