Multiple-level thresholding for breast mass detection

Xiang Yu; Shui-Hua Wang; Yu-Dong Zhang

doi:10.1016/j.jksuci.2022.11.006

Author manuscript; available in PMC: 2023 May 22.

Published in final edited form as: J King Saud Univ Comput Inf Sci. 2023 Jan;35(1):115–130. doi: 10.1016/j.jksuci.2022.11.006

PMCID: PMC7614559 EMSID: EMS174791 PMID: 37220564

Abstract

Detection of breast mass plays a very important role in making the diagnosis of breast cancer. For faster detection of breast cancer caused by breast mass, we developed a novel and efficient patch-based breast mass detection system for mammography images. The proposed framework is comprised of three modules, including pre-processing, multiple-level breast tissue segmentation, and final breast mass detection. An improved Deeplabv3+ model for pectoral muscle removal is deployed in pre-processing. We then proposed a multiple-level thresholding segmentation method to segment breast mass and obtained the connected components (ConCs), where the corresponding image patch to each ConC is extracted for mass detection. In the final detection stage, each image patch is classified into breast mass and breast tissue background by trained deep learning models. The patches that are classified as breast mass are then taken as the candidates for breast mass. To reduce the false positive rate in the detection results, we applied the non-maximum suppression algorithm to combine the overlapped detection results. Once an image patch is considered a breast mass, the accurate detection result can then be retrieved from the corresponding ConC in the segmented images. Moreover, a coarse segmentation result can be simultaneously retrieved after detection. Compared to the state-of-the-art methods, the proposed method achieved comparable performance. On CBIS-DDSM, the proposed method achieved a detection sensitivity of 0.87 at 2.86 FPI (False Positive rate per Image), while the sensitivity reached 0.96 on INbreast with an FPI of only 1.29.

Keywords: Mass detection, Multiple-level thresholding, Deep CNNs

1. Introduction

Medical image analysis plays a key role in modern health systems. With early screening, cancers and other diseases can be detected at an early stage, and morbidity and death rates can be reduced (Siegel et al., 2021). For breast cancer, severity analysis can be achieved through an invasive method such as a biopsy. However, patients have to go through a series of painful tissue extraction procedures and wait a rather long time until the analysis is completed. To make diagnosis pain-free and more efficient, computer scientists have developed computer-aided detection (CAD) systems. These systems can be divided into traditional and deep learning-based ones. Traditional CAD systems usually consist of modules including pre-processing, segmentation, feature extraction (and feature selection if needed), and classification. However, human intervention is usually indispensable to the success of these systems. In contrast, deep learning-based systems can greatly reduce manual intervention, and deep learning has brought great benefit to many key areas such as autonomous driving, cyber-security, and medical imaging (Grigorescu et al., 2020; Berman et al., 2019; Xiang et al., 2021; Yolcu et al., 2020; Oztel et al., 2018). Compared to traditional CAD systems, deep learning-based CAD systems turn out to be more advantageous in terms of performance and robustness. Also, the individual modules of conventional CAD systems, including feature extraction, selection and classification, can be integrated into a single deep learning architecture, which indirectly boosts the robustness of deep learning-based systems. However, some shortcomings remain for deep learning-based CAD systems. One is that their robustness still leaves room for improvement. 
While the performance of deep learning-based CAD systems on limited datasets can be promising, these systems, however, can still perform surprisingly badly on images collected by different imaging devices or the same devices with different settings. Another problem mainly comes from the size of available resources, including datasets and computing devices. The final performance of deep learning-based CAD systems is greatly determined by the size of available datasets and annotations. While there are numerous attempts to mitigate the situation (Bria et al., 2020; Taghanaki et al., 2021), the original size of the dataset remains a dominating factor impacting the performance of deep learning-based CAD systems. Especially annotating a dataset is an expensive procedure that requires a large sum of money for manual expenses and a large amount of time due to the challenges of providing accurate annotations. Also, as it was widely known, the training of large-scale deep learning models requires large computing resources such as GPUs. Therefore, the available computing resources and deployment of these models are also potential factors that hinder the spread of deep learning-based CAD systems.

Breast cancer, a top-ranking cancer alongside lung and prostate cancer, has been recognized as one of the major threats to women's health. While the incidence rate of breast cancer is increasing, the death rate is declining thanks to early screening procedures (Siegel et al., 2021). Considering factors including cost and efficiency, radiologists recommend X-ray mammography as the key tool for the early detection of breast abnormalities. Given that manual interpretation of mammography is a time-consuming and challenging task, numerous CAD systems for mammography images have been developed to aid radiologists during the past decades. Compared to other breast abnormalities such as calcification and distortion, breast mass is the most significant symptom of breast cancer. However, the intrinsically complicated nature and varied shapes of breast masses make them challenging to detect and segment. Additionally, the low signal-to-noise ratio of mammography images indirectly impairs the performance of hand-crafted feature-based CAD systems. Another challenge for successful breast mass detection is the varied density of breast tissues. When dense breast tissue is present, its pixel intensities are close to those of a real breast mass, and the tissue may also overlap the mass. Therefore, it is usually more challenging to recognize real breast masses and separate them from breast tissue.

In this paper, we developed a novel breast mass detection system for mammography images with varied breast densities. A side benefit brought by the detection system is that coarse segmentation can be simultaneously achieved once the detection is finished. In the developed system, the breast pectoral muscle, which is usually shown in the mediolateral oblique (MLO) view, is firstly removed by an improved Deeplabv3+ segmentation module (Yu et al., 2022). We then proposed a multiple-level thresholding method to perform coarse segmentation of breast mass. The proposed multiple-level thresholding method first segments the breast tissues by the averaged pixel intensity of the breast tissues. After analysing the region properties of the connected components (ConCs) in the segmented images, the ConCs with large areas and the corresponding pixels in breast images are selected for further fine-grained segmentation based on the averaged pixel intensity of the selected pixels. The segmentation procedure stops only when no big enough ConCs are found in the segmented images. The segmentation results by different levels of thresholding are then combined to form the final segmentation results. To depress the noisy ConCs in the segmented images, an area opening operation, which eliminates the dot-like ConCs with a small area, is applied. In the final segmented images, each individual ConC and corresponding image patch is then extracted for breast mass and tissue classification, which is implemented by retraining the state-of-the-art deep learning models. The image patches that are classified as breast mass and corresponding ConC patches are taken as the coarse detection and segmentation results. Non-maximum Suppression (NMS) algorithm is applied to refine the results by suppressing the low-scored patches. The image patches that survived all stages can then be taken as the true breast masses, and then the patch-level detection results can be obtained. 
Furthermore, the segmentation result can be retrieved from the ConC patches, while the accurate detection result can be refined by the bounding boxes of the ConC patches. The main contributions of this paper can be concluded as follows:

We proposed a novel patch-based CAD system for efficient breast mass detection in mammography. Instead of deploying a deep learning object detection framework, we converted the detection problem into a single classification task and therefore saved overall computational cost. Compared to detection frameworks for common objects, the proposed framework is more friendly for training as only deep classifiers are introduced. Experimental results on two public datasets showed comparable performance to the benchmarks.
The proposed system is of high robustness that can be applied to mammography images with varied breast densities. In mammography images with dense breast tissues, it is a challenging task to detect breast mass. The situation is more complicated when the breast density of mammography varies from one image to another. However, our proposed multiple-level thresholding method can easily cope with varied breast density with a few predefined parameters. Therefore, the proposed algorithm is of high generality and flexibility.
We attempted a coarse breast mass segmentation at the same time as implementing breast mass detection. By introducing the multiple-level thresholding segmentation method, the coarse segmentation of breast mass can be obtained once the breast mass candidate patches are determined as a breast mass. While the segmentation can be refined, it could lead to new strategies for simultaneous breast mass detection and segmentation.

The remainder of this paper is arranged as follows. In Section 2, we will briefly revisit the related works in recent years. Then we will introduce the proposed pipeline in Section 3, followed by the experiments in Section 4, where we will introduce the details of the datasets used in this research, the setting of the experiments and the results. We will then discuss some issues regarding the problem in the experiment in Section 5. Finally, we end this paper with the conclusion and future work in Section 6.

2. Related works

Breast mass detection is an important module in mammogram analysis systems as it provides the region of interest (ROI) for further analysis. Given its importance, there have been numerous meaningful attempts from the perspective of both traditional and deep learning-based methods (Wang et al., 2018; Cao et al., 2021; Sun et al., 2021). Wang et al. (2018) proposed to integrate Gestalt psychology into breast mass detection. Their framework comprises sensation integration, semantic integration, and verification, and reported a 93.84% detection sensitivity at an FPI of 2.21 on the Digital Database for Screening Mammography (DDSM) (Heath et al., 1998). However, the false positive rate can be further reduced. In another deep learning-based method (Cao et al., 2021), an anchor-free architecture was developed. In the developed model, the contrast between breast mass and surrounding tissues is enhanced based on adaptive histogram equalization. The authors then transferred an anchor-free one-stage detection network called FSAF (Zhu et al., 2019) to the detection task. They reported a recall rate of 0.943 on DDSM at a false positive rate of 0.599. Aly et al. (2021) proposed to deploy You Only Look Once (YOLO) for breast mass detection and classification in INbreast mammograms. For comparison, the authors also deployed ResNet and Inception as feature extractors and compared their classification performance against YOLO. The reported results showed that 89.4% of the masses in INbreast can be detected, while average precisions of 89.4% and 94.2% were reported for benign and malignant mass classification, respectively. In another recent work, Sun et al. (2021) proposed to combine traditional template matching with deep learning. 
In the proposed method, ROIs are determined by scanning the mammographic images from top to bottom and then left to right via a morphology method that can transform brighter regions into circular-like areas. Deep convolutional neural networks (CNNs) are then trained to classify the ROIs into breast mass and breast background tissues. The reported detection results on DDSM were 86.82% sensitivity with 0.53 FPI. However, the robustness of the proposed method is poor as a minor change in the mammographic images, such as the intensity, would lead to detection failure.

Segmentation can also help breast cancer detection and diagnosis by introducing extra information (Min et al., 2020; Yan et al., 2021; Su et al., 2022). In the work (Min et al., 2020), simultaneous mass detection and segmentation are achieved by introducing a deep CNN model called Mask R-CNN that can simultaneously detect and segment objects of interest in images. A two-stage mass detection and segmentation framework can be found in (Yan et al., 2021), where breast masses are detected by a multi-scale fusion-based method and then segmented via an improved version of UNet. A transformer-based YOLO framework was introduced in the work (Su et al., 2022) that showed a true positive rate of 95.7% for breast mass detection on CBIS-DDSM. There are also some meaningful attempts at reducing computational costs for medical image analysis (Mukherjee et al., 2019; Zamzmi et al., 2021). In the work (Mukherjee et al., 2019), the authors proposed a segmentation-free method for automatic white matter injury detection in preterm infants. For efficiency, a linear maximally stable extremal regions algorithm was first applied to detect the ventricles as blobs. Tissues that adjoin the blobs were identified via the brain-background boundary and a reference contour equidistant from the blobs. The grey-value intensities of these tissues were assumed to follow a normal distribution, and outlier intensities were labelled as potential white matter injury, which was then reconfirmed through heuristics. The method is inspiring in that it can be transferred to similar scenarios such as breast mass detection, where the linear maximally stable extremal regions algorithm might be helpful in distinguishing breast mass from breast tissue.

In conclusion, the exploration of simultaneous breast mass detection and segmentation is still quite limited, while some proposed methods rely heavily on computational resources for segmentation. Therefore, we proposed a novel framework for these two tasks. After an automatic search for proper thresholding values, breast images are first segmented. We then extracted the breast tissue patches corresponding to the ConCs in the segmented images for breast mass and background classification. The classification result is then refined by Non-Maximum Suppression (NMS). The advantages of the proposed framework include high performance and high robustness, as we evaluated it on two public datasets and obtained promising results.

3. Methodology

In this section, we will introduce each module in the proposed framework, including pre-processing, multiple-level segmentation, and breast mass detection, where breast mass detection can be further divided into breast mass patch extraction, breast mass classification, and false positive reduction. In the pre-processing module, we mainly remove the breast pectoral muscle and enhance the contrast of the breast-only image. The multiple-level thresholding segmentation is then performed on the pectoral muscle removed and contrast-enhanced images. The varied thresholding values are applied to binarize the breast region into different ConCs, where corresponding breast tissue patches are extracted for mass detection. In the breast mass detection stage, deep learning models are transferred and retrained for breast mass and tissue classification. To further reduce the false positives after classification, we then applied the NMS algorithm and took breast mass patches that survived all stages as true breast mass. An overview of the proposed framework can be seen in Fig. 1.

3.1. Pre-processing

Pre-processing plays a key role in reducing computational costs and improving image quality. In mammograms, breasts only appear in a limited area, so breast region extraction alone could greatly benefit the following modules by shrinking the image size. In this paper, we propose to remove the breast pectoral muscle and enhance the resultant images for the following reasons. One is that the breast pectoral muscle will affect our intensity-based segmentation method as the intensity of the muscle is of high similarity to that of a true breast mass. Also, the size of the breast region can be further narrowed down once we have the pectoral muscle removed and therefore, we can reduce the overall computational cost. The reason why we enhance the contrast of images is that medical images usually suffer from low contrast and intensity inhomogeneity. As a result, image contrast enhancement would greatly mitigate the situation.

A mammogram must fall into one of four views: left-side mediolateral oblique (LMLO), left-side craniocaudal (LCC), right-side mediolateral oblique (RMLO) and right-side craniocaudal (RCC). The pectoral muscle usually appears in the MLO view, while little or no pectoral muscle is shown in CC view mammograms. For pectoral muscle removal, we deployed the deep segmentation model PeMNet from the work (Yu et al., 2022). After pectoral muscle removal, the breast-only images in the MLO view are enhanced by a classic method called contrast-limited adaptive histogram equalization (CLAHE). The mammograms in CC views, however, are enhanced by CLAHE directly. After the pectoral muscle is segmented, the breast mask can be obtained from the segmentation results by simply setting the pixels that belong to the breast to one. One pre-processing example is given in Fig. 2. As can be seen, the pectoral muscle has been successfully segmented and removed while the contrast of the resultant image has been improved. Compared to Fig. 2a, the number of pixels of interest in the breast region has been greatly reduced. Therefore, we believe our pre-processing procedures are effective and helpful in reducing overall computational costs while improving the quality of breast images.

Fig. 2 — **(a)** Original mammogram. **(b)** Segmented Pectoral muscle. **(c)** Breast-only image. **(d)** Contrast-enhanced breast-only image by CLAHE with clip limit as 0.02. **(e)** Obtained breast mask.

3.2. Multiple-level thresholding segmentation

After pectoral muscle removal, the breast mass turns out to be the area of highest intensity if any breast mass is present in the mammogram. Based on this assumption, we deployed a multiple-level thresholding segmentation algorithm, which can be divided into coarse and fine-grained segmentation. Let the preprocessed breast image be $I \in ℝ^{H \times W \times 3}$, where H and W stand for the height and width of the image, respectively. Correspondingly, as mentioned before, we have the breast mask $BMask \in ℝ^{H \times W \times 3}$, which indicates the breast-only region by 1 and the non-breast region by 0. We first calculated the mean intensity of the breast region and segmented the breast image based on the obtained value. The resulting coarse segmentation is then labelled into different ConCs. For each oversized ConC, a fine-grained segmentation is carried out. To determine whether a ConC is oversized, we predefined a fixed value Area, and a ConC is considered oversized if its area is greater than Area. Fine-grained segmentation proceeds as long as there are still oversized ConCs in the segmented images. Finally, all segmentation results are aggregated to form the final segmentation result. The detailed algorithm is shown in Algorithm 1, and the intermediate results of the segmentation process can be found in Fig. 3. Note that we analyzed the region properties of the ConCs in the segmentation result and then removed noisy ConCs such as the segment-like and the small dot-like ConCs, which can be seen in Fig. 3c.

Fig. 3 — **(a)** Original mammogram. **(b)** Pixel-level ground truth of breast mass. **(c)** Segmentation results by multiple-level thresholding. There are some segments in the segmented results due to the remaining pectoral muscle.

Algorithm 1. Multiple-level thresholding segmentation.

Input: Breast-only image I, breast mask BMask

Expected output: Segmentation result $O^{FS}$

Step 1: Calculate the mean intensity $M_{R}$

$$M_{R} = \frac{\sum \sum I \cdot BMask}{\sum \sum BMask} \tag{1}$$

Step 2: Coarse segmentation and labeling

$$O^{CS'}(i, j) = \begin{cases} 1, & \text{if } I(i, j) \geq \alpha M_{R} \\ 0, & \text{if } I(i, j) < \alpha M_{R} \end{cases} \tag{2}$$

where α is a positive scale factor that weights the mean intensity, and $O^{CS'} \in \{0, 1\}^{H \times W}$ stands for the coarse segmentation result that only comprises zeros and ones. $O^{CS'}$ is then labeled into n ConCs, so that it can be denoted by the aggregation of ConCs as:

$$O^{CS'} = O_{1}^{CS'} \cup O_{2}^{CS'} \cup \dots \cup O_{n}^{CS'} \tag{3}$$

where $O_{1}^{CS'}, \dots, O_{n}^{CS'} \in \{0, 1\}^{H \times W}$. To suppress the noisy ConCs in the segmentation result, we performed an area opening operation:

for i ← 1 to n do
    if $\sum \sum O_{i}^{CS'} > Area^{CS}$ then
        $O^{CS} = O^{CS} \cup O_{i}^{CS'}$

where $Area^{CS}$ is a small predefined threshold value. With the m retained ConCs relabeled as $O_{1}^{CS}, \dots, O_{m}^{CS}$, we have

$$O^{CS} = O_{1}^{CS} \cup O_{2}^{CS} \cup \dots \cup O_{m}^{CS} \quad (1 \leq m \leq n) \tag{4}$$

For simplicity, we integrated the process of Step 1 and Step 2 into the function:

$$O^{CS} = CoSeg(I, BMask) \tag{5}$$

Step 3: Fine-grained segmentation

for i ← 1 to m do
    if $\sum \sum O_{i}^{CS} \geq Area$ then
        $$O_{i}^{FS} = CoSeg(I, O_{i}^{CS}) \tag{6}$$
    else
        $$O_{i}^{FS} = O_{i}^{CS} \tag{7}$$

where Area is a predefined threshold value for the identification of ConCs with a large area, and $O_{i}^{FS} \in \{0, 1\}^{H \times W}$ stands for the fine-grained segmentation result corresponding to the ConC labelled as i. By aggregating all fine-grained segmentation results, we can then obtain the final segmentation result:

$$O^{FS} = O_{1}^{FS} \cup O_{2}^{FS} \cup \dots \cup O_{m}^{FS} \tag{8}$$
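
Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is a 2-D grayscale list of lists, ConCs are labelled with 8-connectivity, refinement is repeated until no oversized ConC can be shrunk further, and the default values of `alpha`, `area_cs` and `area` are hypothetical.

```python
from collections import deque

def label_components(binary):
    """Label foreground pixels into ConCs (8-connectivity assumed); each ConC is a set of (row, col)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), set()
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps

def co_seg(image, region, alpha, area_cs):
    """CoSeg (Eqs. 1-5): threshold the pixels of `region` at alpha * mean, then area-open."""
    if not region:
        return []
    mean = sum(image[r][c] for r, c in region) / len(region)
    binary = [[False] * len(image[0]) for _ in image]
    for r, c in region:
        binary[r][c] = image[r][c] >= alpha * mean
    return [comp for comp in label_components(binary) if len(comp) > area_cs]

def multilevel_threshold(image, mask, alpha=1.0, area_cs=0, area=4):
    """Refine oversized ConCs (Eqs. 6-8); returns the final list of ConCs."""
    region = {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}
    final, pending = [], co_seg(image, region, alpha, area_cs)
    while pending:
        comp = pending.pop()
        if len(comp) >= area:
            children = co_seg(image, comp, alpha, area_cs)
            if children == [comp]:
                final.append(comp)  # thresholding no longer shrinks this ConC
            else:
                pending.extend(children)  # may be empty if every child is noise
        else:
            final.append(comp)
    return final

# Toy example: a bright 3x3 blob with a brighter centre pixel.
image = [[0, 0, 0, 0, 0],
         [0, 5, 5, 5, 0],
         [0, 5, 9, 5, 0],
         [0, 5, 5, 5, 0],
         [0, 0, 0, 0, 0]]
mask = [[1] * 5 for _ in range(5)]
result = multilevel_threshold(image, mask)
```

In the toy example, the coarse pass keeps the 3x3 blob (mean 1.96), and one refinement inside the blob (mean about 5.44) leaves only the centre pixel. The extra check `children == [comp]` is an assumption added here so that a uniformly bright region cannot loop forever.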

3.3. Breast mass detection

Breast mass detection can be subdivided into three steps, including image patch extraction, classification and false positive reduction. Based on the segmentation results, corresponding patches from the processed breast-only images can be extracted regarding the bounding boxes. We extracted square image patches as deep learning models usually take square images as input. Detailed patch extraction procedures can be seen in Algorithm 2.

Algorithm 2. Image patch extraction.

Step 1: Initialization

1) Label the ConCs in the segmentation results as 1, …, m;
2) Obtain the bounding boxes BBox(cx, cy, width, height) $\in ℝ^{4 \times m}$ of $O_{i}^{FS} (i = 1 \dots m)$, where cx, cy, width and height stand for the horizontal centroid, vertical centroid, width and height of the ConCs, respectively;
3) Initialize the default image patch size as Size;
4) Initialize the expected output locations EBox(cx, cy, width, height) $\in ℝ^{4 \times m}$.

Step 2: Location adjustment

for i ← 1 to m do
    Size_i = max(BBox_i(width), BBox_i(height), Size)
    EBox_i(cx) = min(max(1 + 0.5 Size_i, BBox_i(cx)), W – 0.5 Size_i)
    EBox_i(cy) = min(max(1 + 0.5 Size_i, BBox_i(cy)), H – 0.5 Size_i)
    EBox_i(width), EBox_i(height) = Size_i

where max(·) and min(·) stand for the maximum and minimum operations, respectively.

Step 3: Patch extraction

BP_i = I(EBox_i(cx) – 0.5 EBox_i(width) : EBox_i(cx) + 0.5 EBox_i(width), EBox_i(cy) – 0.5 EBox_i(height) : EBox_i(cy) + 0.5 EBox_i(height))

where BP_i indicates the image patch corresponding to the ConC labelled as i. Each individual ConC patch $OP_{i}^{FS}$ can be extracted in the same way:

$OP_{i}^{FS}$ = $O_{i}^{FS}$(EBox_i(cx) – 0.5 EBox_i(width) : EBox_i(cx) + 0.5 EBox_i(width), EBox_i(cy) – 0.5 EBox_i(height) : EBox_i(cy) + 0.5 EBox_i(height))
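
The location adjustment and patch extraction of Algorithm 2 can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: coordinates are 0-based (so the lower clamp is 0.5·Size rather than 1 + 0.5·Size), the side length is an integer, and the cap of the side length at the image size is a safeguard added here. The function names are hypothetical.

```python
def square_patch_box(cx, cy, width, height, img_w, img_h, size):
    """Return (left, top, side) of a square patch that stays inside the image."""
    # Side length: at least the default size, at least the ConC extent,
    # and (assumption added here) never larger than the image itself.
    side = min(max(width, height, size), img_w, img_h)
    half = side / 2
    # Shift the centre so that the whole square fits in the image.
    cx = min(max(half, cx), img_w - half)
    cy = min(max(half, cy), img_h - half)
    return int(cx - half), int(cy - half), side

def extract_patch(image, box):
    """Crop the square patch described by (left, top, side) from a 2-D list."""
    left, top, side = box
    return [row[left:left + side] for row in image[top:top + side]]

# Toy example on a 10-pixel-wide, 8-pixel-high image.
image = [[r * 10 + c for c in range(10)] for r in range(8)]
corner_box = square_patch_box(cx=1, cy=1, width=2, height=3, img_w=10, img_h=8, size=4)
edge_box = square_patch_box(cx=9, cy=7, width=6, height=2, img_w=10, img_h=8, size=4)
patch = extract_patch(image, edge_box)
```

A ConC near the top-left corner yields the box `(0, 0, 4)`, and a wide ConC near the bottom-right corner yields `(4, 2, 6)`: the patch grows to cover the ConC and is shifted inward rather than cropped.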

The architectures of the deep learning models before and after adaptation can be seen in Fig. 4. We deployed state-of-the-art deep learning models that were pre-trained on the ImageNet dataset for breast mass and tissue classification (Simonyan and Zisserman, 2014; He et al., 2016; Huang et al., 2017; Szegedy et al., 2017). These models, shown in Fig. 4a, have achieved dominating performance on 1,000-category classification compared to other methods. In these models, an encoding layer is responsible for adjusting the size of input images to the input size requirement. The main components are deep blocks that comprise stacks of convolution layers, normalization layers, activation layers (i.e. ReLU layers), and pooling layers, as shown at the top of Fig. 4a. The features generated by the deep convolutional blocks are fed to a fully connected layer, which is responsible for mapping the learnt features into the target space for the classification task. For efficient deployment, we transferred these models to our classification task with minimal changes. There are two straightforward ways to adapt these models. The first is to simply replace the original fully connected layer with the expected fully connected layer (Ali et al., 2021), as shown in Fig. 4b. Since we aimed at a two-class classification task, the original fully connected layer is replaced with a two-neuron fully connected layer. The second is to add more layers after the final fully connected layer for the desired classification task, as shown in Fig. 4c. To prevent significant information loss, similar to the works in (Xiang et al., 2020; Yu et al., 2021), we added a new fully connected layer with 256-dimensional output. Also, we added a dropout layer with a dropout rate of 0.5. After breast mass classification, numerous overlapping image patches will be predicted as masses. 
To solve this and reduce false positives, we introduced the NMS algorithm, which can be seen in Algorithm 3. An example can be seen in Fig. 5. As can be seen, only patches containing breast mass are kept in Fig. 5b where there are multiple detection results. By performing the proposed NMS, the detection results combine into a single detection result while the FPI reduces simultaneously.

Fig. 5 — **(a)** Post segmented results. **(b)** ConCs correspond to patches that are classified as breast mass by deep learning models. **(c)** The detection result. The blue bounding box indicates the patch location in the image while the red one indicates the refined bounding box for accurate detection result.

Algorithm 3. Non-maximum Suppression.

Input: Predicted scores: $Scores \in ℝ^{s}$, ConC patches: $OP_{i}^{FS} (1 \leq i \leq s)$, Locations: $location(cx, cy, width, height) \in ℝ^{4 \times s}$

Expected output: Refined scores: $Scores_{R} \in ℝ^{s'} (s' \leq s)$, Refined locations: $location_{R}(cx, cy, width, height) \in ℝ^{4 \times s'}$

for i ← 1 to s – 1 do
    for j ← i + 1 to s do
        if IoU(i, j) > Rate then
            if Scores(i) != 0 then
                if $Scores(j) \cdot \sum \sum OP_{j}^{FS} \leq Scores(i) \cdot \sum \sum OP_{i}^{FS}$ then
                    Scores(j) = 0;
                else
                    Scores(i) = 0;

where IoU(i, j) denotes the intersection over union between the i-th and j-th image patches and Rate is the intersection rate.

count = 1;

for i ← 1 to s do
    if Scores(i) != 0 then
        Scores_R(count) = Scores(i)
        location_R(cx_count, cy_count, width_count, height_count) = location(cx_i, cy_i, width_i, height_i)
        count = count + 1
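
A direct Python transcription of Algorithm 3 is sketched below for illustration. It assumes boxes are given as (cx, cy, width, height) tuples and that `areas[i]` holds the number of foreground pixels in the i-th ConC patch, i.e. $\sum \sum OP_{i}^{FS}$; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (cx, cy, width, height)."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def area_weighted_nms(scores, areas, boxes, rate):
    """Suppress the overlapping patch whose score * ConC area is smaller."""
    scores = list(scores)  # work on a copy; a score of 0 marks a suppressed patch
    for i in range(len(scores) - 1):
        for j in range(i + 1, len(scores)):
            if iou(boxes[i], boxes[j]) > rate and scores[i] != 0:
                if scores[j] * areas[j] <= scores[i] * areas[i]:
                    scores[j] = 0
                else:
                    scores[i] = 0
    keep = [k for k, s in enumerate(scores) if s != 0]
    return [scores[k] for k in keep], [boxes[k] for k in keep]

# Toy example: the first two boxes overlap (IoU = 80 / 120), the third is isolated.
boxes = [(10, 10, 10, 10), (12, 10, 10, 10), (50, 50, 10, 10)]
kept_scores, kept_boxes = area_weighted_nms(
    scores=[0.9, 0.8, 0.7], areas=[40, 60, 30], boxes=boxes, rate=0.5)
```

In the toy example the second patch survives even though its score is lower, because 0.8 × 60 exceeds 0.9 × 40. This is the difference from plain score-based NMS: the decision is weighted by the ConC area.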

3.4. Model training and inference

In the training stage, patch classification is the only module that requires training, as the other modules have no learnable parameters. When training the deep CNN models, breast mass patches in the training set are extracted directly according to the annotated bounding boxes and fed to the CNN models. Because the trained deep CNNs tend to recognize the breast mass patches that appeared in the training set, evaluating the overall detection framework on the training set is not meaningful. Instead, the overall evaluation on the testing set relies on some predefined parameters, and the performance of the detection framework may vary slightly with these parameters. We will therefore explore the possible combinations of these parameters in the model inference stage instead of fixing them in the training stage. During inference, each full mammogram in the testing set is processed by the pre-processing module, which extracts the breast region and removes the pectoral muscle. Then multiple-level thresholding is applied to generate segmentation results for breast mass. According to the location information of the ConCs in the segmentation result, breast patches are extracted and then classified by the trained deep CNNs. The patches that are classified as breast mass are aggregated for false positive reduction by NMS. However, patches in some mammograms may fail to be classified as breast mass due to the complexity of the mammogram and the difference in patch extraction between the training set and the testing set. Note that breast mass patches in the training set are extracted according to the true location information, while the patches in the testing set are extracted based on the segmentation results. Considering this, when no patches are classified as breast mass in a mammogram, we take the t patches with the top-ranked scores as the breast mass candidates for false positive reduction. 
We then consider it a successful detection of breast mass when the overlapping rate between the true bounding box and predicted bounding box is no less than 0.2.
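
To make the detection criterion concrete, the following sketch computes sensitivity and FPI from per-image boxes. It rests on assumptions not stated in the text: the "overlapping rate" is taken to be intersection over union, boxes are (x0, y0, x1, y1) corner tuples, and a prediction counts as a false positive when it matches no true mass. The function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def sensitivity_and_fpi(truths, predictions, min_overlap=0.2):
    """truths and predictions each hold one list of boxes per image."""
    hits = masses = false_positives = 0
    for gt_boxes, pred_boxes in zip(truths, predictions):
        masses += len(gt_boxes)
        # A true mass is detected if any prediction overlaps it enough.
        hits += sum(any(box_iou(g, p) >= min_overlap for p in pred_boxes)
                    for g in gt_boxes)
        # A prediction is a false positive if it matches no true mass.
        false_positives += sum(all(box_iou(g, p) < min_overlap for g in gt_boxes)
                               for p in pred_boxes)
    return hits / masses, false_positives / len(truths)

# Toy example with two images.
truths = [[(0, 0, 10, 10)], [(20, 20, 40, 40)]]
predictions = [[(2, 2, 12, 12), (50, 50, 60, 60)], [(100, 100, 110, 110)]]
sensitivity, fpi = sensitivity_and_fpi(truths, predictions)
```

Here one of the two masses is found (IoU = 64/136, about 0.47) and two predictions match nothing, giving a sensitivity of 0.5 at 1.0 false positives per image.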

4. Experiment

In this section, we will briefly introduce the datasets involved in this research. Later on, we will introduce the setting of parameters in the experiment. The key part of the proposed framework is the performance of the breast mass classification model. So, we will present the performance of the adapted deep learning models before we move to the detection results on two public datasets. We then finish this section with the method comparison, where we will compare our method with the state-of-the-art methods.

4.1. Datasets

In this research, we conducted our experiments on two public datasets, i.e., CBIS-DDSM and INbreast (Lee et al., 2017; Moreira et al., 2012), both of which provide pixel-level annotated ground truth. More importantly, all of the mammograms from the two datasets may have different breast densities that may cause breast mass detection failure. The height and width of the mammograms from the two datasets are usually more than 4000 pixels and 2000 pixels, respectively. We used the training set of CBIS-DDSM for model training and the testing set for evaluation while we directly evaluated the performance of the proposed framework on INbreast dataset without any further adaption. When training the models, we manually extracted mass patches and breast tissue patches from the training set of CBIS-DDSM, where the breast tissue patches have no overlaps with the breast mass. We obtain the breast tissue patches through the sliding window technique while breast mass patches are extracted directly regarding the given annotations. The breast tissue patches are extracted only when there is no intersection between the breast mass patches found. By doing so, the number of breast tissue patches greatly outnumbers that of the breast mass patches. We then applied data augmentation to the breast mass patches while randomly selecting the same number of breast tissue patches. The applied data augmentation methods include flipping upside down, flipping left to right, flipping upside down and then flipping left to right, contrast enhancement by CLAHE with the clip limit of 0.02, random scaling from 1 to 1.2, rotation clockwise by 90 degrees, and rotation counter-clockwise by 90 degrees. By aggregating all augmented images and the original image in the training set, the augmented training set was scaled to eight times of the original size. Similarly, we extracted the breast mass and tissue patches from the test set for evaluation of the deep learning models in the same way. 
In the testing set, breast tissue patches likewise greatly outnumbered breast mass patches, which would distort the evaluation metrics. Therefore, we created an adjusted (augmented and balanced) testing set that is used only for the evaluation of the deep learning models. The detailed composition of the datasets for deep learning model training and evaluation can be seen in Table 1. Note that for the evaluation of overall detection performance, we instead applied the proposed patch extraction method to the testing set. Some extracted breast mass patches and tissue patches are shown in Fig. 6. As can be seen, the breast mass patches vary in size, shape, and location.
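
The tissue patch selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (x, y, w, h) box convention, and the window size and stride used in the example are assumptions made for this sketch.

```python
def boxes_intersect(a, b):
    """Return True if two (x, y, w, h) boxes share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def tissue_windows(image_w, image_h, mass_boxes, size, stride):
    """Slide a size x size window over the image and keep only the
    windows that do not intersect any annotated mass box."""
    kept = []
    for y in range(0, image_h - size + 1, stride):
        for x in range(0, image_w - size + 1, stride):
            window = (x, y, size, size)
            if not any(boxes_intersect(window, m) for m in mass_boxes):
                kept.append(window)
    return kept


# Hypothetical 8 x 8 image with a single 2 x 2 mass box in the top-left corner.
windows = tissue_windows(8, 8, [(0, 0, 2, 2)], size=4, stride=4)
print(windows)
```

In this toy case only the top-left window touches the mass box, so the three remaining windows are kept as tissue candidates.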

Table 1. CBIS-DDSM dataset composition.

Dataset	Mass patches	Negative patches	Total
Original training set	1,230	24,171	25,401
Adjusted training set	9,840	9,840	19,680
Original testing set	353	14,103	14,457
Adjusted testing set	2,824	2,824	5,648
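
The adjusted counts in Table 1 follow from the eight-fold augmentation of the mass patches and the random selection of an equal number of tissue patches. The sketch below illustrates the geometric part of that procedure on nested lists. CLAHE and random scaling are left out because they need an image-processing library, and all function names are hypothetical.

```python
import random


def flip_ud(p):
    """Flip a patch (list of rows) upside down."""
    return [row[:] for row in p[::-1]]


def flip_lr(p):
    """Flip a patch left to right."""
    return [row[::-1] for row in p]


def rot90_cw(p):
    """Rotate a patch clockwise by 90 degrees."""
    return [list(r) for r in zip(*p[::-1])]


def rot90_ccw(p):
    """Rotate a patch counter-clockwise by 90 degrees."""
    return [list(r) for r in zip(*p)][::-1]


def geometric_variants(p):
    """Original patch plus the five geometric augmentations of Section 4.1."""
    return [p, flip_ud(p), flip_lr(p), flip_lr(flip_ud(p)),
            rot90_cw(p), rot90_ccw(p)]


def balance(mass_patches, tissue_patches, seed=0):
    """Randomly keep as many tissue patches as there are mass patches."""
    rng = random.Random(seed)
    return rng.sample(tissue_patches, len(mass_patches))


# Seven augmentations plus the original give an eight-fold mass set.
n_mass, n_augmentations = 1230, 7
adjusted_mass = n_mass * (n_augmentations + 1)
print(adjusted_mass, 2 * adjusted_mass)  # 9840 19680
```

The same arithmetic applied to the 353 mass patches of the testing set gives the 2,824 mass patches of the adjusted testing set.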

Table 2. Predefined parameters.

Parameter	Located in	Definition
α	Algorithm 1	The factor that scales the threshold in coarse segmentation.
Area^CS	Algorithm 1	The noise depression threshold.
Area	Algorithm 1	The threshold for the determination of ConCs with a large area.
Width	Algorithm 2	The size of the extracted image patches.
Rate	Algorithm 3	The intersection rate that controls the sensitivity of Non-maximum suppression.
t	Patch extraction in the inference phase	The number of selected patches when breast mass classification fails.

Table 3. Total number of parameters for deep learning models.

Names	Number of parameters of original models (millions)	Input size to newly added FC layer	FC layer neurons	Total number of parameters (millions)	Number of FLOPs (Gbits)
VGG19	143.68	1000	256 & 2	$143.68 + \frac{256 \times 1000 + 256 \times 2}{10^{6}} = 143.94$	19.80
ResNet50	25.56	1000	256 & 2	25.82	4.25
InceptionV3	24.11	1000	256 & 2	24.37	6.13
DenseNet201	20.02	1000	256 & 2	20.28	4.37
InceptionResNetv2	56.11	1000	256 & 2	56.37	6.64
EfficientNetb0	5.30	1000	256 & 2	5.56	0.02
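
The totals in the fifth column can be reproduced with a few lines of arithmetic. The sketch below follows the formula shown in the VGG19 row, which counts only the weights of the two added fully connected layers (1000 to 256 and 256 to 2) and ignores biases. The function and variable names are illustrative.

```python
def added_fc_weights(in_features=1000, hidden=256, classes=2):
    """Weights of the two added FC layers (biases ignored, as in the table formula)."""
    return in_features * hidden + hidden * classes


# Original parameter counts in millions, copied from the table.
original_millions = {
    "VGG19": 143.68,
    "ResNet50": 25.56,
    "InceptionV3": 24.11,
    "DenseNet201": 20.02,
    "InceptionResNetv2": 56.11,
    "EfficientNetb0": 5.30,
}

extra_millions = added_fc_weights() / 1e6  # about 0.26 million
totals = {name: round(base + extra_millions, 2)
          for name, base in original_millions.items()}
print(totals)
```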

Table 4. Setting of hyper-parameters.

Parameters	Values
Maximum training epoch	9
Initial learning rate	10⁻⁴
Mini-batch size	60
Learning rate drop period	3
Learning rate drop rate	0.1
Optimization method	SGDM
Shuffle of the train set	Each epoch
Momentum	0.9
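
Read together, the initial learning rate, drop period, and drop rate describe a piecewise-constant schedule. The sketch below assumes that the drop rate is a multiplicative factor applied once every drop period, which is the usual meaning of these settings but is not stated explicitly here. The function name is illustrative.

```python
def learning_rate(epoch, initial=1e-4, drop_period=3, drop_rate=0.1):
    """Learning rate for a 1-based epoch under a piecewise-constant schedule."""
    return initial * drop_rate ** ((epoch - 1) // drop_period)


# Nine training epochs, as in the table.
schedule = [learning_rate(epoch) for epoch in range(1, 10)]
for epoch, lr in enumerate(schedule, start=1):
    print(epoch, lr)
```

Under this assumption, epochs 1 to 3 use 10⁻⁴, epochs 4 to 6 use 10⁻⁵, and epochs 7 to 9 use 10⁻⁶.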

Table 5. Performance of patch extraction on CBIS-DDSM.

	Successful extraction rate (%) for each Width
α	129	156	199	224	256	299
0.8	88.70	92.94	97.46	98.31	98.59	98.59
0.9	88.14	93.22	96.61	98.31	98.59	99.51
1.0	84.75	90.96	97.18	98.02	98.87	99.44
1.1	85.31	91.24	97.18	98.59	99.15	100
1.2	80.79	87.85	95.48	98.02	98.87	99.44

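Table 6. Performance of the deep learning models trained with the original training set and evaluated on the adjusted testing set.

Models	Sensitivity	Specificity	Precision	F1 score	Accuracy	AUC (95% confidence interval)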
Vgg19	0.83	0.91	0.90	0.87	0.87	0.9376~0.9466
ResNet50	0.74	0.89	0.87	0.80	0.81	0.8445~0.8588
InceptionV3	0.76	0.89	0.87	0.81	0.82	0.8900~0.9020
DenseNet201	0.77	0.89	0.88	0.82	0.83	0.9057~0.9168
InceptionResNetv2	0.76	0.89	0.88	0.81	0.83	0.9090~0.9198
EfficientNetb0	0.53	0.64	0.59	0.56	0.58	0.6062~0.6268
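
For reference, the threshold-based columns of the classification tables can all be derived from a confusion matrix using the standard definitions. The sketch below uses hypothetical counts chosen only for illustration; they are not taken from the experiments, and AUC is omitted because it needs the full set of prediction scores.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1_score,
        "accuracy": accuracy,
    }


# Hypothetical balanced evaluation set: 100 mass patches and 100 tissue patches.
metrics = classification_metrics(tp=83, fp=9, tn=91, fn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On a balanced set such as the adjusted testing set, accuracy is simply the mean of sensitivity and specificity.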

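Table 7. Classification performance of different deep learning models trained with the adjusted training set and evaluated on the adjusted testing set.

Models	Sensitivity	Specificity	Precision	F1 score	Accuracy	AUC (95% confidence interval)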
Vgg19	0.90	0.94	0.94	0.92	0.92	0.9735~0.9793
ResNet50	0.91	0.96	0.96	0.94	0.94	0.9776~0.9802
InceptionV3	0.89	0.97	0.97	0.93	0.93	0.9790~0.9841
DenseNet201	0.90	0.97	0.97	0.93	0.93	0.9812~0.9860
InceptionResNetv2	0.88	0.97	0.97	0.92	0.93	0.9758~0.9813
EfficientNetb0	0.83	0.95	0.94	0.88	0.89	0.9520~0.9598

Table 8. Performance of the deep learning models with the final fully connected layer replaced.

Models	Sensitivity	Specificity	Precision	F1 score	Accuracy	AUC (95% confidence interval)
Vgg19	0.91	0.95	0.95	0.93	0.93	0.9757~0.9811
ResNet50	0.90	0.94	0.94	0.92	0.92	0.9736~0.9793
InceptionV3	0.87	0.97	0.97	0.91	0.92	0.9736~0.9792
DenseNet201	0.89	0.97	0.96	0.92	0.93	0.9783~0.9834
InceptionResNetv2	0.80	0.94	0.93	0.86	0.87	0.9398~0.9486
EfficientNetb0	0.75	0.91	0.90	0.82	0.83	0.9061~0.9172

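Table 9. Detection performance with varied α, with Width and Rate fixed at 299 and 0.3, respectively.

α	Sensitivity (%)	p-value (sensitivity)	FPI	p-value (FPI)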
0.5	82.20	0.70	1.44	0.97
0.6	83.33	-	1.44	-
0.7	80.51	0.34	1.42	0.84
0.8	81.64	0.57	1.39	0.57
0.9	77.12	0.04	1.52	0.37
1.0	77.97	0.08	1.51	0.42
1.1	78.25	0.09	1.46	0.84
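
The two detection metrics reported in these tables can be computed as sketched below. The sketch assumes that sensitivity is the fraction of annotated masses that are detected and that FPI is the number of false positive detections averaged over the evaluated images. The criterion for deciding whether a detection hits a mass is not part of this sketch, and the per-image tuple layout is an illustrative assumption.

```python
def sensitivity_and_fpi(per_image):
    """Compute detection sensitivity and false positives per image (FPI).

    per_image is a list of tuples
    (annotated masses, detected masses, false positive detections).
    """
    total_masses = sum(n for n, _, _ in per_image)
    detected = sum(d for _, d, _ in per_image)
    false_positives = sum(f for _, _, f in per_image)
    return detected / total_masses, false_positives / len(per_image)


# Hypothetical results for four images.
results = [(1, 1, 2), (1, 0, 1), (2, 2, 0), (1, 1, 3)]
sensitivity, fpi = sensitivity_and_fpi(results)
print(f"{100 * sensitivity:.2f}% @ {fpi:.2f} FPI")
```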

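Table 10. Detection performance with varied Width when α and Rate are 0.6 and 0.3, respectively.

Width	Sensitivity (%)	p-value (sensitivity)	FPI	p-value (FPI)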
129	70.06	≪ 0.05	4.77	≪ 0.05
169	75.71	0.01	3.72	≪ 0.05
199	77.68	0.06	2.94	≪ 0.05
224	77.68	0.07	2.43	≪ 0.05
256	81.36	0.50	1.91	≪ 0.05
299	83.33	-	1.44	-

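Table 11. Detection performance of the proposed framework on CBIS-DDSM dataset when α = 0.6 and Width = 299.

Rate	Sensitivity (%)	p-value (sensitivity)	FPI	p-value (FPI)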
0.2	79.10	0.005	1.21	≪ 0.05
0.3	83.33	0.16	1.44	≪ 0.05
0.4	83.90	0.23	1.70	≪ 0.05
0.5	85.31	0.47	2.07	≪ 0.05
0.6	86.44	0.76	2.44	0.003
0.7	87.29	-	2.86	1
0.8	87.29	1	3.21	0.03
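
The trend in this table, where a larger Rate keeps more detections and therefore raises both sensitivity and FPI, matches the behaviour of greedy non-maximum suppression with an overlap threshold. The sketch below is a generic version that treats Rate as an intersection-over-union threshold. This is an assumption: Algorithm 3 may define the intersection rate differently, and the box format and function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, rate):
    """Greedy NMS: visit boxes by descending score and keep a box only if its
    overlap with every already kept box does not exceed rate."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= rate for k in keep):
            keep.append(i)
    return keep


# Hypothetical candidates: two overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, rate=0.3))
print(nms(boxes, scores, rate=0.7))
```

The first two boxes overlap with an intersection over union of about 0.67, so the lower-scoring one is suppressed at Rate = 0.3 and kept at Rate = 0.7.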

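Table 12. Detection performance of the proposed framework on INbreast dataset when α = 0.6, where A and B in A@B stand for sensitivity (%) and FPI, respectively. P_S @ P_F gives the corresponding p-values for sensitivity and FPI.

	Width
Rate	224	256	299
0.2	87.16@0.98	87.16@0.66	88.99@0.44
0.3	90.83@1.21	88.99@0.81	89.91@0.55
0.4	92.66@1.36	89.91@0.93	91.74@0.64
0.5	95.41@1.69	91.74@1.14	91.74@0.78
0.6	95.41@1.93	91.74@1.29	92.66@0.92
0.7	95.41@2.27	92.66@1.51	92.66@1.05
P_S @ P_F, right-hand column (merged cell)	≪ 0.05@≪ 0.05
P_S @ P_F, bottom row (merged cell)	0.007@≪ 0.05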

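Table 13. Detection performance of the proposed framework on INbreast dataset when α = 0.7, where A and B in A@B stand for sensitivity (%) and FPI, respectively. P_S @ P_F gives the corresponding p-values for sensitivity and FPI.

	Width
Rate	224	256	299
0.2	82.57@0.98	84.40@0.66	87.16@0.44
0.3	86.24@1.21	87.16@0.81	88.07@0.55
0.4	89.91@1.36	88.99@0.93	90.83@0.64
0.5	92.66@1.69	94.50@1.14	91.74@0.78
0.6	92.66@1.93	96.33@1.29	93.58@0.92
0.7	94.50@2.27	96.33@1.51	92.66@1.05
P_S @ P_F, right-hand column (merged cell)	≪ 0.05@≪ 0.05
P_S @ P_F, bottom row (merged cell)	0.28@≪ 0.05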

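Table 14. Detection performance of the proposed framework on INbreast dataset when α = 0.8, where A and B in A@B stand for sensitivity (%) and FPI, respectively. P_S @ P_F gives the corresponding p-values for sensitivity and FPI.

	Width
Rate	224	256	299
0.2	85.32@0.92	84.40@0.62	88.99@0.35
0.3	88.07@1.10	86.24@0.78	91.74@0.46
0.4	91.74@1.31	89.91@0.94	91.74@0.56
0.5	94.50@1.53	89.91@1.13	91.74@0.69
0.6	94.50@1.79	90.83@1.32	93.58@0.79
0.7	94.50@2.14	92.67@1.59	93.58@1.01
P_S @ P_F, right-hand column (merged cell)	≪ 0.05@≪ 0.05
P_S @ P_F, bottom row (merged cell)	0.01@≪ 0.05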

Table 15. Method comparison.

Method	Dataset	Number of images for evaluation	Sensitivity @ FPI
Andreadis et al. (2020)	CBIS-DDSM	73	0.81@1.62
Cha et al. (2019)	CBIS-DDSM	361	0.83@0.04
Bandeira Diniz et al. (2018)	DDSM	~ 54	0.90@0.88
Divyashree and Hemantha Kumar (2021)	CBIS-DDSM	200	0.94@–
Lbachir et al. (2021)	CBIS-DDSM	152	0.91@0.65
Our method	CBIS-DDSM	348	0.87@2.86
Hassan Shayma’a et al. (2019)	INbreast	75	0.94@0.67
Kozegar et al. (2013)	INbreast	107	0.87@3.67
Shen et al. (2020)	INbreast	32	0.88@0.50
NiroomandFam et al. (2021)	INbreast	82	0.98@1.43
Cao et al. (2021)	INbreast	107	0.93@0.5
Agarwal et al. (2019)	INbreast	410	0.95@0.79
Our method	INbreast	107	0.96@1.29

Abstract

1. Introduction

2. Related works

3. Methodology

Fig. 1. The overview of the proposed framework, which includes pre-processing, multiple-level thresholding, and breast mass detection, where breast mass detection is composed of patch extraction, classification, and false positive reduction.

3.1. Pre-processing

Fig. 2. Pre-processing.

3.2. Multiple-level thresholding segmentation

Fig. 3. Multiple-level thresholding.

Algorithm 1. Multiple-level thresholding segmentation.

3.3. Breast mass detection

Algorithm 2. Image patch extraction.

Fig. 4. Architectures of deep learning models.

Fig. 5. Patch classification and Non-maximum suppression.

Algorithm 3. Non-maximum Suppression.

3.4. Model training and inference

4. Experiment

4.1. Datasets

Table 1. CBIS-DDSM dataset composition.

Fig. 6. Manually extracted patches from the training set.

4.2. Experiment settings

Table 2. Predefined parameters.

Table 3. Total number of parameters for deep learning models.

Table 4. Setting of hyper-parameters.

4.3. Performance of patch extraction

Table 5. Performance of patch extraction on CBIS-DDSM.

4.4. Model ablation for breast mass classification

Table 6. Performance of the deep learning models trained with original training set while being evaluated on the adjusted testing set. The bold indicates the best.

Fig. 7. Performance of deep learning models on the adjusted testing set.

Table 7. Classification performance of different deep learning models trained with the adjusted training set while being evaluated on the adjusted testing set.

Fig. 8. The learning curve of ResNet50.

Table 8. Performance of the deep learning models with final fully connected layer replaced. The bold indicates the best.

4.5. Detection results on CBIS-DDSM

Table 9. Detection performance with varied α, with Width and Rate fixed at 299 and 0.3, respectively.

Table 10. Detection performance with varied Width when α and Rate are 0.6 and 0.3, respectively.

Table 11. Detection performance of the proposed framework on CBIS-DDSM dataset when α=0.6 and Width = 299.

Fig. 9. Detection examples from CBIS-DDSM when Width = 224 and α = 0.8.

4.6. Detection results on INbreast

Table 12. Detection performance of the proposed framework on INbreast dataset when α=0.6, where A and B in A@B stand for sensitivity and FPI, respectively. For p-value calculation, two way is carried out.

Table 13. Detection performance of the proposed framework on INbreast dataset when α=0.7, where A and B in A@B stand for sensitivity and FPI, respectively.

Table 14. Detection performance of the proposed framework on INbreast dataset when α=0.8.

Fig. 10. Detection examples from INbreast dataset when Width = 199 and α = 0.8.

4.7. Method comparison

Table 15. Method comparison.

5. Discussion

6. Conclusion and future work

Footnotes

Contributor Information

References
