Heliyon. 2023 Nov 15;9(11):e22406. doi: 10.1016/j.heliyon.2023.e22406

Breast cancer classification based on convolutional neural network and image fusion approaches using ultrasound images

Mohammed Alotaibi a,b, Abdulrhman Aljouie c,d, Najd Alluhaidan e, Wasem Qureshi c, Hessa Almatar b,c, Reema Alduhayan b,c, Barrak Alsomaie b, Ahmed Almazroa b,c
PMCID: PMC10700613  PMID: 38074874

Abstract

Deep learning and image processing are used to classify and segment breast tumor images, particularly in ultrasound (US) imaging, to support clinical decisions and improve healthcare quality. However, directly using US images can be challenging due to noise and diverse imaging modalities. In this study, we developed a three-step image processing scheme involving speckle noise filtering using a block-matching three-dimensional filtering technique, region of interest highlighting, and RGB fusion. This method enhances the generalization of deep learning models and achieves better performance. We used a deep learning model (VGG19) to perform transfer learning on three datasets: BUSI (780 images), Dataset B (163 images), and KAIMRC (5693 images). When tested on the BUSI and KAIMRC datasets using fivefold cross-validation, the model with the proposed preprocessing step outperformed the model without preprocessing on each dataset. The proposed image processing approach improves the performance of the breast cancer deep learning classification model. Multiple diverse datasets (private and public) were used to generalize the model for clinical application.

Keywords: Image processing, KAIMRC, Big breast cancer data, VGG19, Deep learning

Abbreviations

US: Ultrasound
KAIMRC: King Abdullah International Medical Research Center
BUSI: Dataset of Breast Ultrasound Images
MRI: Magnetic Resonance Imaging
CNN: Convolutional Neural Network
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
ROI: Region of Interest
RONI: Region of Non-Interest
MNGHA: Ministry of National Guard-Health Affairs

1. Introduction

Breast cancer is the leading cause of cancer-related death among women worldwide, causing 685,000 deaths in 2020. It is also the most commonly diagnosed cancer, with 2.3 million new cases in 2020 [1]. The mortality rate of breast cancer in Saudi Arabia was 13.1 % in 2018 [2].

Breast lesions are routinely screened and diagnosed using radiological imaging modalities, such as mammography, ultrasound (US), and magnetic resonance imaging (MRI). Although mammography remains the gold standard for breast cancer screening, its performance is limited in dense breast parenchyma. Therefore, US and MRI are used as supplementary breast-screening procedures [3]. Combining mammography and US increases the sensitivity for breast cancer detection to 97.3 %, with a false-positive rate of 2.4 % [4].

US is noninvasive, radiation-free, and inexpensive, and it provides high image resolution. Moreover, US is effective in examining dense breast tissue [5]. These modalities can be used more efficiently by introducing computer-aided diagnosis systems, which use image processing and artificial intelligence models to help radiologists classify breast lesions and overcome the shortcomings of screening and diagnosis.

Several novel neural network (NN)-based techniques for diagnosing cancer have been published. For example, convolutional neural networks (CNNs) have been used to diagnose breast lesions [6]. A hybrid CNN system combining AlexNet, MobileNetV2, and ResNet50 with a support vector machine classifier classified breast US images as benign, malignant, or normal with an accuracy of 95.6 % [7]. Karthick et al. [8] proposed a stacking ensemble combined with a custom CNN to categorize breast US images as normal, benign, or malignant, achieving an accuracy of 92.15 %. Saba et al. [9] evaluated two pretrained deep CNN models on 697 benign and malignant breast US images and achieved an accuracy of 92.8 % with DenseNet201. A recent study also developed an abstract CNN architecture using 14 pretrained models to classify breast US images from two different datasets [10]; the accuracy and F1-score exceeded 90 % with Xception, ResNet152V2, and ResNet101V2. Muduli et al. [11] proposed a deep CNN model for breast cancer classification using two US datasets, BUS-1 and BUS-2, comprising 780 and 250 images, respectively, and achieved accuracies of 100 % on BUS-1 and 89.73 % on BUS-2. Moreover, adding a spatial attention module to a CNN model for classifying breast US tumors yielded slight improvements in accuracy, precision, and F1-score [12]: before the module was added, the accuracy, sensitivity, precision, and F1-score were 93.59 %, 94.68 %, 97.84 %, and 96.22 %, respectively, and afterward they were 94.1 %, 94.34 %, 98.14 %, and 96.5 %, respectively.

Recently, Feng et al. [13] introduced a vision transformer (ViT) model, ViT-Patch, to reduce the need for large amounts of data while achieving high performance. ViT-Patch was tested on the BUSI dataset and compared with a classic ViT model: the classic ViT achieved an accuracy of 85.6 % and a sensitivity of 66.7 %, whereas ViT-Patch achieved 89 % and 69.7 %, respectively. Furthermore, Manzari et al. [14] proposed a hybrid model that combines convolutional layers with the ViT architecture and adds a transformer augmentation block to embed the augmentation process in the ViT. The model was trained and tested on several medical datasets, including BUSI; its highest accuracy and area under the ROC curve (AUC) were 89.6 % and 0.934, respectively.

Directly applying standard deep learning models to US images for breast lesion diagnosis is challenging. Unlike mammograms, in which lesion appearance and shape remain consistent, US images can vary significantly due to noise, diverse imaging modalities, and variations in lesion framing [15]. Although the models in the aforementioned studies have shown potential, they require further improvement. Few studies have applied these techniques to local data, even though the distribution of ground-truth labels can differ among populations [16]. Therefore, local images must be incorporated into the models to address region-specific characteristics. In addition, many studies have relied on a single dataset, limiting the generalization capability of the models.

Furthermore, the interpretation of breast US images can vary greatly across patients due to differences in lesion margins and echogenicity [17]. This variability can increase the false-negative rate of a model and thus lead to missed diagnoses in critical cases.

To address these challenges, we propose an approach that uses the VGG19 deep learning model with a three-step image-processing scheme involving speckle noise filtering using a block-matching three-dimensional (3D) filtering technique, region of interest (ROI) highlighting, and RGB fusion. This method aims to improve the generalization and performance of deep learning models, adapt the approach to the local data context, and explore new ways of improving breast lesion diagnosis from US images.

The main contributions of this study are as follows.

  • 1.

    We developed an image preprocessing scheme that enhances model predictions compared with baseline results. The proposed image-preprocessing scheme contributes to the model in three ways: it denoises the images, reduces the bias of the model toward images that exhibit high contrast between the ROI and region of noninterest (RONI), and augments the data.

  • 2.

    The proposed image-preprocessing techniques, including ROI highlighting and RGB fusion, can be viewed as new augmentation techniques since they increase the richness and number of training samples.

  • 3.

    We used three datasets from different sources to generalize our model: the BUSI dataset, collected in 2018 at Baheya Hospital in Egypt; the KAIMRC dataset, collected in 2022 at MNGHA Hospital, Saudi Arabia; and Dataset B, collected in 2012 at the UDIAT Diagnostic Center of the Parc Taulí Corporation, Spain.

2. Methodology

2.1. Overview

This study was reviewed and approved by the Ethics Committee of King Abdullah International Medical Research Center (IRB number RC19/316/R). The committee waived the requirement for informed consent because the images were anonymized.

The proposed model comprises two main stages: image preprocessing and classification. We developed an image preprocessing scheme comprising three main parts: speckle noise filtering using a block-matching 3D filtering technique to denoise all images [18], ROI highlighting, and RGB fusion. A transfer learning technique was then used to train the deep learning model. The proposed model is illustrated in Fig. 1. To select the best-performing model for the proposed preprocessing scheme, we considered various architectures, including VGG19, EfficientNet V2, and other recent models. EfficientNet V2 was shortlisted because it is more recent than the other candidates [19]. Therefore, two architectures, VGG19 and EfficientNet V2, were compared on the BUSI dataset with classic augmentation, using the following settings for both models:

  • -

    Number of epochs: 50, with early stopping

  • -

    Optimizer: Stochastic gradient descent

  • -

    Initial learning rate: 0.0001

  • -

    Sampling: Tested using 100 random samples

Fig. 1.

Procedure of our proposed study, which starts with three image preprocessing operations: i) speckle noise filtering, ii) region of interest (ROI) highlighting, and iii) RGB fusion. Subsequently, the processed data is used to perform transfer learning on a deep-learning model.

The same samples were used to test both models, and the results are listed in Table 1. VGG19 outperformed EfficientNet V2 in accuracy, precision, and F1-score, with equal recall. Accordingly, VGG19 was selected for subsequent processes and analyses.

Table 1.

The results for the two models based on the classic augmentation.

CNN model | Accuracy | Recall | Precision | F1-score
EfficientNet V2 B3 | 67 % | 58 % | 71 % | 64 %
VGG19 | 91 % | 58 % | 97 % | 72 %

2.2. Datasets

In this study, two public breast US image datasets were used: i) the Breast Ultrasound Images dataset (Dataset BUSI) [20] and ii) Dataset B [21]. Additionally, a local dataset from the King Abdullah International Medical Research Center (KAIMRC) [22], obtained from the Ministry of National Guard-Health Affairs (MNGHA) Hospital, was used. The details of these datasets and their class distributions are described in the following subsections and in Tables 2 and 3.

Table 2.

Datasets used in this study and their availability.

Data name | Year of availability | Number of images | Image format | Source of the images (country)
Dataset BUSI | 2018 | 780 | PNG | Egypt
Dataset B | 2012 | 163 | PNG | Spain
KAIMRC [22] | 2022 | 5693 | DICOM | Saudi Arabia

Table 3.

Numbers of test and training images and class distributions of the datasets used in this study.

Dataset | Test images | Training images without preprocessing | Training images with preprocessing | Benign | Malignant
Dataset BUSI [20] | 129 | 466 | 9320 | 437 | 210
Dataset B [21] | 0 | 163 | 3260 | 109 | 54
KAIMRC [22] | 414 | 1659 | 6636 | 1447 | 636
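
The training-set sizes with preprocessing in Table 3 follow from the augmentation factors described in Section 2.3, assuming that ROI highlighting multiplies each public-dataset training image by five (the original plus four variants) and RGB fusion multiplies every image by four (the original plus three channel orders). The short check below reproduces the table under that reading; the constant and function names are illustrative, not taken from the authors' code.

```python
# Augmentation factors as described in Section 2.3 (names are illustrative).
ROI_FACTOR = 5   # ROI highlighting: original + 4 variants (public datasets only)
RGB_FACTOR = 4   # RGB fusion: original + 3 channel orders (all datasets)


def augmented_count(n_train, has_masks):
    """Number of training images after preprocessing-based augmentation."""
    factor = RGB_FACTOR * (ROI_FACTOR if has_masks else 1)
    return n_train * factor


print(augmented_count(466, has_masks=True))    # Dataset BUSI -> 9320
print(augmented_count(163, has_masks=True))    # Dataset B    -> 3260
print(augmented_count(1659, has_masks=False))  # KAIMRC       -> 6636
```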

2.2.1. Dataset BUSI

BUSI is a public dataset containing 780 breast US images in three classes: benign (437 images), malignant (210 images), and normal (133 images). The dataset was collected in 2018 at Baheya Hospital for early detection and treatment of cancer in women in Cairo, Egypt, from 600 patients aged 25 to 75 years. The dataset includes tumor segmentation masks. Fig. 2(a)–(d) show samples from the dataset with their corresponding masks.

Fig. 2.

Samples from the three datasets used in this study. (a) A benign sample from Dataset BUSI and (b) its corresponding mask. (c) A malignant sample from Dataset BUSI and (d) its corresponding mask. (e) A benign sample from Dataset B and (f) its corresponding mask. (g) A malignant sample from Dataset B and (h) its corresponding mask. (i) A benign sample and (j) a malignant sample, both from the KAIMRC dataset, which does not include masks for detection or segmentation.

2.2.2. Dataset B

Dataset B was collected in 2012 at the UDIAT Diagnostic Center of the Parc Taulí Corporation, Spain. This public dataset contains 163 breast US images in two classes: benign (110 images) and malignant (53 images). It includes tumor segmentation masks, examples of which are shown in Fig. 2(e–h).

2.2.3. KAIMRC dataset

The third dataset was collected from the MNGHA Hospital. It contains 5693 breast US images, divided into three classes: benign (1447 images), malignant (656 images), and normal (3590 images). This dataset had no segmentation masks. Furthermore, the images were not cropped, and all metadata were included, which required further processing. Fig. 2(i and j) shows some samples from the dataset.

2.3. Preprocessing

To enhance and prepare the breast US images for deep learning, we performed (1) image cropping, (2) noise filtering, (3) ROI highlighting, and (4) RGB fusion. This process is described below.

2.3.1. Image auto-cropping

The classification model required all images to have the same size and a 1:1 aspect ratio. Therefore, a cropping step was needed. Because the public datasets include masks but the private dataset does not, two different cropping methods were used.

For the public datasets, we designed a cropping algorithm that uses the masks to extract a square (1:1) region around the tumor. If the tumor was too large for the square crop to fit within the image, we added black padding to make the aspect ratio 1:1.
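
A minimal sketch of mask-based square cropping with black padding, assuming NumPy arrays for the image and a binary mask. The helper name `square_crop_from_mask`, the optional `margin`, and the centering rule (the square is centered on the mask's bounding box, with its side equal to the longer box edge) are our assumptions; the paper does not specify how the square was positioned. Resizing to the network's input size would follow as a separate step.

```python
import numpy as np


def square_crop_from_mask(image, mask, margin=0):
    """Crop a square window centered on the mask's bounding box.

    Parts of the window that fall outside the image are filled with black (0).
    `margin` (pixels added around the bounding box) is a hypothetical option.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no tumor pixels")
    y0, y1 = int(ys.min()) - margin, int(ys.max()) + 1 + margin
    x0, x1 = int(xs.min()) - margin, int(xs.max()) + 1 + margin
    side = max(y1 - y0, x1 - x0)               # square side = longer bbox edge
    top = (y0 + y1) // 2 - side // 2           # center the square on the bbox
    left = (x0 + x1) // 2 - side // 2
    out = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    # Copy only the part of the square window that overlaps the image.
    sy0, sx0 = max(top, 0), max(left, 0)
    sy1 = min(top + side, image.shape[0])
    sx1 = min(left + side, image.shape[1])
    out[sy0 - top:sy1 - top, sx0 - left:sx1 - left] = image[sy0:sy1, sx0:sx1]
    return out
```

For a tumor near the top edge, the rows of the square that lie above the image stay black, which is the padding behavior described above.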

Unlike the public datasets, the local dataset does not contain masks, and its images are not cropped. Therefore, a cropping step is required to match the format of the public datasets. Because manual cropping by an expert is time-consuming, we used the Mask R-CNN segmentation model [23]. This model was pretrained on the MS COCO dataset [24] and fine-tuned on the two public datasets to detect ROIs in the private dataset images. Each image was cropped to enclose all predicted segmentations, and a buffer area was added to minimize the risk of cutting off the tumor. If the cropped image did not have a 1:1 ratio, black padding was added to its borders. Fig. 3 shows an example of the cropping step applied to the private dataset.
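
The detection-based cropping of the private images could look like the sketch below, which takes the boxes predicted by a detector such as Mask R-CNN and returns one window enclosing all of them plus a buffer. The helper name, the score threshold, and the 10 % buffer are hypothetical; the paper does not report these values. Boxes are assumed to use the `[x1, y1, x2, y2]` convention of torchvision's detection models.

```python
import numpy as np


def crop_box_from_detections(boxes, scores, image_shape, score_thresh=0.5, buffer_frac=0.1):
    """Return one (x1, y1, x2, y2) crop window enclosing every confident detection.

    boxes: (N, 4) predicted boxes as [x1, y1, x2, y2]; scores: (N,) confidences;
    image_shape: (H, W, ...). score_thresh and buffer_frac are illustrative defaults.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    keep = np.asarray(scores, dtype=float) >= score_thresh
    h, w = image_shape[:2]
    if not keep.any():
        return 0, 0, w, h                      # nothing detected: keep the full image
    b = boxes[keep]
    x1, y1 = b[:, 0].min(), b[:, 1].min()      # union of all kept boxes
    x2, y2 = b[:, 2].max(), b[:, 3].max()
    pad_x, pad_y = buffer_frac * (x2 - x1), buffer_frac * (y2 - y1)  # safety buffer
    return (max(0, int(np.floor(x1 - pad_x))), max(0, int(np.floor(y1 - pad_y))),
            min(w, int(np.ceil(x2 + pad_x))), min(h, int(np.ceil(y2 + pad_y))))
```

The returned window would then be cut from the image and padded with black borders to a 1:1 aspect ratio, as for the public datasets.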

Fig. 3.

Samples from the private dataset cropped using the Mask R-CNN segmentation model. Each image was cropped to enclose all predicted segmentations.

2.3.2. Image denoising using block-matching 3D filtering

A denoising step is required to attenuate the speckle noise observed in US images. We used a block-matching 3D filtering technique to denoise all images [18]. This algorithm first groups similar 2D image blocks (i.e., blocks that exhibit high correlation) into 3D arrays and then applies a 3D transform to each group. Noise is attenuated by applying hard thresholding to the transform coefficients. Finally, the estimated true image signal is obtained by applying the inverse 3D transform to all groups and aggregating the overlapping block estimates. Fig. 4 shows the flowchart of this algorithm. Fig. 5(a) illustrates an image from the KAIMRC dataset that exhibits high speckle noise, and Fig. 5(b) shows the same image after denoising.
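
To make the grouping, 3D-transform, hard-thresholding, and inverse-transform steps concrete, the following is a deliberately simplified sketch of the first (hard-thresholding) stage of block-matching 3D filtering in NumPy. It is not the reference implementation of [18]: it uses an orthonormal 3D FFT in place of the usual DCT/Haar transforms, omits the Kaiser window and the second (Wiener) stage, and all parameter values (block size, step, search radius, group size, threshold multiplier `lam`) are illustrative assumptions. The noise level `sigma` must be supplied or estimated.

```python
import numpy as np


def bm3d_stage1_sketch(img, sigma, block=8, step=4, search=16, max_group=16, lam=2.7):
    """Simplified BM3D-style hard-thresholding stage for a 2D grayscale image.

    Assumes (H - block) and (W - block) are multiples of `step` so that the
    reference grid covers the whole image.
    """
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    num = np.zeros_like(img)   # weighted sum of block estimates
    den = np.zeros_like(img)   # sum of weights
    thr = lam * sigma
    for ry in range(0, H - block + 1, step):
        for rx in range(0, W - block + 1, step):
            ref = img[ry:ry + block, rx:rx + block]
            # 1) Block matching: rank nearby blocks by squared distance to the reference.
            cands = []
            for y in range(max(0, ry - search), min(H - block, ry + search) + 1, step):
                for x in range(max(0, rx - search), min(W - block, rx + search) + 1, step):
                    d = np.sum((img[y:y + block, x:x + block] - ref) ** 2)
                    cands.append((d, y, x))
            cands.sort(key=lambda c: c[0])
            coords = [(y, x) for _, y, x in cands[:max_group]]
            group = np.stack([img[y:y + block, x:x + block] for y, x in coords])
            # 2) 3D transform of the stacked group, 3) hard thresholding of coefficients.
            spec = np.fft.fftn(group, norm="ortho")
            keep = np.abs(spec) > thr
            keep.flat[0] = True  # always keep the DC (mean) coefficient
            # 4) Inverse 3D transform gives denoised estimates of every block in the group.
            est = np.real(np.fft.ifftn(spec * keep, norm="ortho"))
            weight = 1.0 / keep.sum()  # sparser groups are trusted more
            # 5) Aggregation: accumulate overlapping block estimates.
            for (y, x), patch in zip(coords, est):
                num[y:y + block, x:x + block] += weight * patch
                den[y:y + block, x:x + block] += weight
    out = img.copy()
    covered = den > 0
    out[covered] = num[covered] / den[covered]
    return out
```

For production use, a maintained BM3D implementation is preferable; this sketch only illustrates the data flow described above.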

Fig. 4.

Flowchart of the block-matching 3D filtering technique used to denoise the dataset, the first step in the proposed preprocessing procedure.

Fig. 5.

The first step in the proposed preprocessing procedure is to denoise the dataset using a block-matching 3D filtering technique. (a) A sample exhibiting high speckle noise, and (b) the same image after denoising.

2.3.3. ROI highlighting

To highlight the breast tumor ROI, we used the masks provided by the public datasets. Because the private dataset did not contain masks, this step was applied only to the public datasets. The masks allowed us to apply image-processing operations separately to the tumor region (ROI) and the RONI. Among the many possible alterations, we used this capability to change the contrast between the tumor region and the RONI to different values. The tumor and the surrounding glandular tissue generally exhibit high contrast, which a deep learning model can use as a distinctive feature. However, the contrast between fatty tissue or posterior shadows and glandular tissue is similar to that between the tumor and glandular tissue. Furthermore, because the breast mainly comprises fatty tissue, the contrast between the tumor and normal tissue is extremely low. Therefore, we aimed to train the classification model under these different conditions by varying the contrast between tumor and normal tissue.

For each sample image, we first created two altered versions with different contrast levels. We then multiplied the first altered version by the mask, producing a version that contained only the tumor region, and multiplied the second altered version by the inverted mask, producing a version that excluded the tumor region. Finally, a composite image was produced by adding the two masked versions. This image resembled the original, except that the ROI was highlighted because its contrast level differed from that of the RONI. This step enriches the data by highlighting the ROI, normalizes the dataset across different contrast levels, and increases the size of the public datasets fivefold. Fig. 6 shows the procedure used in this step.
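
The mask-multiply-and-add procedure above can be sketched as follows. The contrast operation (scaling intensities about the image mean and clipping to [0, 1]) and the four (ROI, RONI) contrast-factor pairs are assumptions for illustration; the paper does not state which contrast transform or levels were used. Images are assumed to be grayscale arrays scaled to [0, 1].

```python
import numpy as np


def adjust_contrast(img, factor):
    """Assumed contrast operation: scale intensities about the image mean, clip to [0, 1]."""
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)


def highlight_roi(img, mask, roi_factor, roni_factor):
    """Composite image whose tumor region and background have different contrast."""
    m = mask.astype(img.dtype)
    roi_part = adjust_contrast(img, roi_factor) * m            # keeps only the tumor
    roni_part = adjust_contrast(img, roni_factor) * (1.0 - m)  # excludes the tumor
    return roi_part + roni_part


# Hypothetical (ROI, RONI) contrast factors; the paper does not report its values.
FACTOR_PAIRS = [(1.5, 1.0), (1.0, 0.5), (1.5, 0.5), (2.0, 1.0)]


def roi_variants(img, mask):
    """Original image plus four ROI-highlighted variants (five versions in total)."""
    return [img] + [highlight_roi(img, mask, a, b) for a, b in FACTOR_PAIRS]
```

Because the two masked parts are disjoint, adding them reconstructs a full image in which only the relative contrast of tumor and background has changed.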

Fig. 6.

The second step of our preprocessing method is to highlight the region of interest (ROI) in the images using their corresponding masks.

2.3.4. RGB fusion

The final preprocessing step was RGB fusion. Fusing different variants of an image into the three channels of an RGB image has been applied in various studies [25–29]. Our method was inspired by the approach proposed by Yap et al. [27], which did not improve their results. Its main concept involves applying two different image-processing operations to an image and concatenating the results with the original image into an RGB image. We modified it to allow different orders of the RGB channels rather than one fixed order. The three channels can be arranged in six different orders; we used three of these orders, and all three fused variants were included in our dataset along with the original image. Because this step does not require any external information such as masks, it was applied to all images. The number of images was therefore increased fourfold in this step (the original image plus three variants). Fig. 7 shows the application of this technique to a single sample.
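
The following sketch shows the channel-order variant of RGB fusion. The two image-processing operations are placeholders (gamma correction and min-max stretching), and the choice of the three cyclic channel orders is our assumption; the paper does not name its operations or which three of the six orders were used. Replicating the original grayscale image across three channels, so that it can be fed to the same RGB network, is also an assumption.

```python
import numpy as np


def gamma_variant(img):
    """Hypothetical operation 1: gamma correction (gamma = 0.5) for images in [0, 1]."""
    return np.power(img, 0.5)


def stretch_variant(img):
    """Hypothetical operation 2: min-max contrast stretching to [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)


# Assumed: three cyclic channel orders out of the 3! = 6 possible permutations.
CHANNEL_ORDERS = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]


def rgb_fusion_variants(img):
    """Return the original (replicated to RGB) plus three fused RGB variants."""
    channels = [img, gamma_variant(img), stretch_variant(img)]
    original_rgb = np.stack([img, img, img], axis=-1)
    fused = [np.stack([channels[i] for i in order], axis=-1) for order in CHANNEL_ORDERS]
    return [original_rgb] + fused
```

Rotating the channel order exposes the network to each processed version in every input channel, which is what turns the fusion step into a form of augmentation.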

Fig. 7.

The final step of our preprocessing procedure is to fuse two variants of an image along with its original form into the three channels of the RGB image.

2.4. CNN model

A CNN was used for feature extraction and classification. We used the VGG19 model proposed by Simonyan and Zisserman [30], which comprises 19 weight layers (16 convolutional and 3 fully connected), with weights pretrained on the ImageNet dataset to mitigate data scarcity. VGG19 has been widely used and has achieved good results in the medical field [25,31,32]. We used the Keras implementation of VGG19 running on TensorFlow 2.0 and applied transfer learning by retraining the model on the processed data. Fig. 8 illustrates the architecture of the model.
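
A minimal Keras sketch of this transfer-learning setup follows. The backbone call `tf.keras.applications.VGG19(include_top=False, weights="imagenet", ...)` is the standard Keras API; the classification head (global average pooling and one sigmoid unit), the 224 × 224 input size, and fine-tuning all backbone layers are our assumptions, because the paper does not describe them. The SGD optimizer is taken from the model-comparison settings in Section 2.1, and the learning rate of 0.0001 from Section 3.2.

```python
import tensorflow as tf


def build_vgg19_classifier(input_shape=(224, 224, 3), weights="imagenet", fine_tune_base=True):
    """VGG19 backbone with a small binary (benign vs malignant) head."""
    base = tf.keras.applications.VGG19(include_top=False, weights=weights,
                                       input_shape=input_shape)
    base.trainable = fine_tune_base  # assumed: retrain all layers on the processed data
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.applications.vgg19.preprocess_input(inputs)  # ImageNet-style normalization
    x = base(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(malignant)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
    return model
```

Typical usage would be `model = build_vgg19_classifier()` followed by `model.fit(...)` with a batch size of 32, up to 30 epochs, and early stopping on validation accuracy, as described in Section 3.2.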

Fig. 8.

Architecture of the VGG19 model [28], which consists of 19 weight layers, with blocks of convolutional layers separated by pooling layers.

A CNN is a deep learning model designed to process images. It provides an end-to-end scheme for feature extraction and classification without requiring handcrafted features. It is also an effective feature extractor for other applications, such as object detection and segmentation, where it serves as the backbone of these models. A CNN comprises convolutional and pooling layers. Convolutional layers extract features by convolving kernels with the input image; their output, called the feature map, serves as the input to the next layer. Pooling layers reduce the size of the feature map, so that subsequent convolutional layers cover a larger portion of the image and extract more holistic features. Consequently, the features extracted by the initial convolutional layers are low-level, whereas those extracted by the top layers are high-level. Pooling also reduces computation time because the feature maps become smaller.
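
The convolution-then-pooling pattern described above can be illustrated with a toy NumPy example (not part of the authors' pipeline): a 3 × 3 kernel slid over an 8 × 8 image yields a 6 × 6 feature map, and 2 × 2 max pooling halves it to 3 × 3, so each pooled unit summarizes a larger image area. The edge kernel is an arbitrary illustrative choice.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """'Valid' 2D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    f = fmap[:h, :w]
    return f.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


image = np.random.default_rng(0).random((8, 8))
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)             # simple vertical-edge detector
fmap = np.maximum(conv2d_valid(image, edge_kernel), 0.0)   # ReLU feature map, 6x6
pooled = max_pool2x2(fmap)                                 # 3x3: each unit sees a larger area
print(fmap.shape, pooled.shape)  # (6, 6) (3, 3)
```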

3. Performance evaluation

3.1. Cross-validation and model evaluation

For all experiments, we used five-fold cross-validation, which ensures that each image is tested exactly once. Although other techniques also satisfy this condition, such as single and paired holdout random sampling, they are less common and more time-consuming than cross-validation [33]. Moreover, empirical studies have shown that the best results are obtained when 20–30 % of the data are used for testing and 70–80 % for training [34]; the 80:20 split of five-fold cross-validation falls within this range. Within each fold, 20 % of the data were used for testing and the remaining 80 % for training. An illustration of five-fold cross-validation is shown in Fig. 9. We further split the training set into training and validation sets at a 90:10 ratio to tune the number of epochs. This ratio was selected because it is a benchmark standard [35,36]; to avoid optimization bias, no other ratios were tested. The validation and training sets were then combined to construct each model, and the best-performing model on the validation set was selected for testing. Furthermore, we performed two separate tests: one on the local dataset and the other on the public dataset (i.e., the BUSI dataset).
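
A minimal sketch of the split scheme, assuming plain (unstratified) shuffled folds over the original image indices; the paper does not state whether its folds were stratified, and the function name and seed are illustrative.

```python
import numpy as np


def five_fold_splits(n_samples, n_folds=5, val_frac=0.1, seed=0):
    """Yield (train, val, test) index arrays for k-fold cross-validation.

    Each sample lands in exactly one test fold; the remaining data are split
    90:10 into training and validation subsets (used here for early stopping).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        rest = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        n_val = int(round(val_frac * len(rest)))
        yield rest[n_val:], rest[:n_val], test_idx   # `rest` is already shuffled
```

In this sketch, generating the ROI and RGB variants only after splitting on original image indices keeps every variant of an image in the same fold, which avoids leakage between training and test data.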

Fig. 9.

Five-fold cross-validation sampling to train and test the model.

3.2. Experiment environment

To train the network, we used a learning rate of 0.0001, a batch size of 32, and a maximum of 30 epochs. Training stopped early if the validation accuracy did not improve for three consecutive epochs (patience of three). Finally, to reduce bias toward the dominant class, we assigned a higher weight to the malignant class: class weights were inversely proportional to the number of samples in each class, so that the weighted sums of both classes were equal. We used five-fold cross-validation as the evaluation method to reduce the variability of the performance estimates.
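
One common way to obtain weights whose weighted class totals are equal is the "balanced" heuristic n_samples / (n_classes × n_c), also used by scikit-learn. The sketch below assumes this formula; the paper states only that the weights were based on class counts and equalized the weighted sums. The Keras `fit` call in the comment is hypothetical usage with the settings reported above.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency: n_samples / (n_classes * n_c).

    With these weights, every class contributes the same weighted total.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Example with the BUSI benign/malignant counts from Table 3 (0 = benign, 1 = malignant).
weights = balanced_class_weights([0] * 437 + [1] * 210)
# Hypothetical Keras usage:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=32, epochs=30,
#           class_weight=weights,
#           callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3)])
```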

The proposed method was implemented in Python. For classification, we used the Keras implementation of VGG19 running on TensorFlow 2. For automatic image cropping, we used Mask R-CNN running on PyTorch. The experiments were run on Windows 10 with an Nvidia RTX 3060 (12 GB) GPU, 48 GB of RAM, and an AMD Ryzen 7 2700 eight-core 3.20 GHz processor.

3.3. Performance metrics

Six quantitative performance metrics were used to evaluate and compare the proposed method: accuracy, recall, specificity, precision, F1-score (Equations (1)–(5)), and AUC.

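$$\text{Accuracy (ACC)} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\text{Specificity (SPEC)} = \frac{TN}{FP + TN} \tag{3}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, with the malignant class treated as positive.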
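
Equations (1)–(5) translate directly into code. The sketch below computes them from confusion-matrix counts, with malignant as the positive class; the counts in the example are made up for illustration and are not results from this study. It does not guard against division by zero (e.g., a fold with no predicted positives).

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (1)-(5); malignant is the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)              # sensitivity / true-positive rate
    specificity = tn / (fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}


# Hypothetical confusion-matrix counts for one fold (not from the paper).
m = classification_metrics(tp=40, tn=80, fp=10, fn=10)
```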

In the ROC curve, the x-axis represents the false-positive rate and the y-axis represents the true-positive rate. ROC curves were used to compare the models across classification thresholds, and the area under each ROC curve (AUC) was computed to compare the models with a single summary value [37,38]. Both the ROC curve and AUC are standard comparison tools and have been used in various studies [6,7,39,40].
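
A compact way to compute the AUC, shown below as a sketch, uses its equivalence to the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case (ties count as one half). This pairwise version costs O(n_pos × n_neg) and is meant for clarity; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the probability that a random malignant case outscores a random benign one."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all malignant-vs-benign score differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs correctly ordered -> 0.75
```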

3.4. ROI highlighting

In this step, we used masks to generate four variants of each sample from the public datasets. All four variants, together with the original samples, were included in the training and validation subsets. During testing, however, each variant type was evaluated in a separate test. These separate tests were conducted to determine the influence of each image variant on overall performance and on model outcomes when masks are not available. Therefore, six tests were performed for each fold on the BUSI dataset [20]:

  • (1)

    Test 1: Applied to the original samples of the testing set.

  • (2)

    Test 2: Applied to variant Type 1 of the testing set.

  • (3)

    Test 3: Applied to variant Type 2 of the testing set.

  • (4)

    Test 4: Applied to variant Type 3 of the testing set.

  • (5)

    Test 5: Applied to variant Type 4 of the testing set.

  • (6)

    Test 6: Applied to the original samples and all variants; the predicted probabilities of the five versions of each sample were averaged.
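
Test 6 amounts to test-time averaging over image versions. A minimal sketch, assuming the model outputs one malignancy probability per image and a decision threshold of 0.5 (the paper does not state the threshold); the function name is illustrative.

```python
import numpy as np


def averaged_prediction(variant_probs, threshold=0.5):
    """Average malignancy probabilities over image versions.

    variant_probs: (n_versions, n_samples) probabilities for the original image
    and each ROI-highlighted variant of every test sample.
    Returns the mean probability per sample and the thresholded label.
    """
    mean_prob = np.mean(np.asarray(variant_probs, dtype=float), axis=0)
    return mean_prob, (mean_prob >= threshold).astype(int)


probs = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.8], [0.3, 0.6], [0.5, 0.5]]  # 5 versions, 2 samples
mean_prob, labels = averaged_prediction(probs)
```

The averaged probabilities, rather than the thresholded labels, would be used to compute the ROC curve and AUC for this test.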

Since the KAIMRC dataset did not contain segmentation masks, only one test set was used for each fold [19].

4. Results

We used the pretrained VGG19 model to compare three image preprocessing procedures: (1) denoising only (base model); (2) denoising and RGB fusion; and (3) denoising, RGB fusion, and ROI highlighting.

The first two procedures were tested on the BUSI and KAIMRC datasets [20,22]. Because ROI highlighting was applied only to public datasets, the third procedure was tested only on the BUSI dataset.

On the BUSI dataset (Table 4), the base model performed poorly on the malignant class: it correctly identified only 76.8 % of the malignant samples (recall), and only 67.8 % of the samples it classified as malignant were actually malignant (precision). Performance improved slightly after RGB fusion was applied: recall increased from 76.8 % to 85.2 %, whereas precision increased only marginally, from 67.8 % to 69.0 %. These values represent the performance of the model in the absence of masks. After ROI highlighting was applied, precision increased from 69.0 % to 77.4 % and the F1-score from 75.8 % to 83.8 % in Test 1. Across the other tests, the best accuracy (87.8 %) and precision (80.8 %) were achieved in Test 4, the best recall (87.4 %) and F1-score (87.4 %) in Test 3, and the best specificity and ROC AUC in Tests 2 and 6, respectively. These results indicate that image variant Type 2 (Test 3) helped reduce the false-negative rate, thereby increasing the F1-score, whereas image variant Type 3 (Test 4) helped reduce the false-positive rate while achieving the highest accuracy. Because the other image variants achieved comparable performance, all variants were combined in Test 6, which yielded the best AUC. The ROC curves of all tests on the public dataset are shown in Fig. 10(a–h): Fig. 10(a–b) shows the baseline models without the proposed scheme, and Fig. 10(c–h) shows the models with the proposed scheme.

Table 4.

Results of the VGG19 model on the BUSI dataset [20] after three different preprocessing procedures were applied.

Preprocessing procedure | Mean accuracy | Mean recall | Mean precision | Mean specificity | Mean F1-score | Mean ROC AUC
Denoising | 80 % | 76.8 % | 67.8 % | 81.6 % | 71.4 % | 0.8479
Denoising and RGB fusion | 82.0 % | 85.2 % | 69.0 % | 80.4 % | 75.8 % | 0.907
Denoising, RGB fusion, and ROI highlighting (Test 1) | 85.4 % | 83.8 % | 77.4 % | 86.2 % | 83.8 % | 0.9364
Denoising, RGB fusion, and ROI highlighting (Test 2) | 87.4 % | 82.8 % | 80.9 % | 90.0 % | 82.8 % | 0.9466
Denoising, RGB fusion, and ROI highlighting (Test 3) | 85.6 % | 87.4 % | 75.4 % | 84.8 % | 87.4 % | 0.9450
Denoising, RGB fusion, and ROI highlighting (Test 4) | 87.8 % | 83.8 % | 80.8 % | 89.8 % | 83.8 % | 0.9463
Denoising, RGB fusion, and ROI highlighting (Test 5) | 85.6 % | 86.9 % | 76.0 % | 85.0 % | 86.9 % | 0.9451
Denoising, RGB fusion, and ROI highlighting (Test 6) | 86.8 % | 86.4 % | 78.6 % | 87.4 % | 86.5 % | 0.9497

Fig. 10.

ROC graphs extracted from each test performed on the public dataset: (a) ROC graph of the denoising method; (b) ROC graph of the denoising + RGB method; (c) ROC graph of Test 1; (d) ROC graph of Test 2; (e) ROC graph of Test 3; (f) ROC graph of Test 4; (g) ROC graph of Test 5; (h) ROC graph of Test 6.

On the KAIMRC dataset (Table 5), the base model had the same difficulty with the malignant class: it detected only 59.8 % of the malignant samples, and only 74.0 % of the samples it classified as malignant were actually malignant. After RGB fusion was applied, recall increased from 59.8 % to 76.4 % and precision increased slightly from 74.0 % to 75.8 %. As a result, the F1-score increased from 66.0 % to 76.0 % and the ROC AUC from 0.839 to 0.9. The ROC curves for the private dataset are shown in Fig. 11: (a) the baseline model and (b) the proposed scheme.

Table 5.

Results of the VGG19 model on the KAIMRC dataset [22] after two different preprocessing procedures were applied.

Preprocessing procedure | Mean accuracy | Mean recall | Mean precision | Mean specificity | Mean F1-score | Mean ROC AUC
Denoising | 81.6 % | 59.8 % | 74.0 % | 90.6 % | 66.0 % | 0.839
Denoising and RGB fusion | 85.2 % | 76.4 % | 75.8 % | 89.4 % | 76.0 % | 0.9

Fig. 11.

ROC graphs were extracted for each test, and the ROCs of the five folds were grouped. These graphs correspond to the tests performed on the private dataset. (a) ROC graph of the denoising method; (b) ROC graph of the denoising + RGB method.

Based on the results from both datasets, we conclude that RGB fusion enhances the ability of the model to correctly detect malignant samples and reduces the false-negative rate, whereas ROI highlighting increases the precision of the model and improves the F1-score. Tables 6 and 7 list the best results and the improvement ratios relative to the baseline, respectively. The features extracted by the model are overlaid on some test samples in Fig. 12(a–d): samples from the KAIMRC dataset are shown in Fig. 12(a–b), and samples from the BUSI dataset in Fig. 12(c–d).

Table 6.

Summary of the results of VGG19 for both datasets, BUSI and KAIMRC, based on the highest accuracy.

Dataset | Mean accuracy | Mean recall | Mean precision | Mean specificity | Mean F1-score | Mean ROC AUC
BUSI dataset | 87.8 % | 83.8 % | 80.8 % | 89.8 % | 83.8 % | 0.9463
KAIMRC dataset | 85.2 % | 76.4 % | 75.8 % | 89.4 % | 76.0 % | 0.9

Table 7.

Ratios of the performance metrics of the proposed method to those of the baseline (denoising only); values greater than 1 indicate improvement.

Dataset | Mean accuracy | Mean recall | Mean precision | Mean specificity | Mean F1-score | Mean ROC AUC
BUSI dataset | 1.0975 | 1.09 | 1.192 | 1.1 | 1.174 | 1.116
KAIMRC dataset | 1.044 | 1.278 | 1.024 | 0.987 | 1.152 | 1.073
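
The ratios in Table 7 can be reproduced by dividing the best results in Table 6 by the denoising-only baseline rows of Tables 4 and 5. The check below recomputes them; the dictionary layout is ours. A ratio below 1, as for specificity on the KAIMRC dataset, indicates a slight decrease.

```python
# Baseline (denoising only, Tables 4 and 5) and best results (Table 6).
# Metric order: accuracy, recall, precision, specificity, F1-score, ROC AUC.
baseline = {"BUSI": [80.0, 76.8, 67.8, 81.6, 71.4, 0.8479],
            "KAIMRC": [81.6, 59.8, 74.0, 90.6, 66.0, 0.839]}
proposed = {"BUSI": [87.8, 83.8, 80.8, 89.8, 83.8, 0.9463],
            "KAIMRC": [85.2, 76.4, 75.8, 89.4, 76.0, 0.9]}

ratios = {name: [p / b for p, b in zip(proposed[name], baseline[name])]
          for name in baseline}
for name, r in ratios.items():
    print(name, [round(x, 3) for x in r])
```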

Fig. 12.

Features extracted by the proposed model, overlaid on test samples. (a) A malignant sample and (b) a benign sample from the KAIMRC dataset [22]. (c) A malignant sample and (d) a benign sample from the BUSI dataset [20].

5. Discussion

Hospitals rely heavily on experienced radiologists to correctly classify benign and malignant masses in US images [41]. However, in many cases, the morphological features of masses in US images overlap due to noise, diverse imaging modalities, and variations in lesion framing, resulting in classification errors that lead to unnecessary follow-ups and biopsies, which burden clinicians and increase the risk of missing critical cases [42]. Therefore, improved methods are needed to enhance the accuracy and efficiency of mass classification in US images. Our deep learning model, which uses image preprocessing, can assist radiologists by identifying and prioritizing malignant cases for review.

This study showed that accurate classification can be achieved with a deep learning model after applying the ROI-based RGB fusion approach. RGB fusion enabled the model to detect more malignant samples and reduced the false-negative rate, whereas ROI highlighting increased the precision of the model and thus improved the F1-score. These findings indicate that deep learning models can benefit from image-processing techniques. Compared with previously published deep-learning models that use either the VGG19 architecture or the public BUSI dataset, our approach shows highly competitive performance and can be generalized to different sources of US images. To our knowledge, however, no previous study has used both the VGG19 model and the BUSI public dataset. Therefore, we compare our results with those of studies that use either the VGG19 model or the BUSI public dataset.

Two studies reported results using the VGG19 model. Zhang et al. [43] constructed several CNN-based prediction models (based on InceptionV3, VGG16, ResNet50, and VGG19) to distinguish malignant from benign features in US images. The best model was compared with sonographic examination and interpretation of 683 images (493 benign and 190 malignant) and achieved an AUC of 0.913. However, this model was developed using images from only one source. A more recent study by Ragab et al. [44] developed an ensemble deep-learning-enabled clinical decision support system for breast cancer classification that uses three deep learning models (VGG-16, VGG-19, and SqueezeNet) for feature extraction. The model parameters were determined using a public dataset of 780 images, and the best model achieved an AUC of 0.997 after 500 epochs. However, their system was trained and tested using only a single public dataset. Similarly, Lu et al. [12] added a spatial attention mechanism to a pre-trained CNN, ResNet18. ResNet18 is known for the skip connections in its architecture, and the authors placed a spatial attention module in the path of these skip connections. Before the spatial attention module was added, the accuracy, sensitivity, precision, and F1-score of the model were 93.59 %, 94.68 %, 97.84 %, and 96.22 %, respectively. With the module, accuracy, precision, and F1-score improved slightly, to 94.1 %, 98.14 %, and 96.5 %, respectively, while sensitivity decreased marginally to 94.34 %. However, these models were also trained and tested using only one public dataset.
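Most of the studies above, like Table 6, are compared by their ROC AUC. For readers less familiar with the metric, the sketch below computes it from its probabilistic interpretation, the equivalence with the Mann–Whitney (Wilcoxon) statistic discussed by Hanley and McNeil [38]: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. It is an illustrative quadratic-time implementation, not the evaluation code of this study or of the cited works.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 for malignant (positive), 0 for benign (negative).
    scores: model outputs where higher means "more likely malignant".
    Ties between a positive and a negative count as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg) pairwise comparison: fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; production code would normally use a rank-based O(n log n) implementation instead of the nested loop.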

Furthermore, several published studies use the public BUSI dataset. Moon et al. [40] proposed an ensemble deep-learning model trained on different variants of the US images, including fused images. They selected the best-performing model for each image variant and added an ensemble-averaging layer at the end of the model. The accuracy, sensitivity, specificity, precision, F1-score, and AUC of the best-performing model were 94.62 %, 92.31 %, 95.60 %, 90.00 %, 91.14 %, and 0.9711, respectively. However, the evaluation relied on a single split obtained by simple random sampling, which does not guarantee that the model is tested on all samples; in other words, only a small portion of the dataset is used for testing. Moon et al. tested the model on 20 % of the dataset, which is relatively small considering that the BUSI dataset [20] contains only 210 malignant images; consequently, only 42 malignant samples were tested in their study. Another study, by Eroğlu et al. [7], also employed the BUSI public dataset. They applied three different deep learning models to the same dataset and concatenated the features extracted by these models. The concatenated features were then reduced using the minimum redundancy maximum relevance (mRMR) feature-selection algorithm, and the images were classified into one of three classes: benign, malignant, or normal. The average accuracy, sensitivity, specificity, and F1-score were 95.6 %, 95.4 %, 97.8 %, and 95.6 %, respectively. However, as in the study by Moon et al. [40], the model was trained and tested using only the BUSI dataset, and it shares the disadvantage of simple random sampling noted above: not all samples are guaranteed to be tested.
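The contrast drawn above between a single random hold-out split and k-fold cross-validation can be made concrete. The sketch below assigns every sample to exactly one of k test folds, stratified by class, so that each image is tested once across the k rounds; a single 20 % hold-out split of 210 malignant images, by contrast, tests only 42 of them. Both functions are hypothetical helpers written with Python's standard library for illustration, not the splitting code used in this study.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k disjoint test folds with similar class proportions.

    Every index appears in exactly one fold, so over k rounds each sample is tested once.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)            # randomize order within each class
        for j, i in enumerate(indices):
            folds[j % k].append(i)      # deal indices round-robin across the folds
    return folds


def holdout_indices(n, test_fraction=0.2, seed=0):
    """Single random hold-out split: only round(n * test_fraction) samples are ever tested."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return indices[: round(n * test_fraction)]


if __name__ == "__main__":
    labels = [1] * 210 + [0] * 400   # 210 malignant as in BUSI; the 400 negatives are illustrative
    folds = stratified_kfold_indices(labels, k=5)
    print([sum(labels[i] for i in f) for f in folds])  # malignant cases tested in each fold
    print(len(holdout_indices(210, 0.2)))              # malignant cases tested by one 20 % split
```

With k = 5 every malignant image contributes to exactly one test fold, and the per-fold metrics can then be averaged as in Table 6.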

The effectiveness of vision transformers (ViTs) in medical imaging has been widely investigated [13,14]. Dosovitskiy et al. [45] reported that ViTs require large datasets to outperform CNNs; however, medical image datasets are typically small. Therefore, Manzari et al. [14] proposed a hybrid model that combines the global connectivity of ViTs with the locality of CNNs. They also proposed transformer augmentation, a technique that embeds the augmentation process into the ViT model. The model was tested on several medical image datasets, including BUSI, which was the only US dataset tested. The best accuracy and AUC were 89.6 % and 0.934, respectively.

Our study focused on image preprocessing techniques and on making full use of the available data to improve the performance of deep learning models. Image preprocessing combined with data augmentation has been widely used to address the problem of small datasets [46]. ROI highlighting and RGB fusion are intended to emphasize the lesion region by enriching the features, to normalize images acquired with different contrast levels, and to augment the dataset. Applying these techniques improved the performance of the model, particularly its F1-score, recall, and precision, indicating better classification of breast tumors as benign or malignant. Moreover, the results suggest that the model can generalize to different datasets.
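The ROI-highlighting step depends on a lesion mask. Its exact formulation is not restated in this section, so the sketch below is only one plausible variant, stated as an assumption: pixels inside the mask keep their intensity, and pixels outside are attenuated by a hypothetical factor `background_gain`. It uses NumPy only and is not the authors' implementation.

```python
import numpy as np


def highlight_roi(image: np.ndarray, mask: np.ndarray, background_gain: float = 0.4) -> np.ndarray:
    """Hypothetical ROI highlighting: keep lesion pixels, attenuate everything else.

    image: 2-D uint8 grayscale US image.
    mask:  2-D array of the same shape, non-zero inside the lesion.
    background_gain: assumed multiplier in [0, 1] applied outside the mask.
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    if not 0.0 <= background_gain <= 1.0:
        raise ValueError("background_gain must lie in [0, 1]")
    weights = np.where(mask > 0, 1.0, background_gain)  # 1 inside the lesion, gain outside
    out = image.astype(np.float64) * weights
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    img = np.full((4, 4), 100, dtype=np.uint8)
    lesion = np.zeros((4, 4), dtype=np.uint8)
    lesion[1:3, 1:3] = 1
    print(highlight_roi(img, lesion, background_gain=0.5))
```

Attenuating the background rather than blanking it keeps some peritumoral context, which may matter for ultrasound; the gain is a tunable assumption.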

6. Limitations and future work

This study focuses on enhancing the predictions of deep learning models through image preprocessing. The methods used require additional information, such as lesion masks, which were available only for the two public datasets; the images in the local dataset were neither cropped nor accompanied by masks. We therefore cropped the local dataset using the segments proposed by a Mask R-CNN segmentation model. These crops sometimes covered a large spatial area containing noisy, irrelevant information and, in some cases, did not include the lesion. In future studies, manual cropping of the local dataset is recommended to make it more informative. Providing masks for the local dataset would also improve the cropping quality and allow the ROI-highlighting technique to be applied to it; based on the results for the BUSI dataset [20], the ROI-highlighting technique improved the performance of the model.
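The cropping problems described above (crops that are too large or that miss the lesion) are easy to see in a simple mask-to-crop routine. The sketch below crops a grayscale image to the bounding box of a binary mask, such as one produced by a Mask R-CNN model, enlarged by a hypothetical `margin` parameter; it raises an error for an empty mask, which corresponds to a segmentation that found no lesion. It is an illustrative NumPy helper, not the pipeline used for the local dataset.

```python
import numpy as np


def crop_to_mask(image: np.ndarray, mask: np.ndarray, margin: int = 10) -> np.ndarray:
    """Crop `image` to the bounding box of the non-zero pixels in `mask`.

    The box is enlarged by `margin` pixels on each side and clipped to the image
    borders. A larger margin keeps more context but also more irrelevant tissue.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    rows = np.flatnonzero(np.any(mask > 0, axis=1))  # rows containing lesion pixels
    cols = np.flatnonzero(np.any(mask > 0, axis=0))  # columns containing lesion pixels
    if rows.size == 0:
        raise ValueError("empty mask: the segmentation found no lesion to crop")
    top = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + margin, mask.shape[0] - 1)
    left = max(int(cols[0]) - margin, 0)
    right = min(int(cols[-1]) + margin, mask.shape[1] - 1)
    return image[top:bottom + 1, left:right + 1]


if __name__ == "__main__":
    img = np.arange(100).reshape(10, 10)
    lesion = np.zeros((10, 10), dtype=bool)
    lesion[3:5, 4:7] = True
    print(crop_to_mask(img, lesion, margin=1).shape)
```

A segmentation that over-segments produces a large box and a noisy crop, while an empty or misplaced mask produces no useful crop at all; manual masks avoid both failure modes.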

The RGB fusion technique could also be extended by using different or additional image-processing operations rather than only the current three.
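RGB fusion, as used in this work, combines three processed versions of a grayscale US image into the three channels of one color image, which also matches the three-channel input expected by ImageNet-pretrained CNNs such as VGG19. The three operations used in this study are not listed in this section, so the sketch below uses hypothetical stand-ins (the unprocessed image, global histogram equalization, and gamma correction) and accepts any list of single-channel transforms, which is the kind of extension suggested above. It is written with NumPy only and is not the authors' implementation.

```python
import numpy as np


def equalize_hist(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of a uint8 grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]           # CDF value at the darkest occupied intensity
    span = cdf[-1] - cdf_min
    if span == 0:                       # constant image: nothing to equalize
        return img.copy()
    lut = np.clip(np.rint((cdf - cdf_min) * 255.0 / span), 0, 255).astype(np.uint8)
    return lut[img]


def gamma_correct(img: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Power-law intensity mapping; gamma < 1 brightens dark regions."""
    lut = np.rint(255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return lut[img]


def rgb_fusion(img: np.ndarray, transforms) -> np.ndarray:
    """Stack the outputs of single-channel transforms as the channels of one image."""
    if img.dtype != np.uint8 or img.ndim != 2:
        raise ValueError("expected a 2-D uint8 grayscale image")
    channels = [t(img) for t in transforms]
    if not channels:
        raise ValueError("at least one transform is required")
    return np.stack(channels, axis=-1)  # shape (H, W, len(transforms))


if __name__ == "__main__":
    us = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
    fused = rgb_fusion(us, [lambda x: x, equalize_hist, gamma_correct])
    print(fused.shape)  # (64, 64, 3)
```

Passing more than three transforms yields a multi-channel array, which would require adapting the first convolution of a pretrained network; with exactly three it is a drop-in RGB input.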

This study focused on image preprocessing to improve the performance of the classification model. Future studies should also address the classification model itself, for example through ensemble learning or a custom-designed CNN, to develop a more efficient model. Additional advanced techniques and testing methodologies, such as those described in Refs. [47,48], may further improve model outcomes. Moreover, including images from additional sources should be considered to improve the performance of the system.
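Ensemble learning, suggested above as a direction for future work and used by Moon et al. [40] through an ensemble-averaging layer, can be as simple as averaging the class probabilities predicted by several models. The sketch below implements this soft-voting scheme with optional, hypothetical per-model weights; it is a generic illustration, not the architecture of any cited study.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Average class-probability matrices from several models (soft voting).

    prob_list: sequence of arrays shaped (n_samples, n_classes), one per model.
    weights:   optional non-negative per-model weights, normalized to sum to 1.
    Returns the averaged probabilities and the predicted class index per sample.
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (models, samples, classes)
    if weights is None:
        w = np.full(probs.shape[0], 1.0 / probs.shape[0])
    else:
        w = np.asarray(weights, dtype=float)
        if w.shape != (probs.shape[0],) or np.any(w < 0) or w.sum() == 0:
            raise ValueError("need one non-negative weight per model, not all zero")
        w = w / w.sum()
    averaged = np.tensordot(w, probs, axes=1)  # weighted sum over the model axis
    return averaged, averaged.argmax(axis=1)


if __name__ == "__main__":
    model_a = [[0.9, 0.1], [0.4, 0.6]]  # hypothetical outputs: columns are (benign, malignant)
    model_b = [[0.6, 0.4], [0.2, 0.8]]
    print(soft_vote([model_a, model_b]))
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one, which tends to be more robust than majority voting with few models.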

7. Conclusion

In this study, we proposed an image preprocessing scheme that augments the dataset and emphasizes the lesion area. The scheme consists of three steps: noise filtering using the block-matching and 3D filtering (BM3D) technique, ROI highlighting, and RGB fusion. We applied transfer learning to processed data from three different sources, and the experiments demonstrated that the proposed image-preprocessing scheme can potentially improve the performance of a deep learning model for diagnosing breast lesions in US images. Moreover, testing the model on several data sources supports its generalizability. Deep learning models have great potential for improving breast cancer detection in US screening as datasets and computational resources expand. Our approach can help prioritize malignant cases for review by a radiologist or a second reader to confirm the diagnosis. The developed image preprocessing scheme can also be applied to other medical imaging modalities to simplify feature detection.

Ethics declarations

This study was reviewed and approved by the Research Ethics Committee of King Abdullah International Medical Research Center (IRB number RC19/316/R). Informed consent was not required for this study because the research data and images were anonymized, and the institutional ethics committee determined it was not necessary based on the study's nature.

Data availability statement

Data will be made available on request.

Additional information

No additional information is available for this paper.

CRediT authorship contribution statement

Mohammed Alotaibi: Writing – review & editing, Methodology, Investigation, Formal analysis. Abdulrhman Aljouie: Writing – review & editing, Supervision, Methodology, Conceptualization. Najd Alluhaidan: Visualization, Validation, Data curation. Wasem Qureshi: Validation. Hessa Almatar: Writing – review & editing, Writing – original draft, Data curation. Reema Alduhayan: Writing – original draft, Data curation. Barrak Alsomaie: Writing – review & editing, Supervision, Funding acquisition. Ahmed Almazroa: Writing – review & editing, Project administration, Methodology, Investigation, Funding acquisition, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by King Abdullah International Medical Research Center (RC-19-316 R).

References

1. Sung H., et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA A Cancer J. Clin. 2021;71(3):209–249. doi: 10.3322/caac.21660.
2. Alotaibi R.M., Rezk H.R., Juliana C.I., Guure C. Breast cancer mortality in Saudi Arabia: modelling observed and unobserved factors. PLoS One. 2018;13(10). doi: 10.1371/journal.pone.0206148.
3. Esmaeili M., Ayyoubzadeh S.M., Ahmadinejad N., Ghazisaeedi M., Nahvijou A., Maghooli K. A decision support system for mammography reports interpretation. Health Inf. Sci. Syst. 2020;8(1–8). doi: 10.1007/s13755-020-00109-5.
4. Berg W.A. Reducing unnecessary biopsy and follow-up of benign cystic breast lesions. Radiology. 2020;295(1):52–53. doi: 10.1148/radiol.2020200037.
5. Yap M.H., Edirisinghe E.A., Bez H.E. A novel algorithm for initial lesion detection in ultrasound breast images. J. Appl. Clin. Med. Phys. 2008;9(4):181–199. doi: 10.1120/jacmp.v9i4.2741.
6. Wang Y., Choi E.J., Choi Y., Zhang H., Jin G.Y., Ko S.-B. Breast cancer classification in automated breast ultrasound using multiview convolutional neural network with transfer learning. Ultrasound Med. Biol. 2020;46(5):1119–1132. doi: 10.1016/j.ultrasmedbio.2020.01.001.
7. Eroğlu Y., Yildirim M., Cinar A. Convolutional Neural Networks based classification of breast ultrasonography images by hybrid method with respect to benign, malignant, and normal using mRMR. Comput. Biol. Med. 2021;133. doi: 10.1016/j.compbiomed.2021.104407.
8. Karthik R., Menaka R., Kathiresan G.S., Anirudh M., Nagharjun M. Gaussian dropout based stacked ensemble CNN for classification of breast tumor in ultrasound images. IRBM. 2022;43(6):715–733.
9. Saba T., Abunadi I., Sadad T., Khan A.R., Bahaj S.A. Optimizing the transfer-learning with pretrained deep convolutional neural networks for first stage breast tumor diagnosis using breast ultrasound visual images. Microsc. Res. Tech. 2022;85(4):1444–1453. doi: 10.1002/jemt.24008.
10. Balaha H.M., Saif M., Tamer A., Abdelhay E.H. Hybrid deep learning and genetic algorithms approach (HMB-DLGAHA) for the early ultrasound diagnoses of breast cancer. Neural Comput. Appl. 2022;34(11):8671–8695.
11. Muduli D., Dash R., Majhi B. Automated diagnosis of breast cancer using multi-modal datasets: a deep convolution neural network based approach. Biomed. Signal Process Control. 2022;71. doi: 10.1016/j.bspc.2021.103126.
12. Lu S.-Y., Wang S.-H., Zhang Y.-D. SAFNet: a deep spatial attention network with classifier fusion for breast cancer detection. Comput. Biol. Med. 2022;148. doi: 10.1016/j.compbiomed.2022.105812.
13. Feng H., et al. Identifying malignant breast ultrasound images using ViT-patch. Appl. Sci. 2023;13(6):3489.
14. Manzari O.N., Ahmadabadi H., Kashiani H., Shokouhi S.B., Ayatollahi A. MedViT: a robust vision transformer for generalized medical image classification. Comput. Biol. Med. 2023;157. doi: 10.1016/j.compbiomed.2023.106791.
15. Qi X., et al. Computer-aided diagnosis of breast cancer in ultrasonography images by deep learning. Neurocomputing. 2022;472:152–165.
16. Dou Q., Coelho de Castro D., Kamnitsas K., Glocker B. Domain generalization via model-agnostic learning of semantic features. Adv. Neural Inf. Process. Syst. 2019;32.
17. Kopans D.B., D'Orsi C.J., Adler D.E.D. Breast Imaging Reporting and Data System. American College of Radiology; Reston: 1993.
18. Dabov K., Foi A., Katkovnik V., Egiazarian K. Image denoising with block-matching and 3D filtering. Image Process.: Algorith. Syst. Neural Netw. Machine Learn. 2006;6064:354–365.
19. Tan M., Le Q. EfficientNetV2: smaller models and faster training. In: International Conference on Machine Learning. 2021. pp. 10096–10106.
20. Al-Dhabyani W., Gomaa M., Khaled H., Fahmy A. Dataset of breast ultrasound images. Data Brief. 2020;28. doi: 10.1016/j.dib.2019.104863.
21. Yap M.H., et al. Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J. Biomed. Health Inform. 2017;22(4):1218–1226. doi: 10.1109/JBHI.2017.2731873.
22. Almazroa A., et al. King Abdullah International Medical Research Center (KAIMRC)'s breast cancer big images data set. Med. Imag. 2022: Imag. Inf. Healthcare, Res. Appl. 2022;12037:77–83.
23. He K., Gkioxari G., Dollár P., Girshick R. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 2961–2969.
24. Lin T.-Y., et al. Microsoft COCO: common objects in context. In: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings. 2014. pp. 740–755.
25. Byra M. Breast mass classification with transfer learning based on scaling of deep representations. Biomed. Signal Process Control. 2021;69.
26. Zhuang Z., Yang Z., Raj A.N.J., Wei C., Jin P., Zhuang S. Breast ultrasound tumor image classification using image decomposition and fusion based on adaptive multi-model spatial feature fusion. Comput. Methods Progr. Biomed. 2021;208. doi: 10.1016/j.cmpb.2021.106221.
27. Yap M.H., et al. Breast ultrasound region of interest detection and lesion localisation. Artif. Intell. Med. 2020;107. doi: 10.1016/j.artmed.2020.101880.
28. Shakya A., Biswas M., Pal M. CNN-based fusion and classification of SAR and optical data. Int. J. Rem. Sens. 2020;41(22):8839–8861.
29. Liu S., Liu Z. Multi-channel CNN-based object detection enhanced situation awareness. arXiv preprint. 2017. doi: 10.48550/arXiv.1712.00075.
30. Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
31. Byra M., et al. Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med. Phys. 2019;46(2):746–755. doi: 10.1002/mp.13361.
32. Antropova N., Huynh B.Q., Giger M.L. A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med. Phys. 2017;44(10):5162–5171. doi: 10.1002/mp.12453.
33. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011;21:137–146.
34. Gholamy A., Kreinovich V., Kosheleva O. Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation. 2018.
35. Yang J., Shi R., Ni B. MedMNIST classification decathlon: a lightweight AutoML benchmark for medical image analysis. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). 2021. pp. 191–195.
36. Yang J., et al. MedMNIST v2: a large-scale lightweight benchmark for 2D and 3D biomedical image classification. Sci. Data. 2023;10(1):41. doi: 10.1038/s41597-022-01721-8.
37. Metz C.E. Basic principles of ROC analysis. Semin. Nucl. Med. 1978;8(4):283–298. doi: 10.1016/s0001-2998(78)80014-2.
38. Hanley J.A., McNeil B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36. doi: 10.1148/radiology.143.1.7063747.
39. Sun Q., et al. Deep learning vs. radiomics for predicting axillary lymph node metastasis of breast cancer using ultrasound images: don't forget the peritumoral region. Front. Oncol. 2020;10:53. doi: 10.3389/fonc.2020.00053.
40. Moon W.K., Lee Y.-W., Ke H.-H., Lee S.H., Huang C.-S., Chang R.-F. Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Comput. Methods Progr. Biomed. 2020;190. doi: 10.1016/j.cmpb.2020.105361.
41. Steifer T., Lewandowski M. Ultrasound tissue characterization based on the Lempel–Ziv complexity with application to breast lesion classification. Biomed. Signal Process Control. 2019;51:235–242.
42. Demircioğlu Ö., Uluer M., Arıbal E. How many of the biopsy decisions taken at inexperienced breast radiology units were correct? J. Breast Heal. 2017;13(1):23. doi: 10.5152/tjbh.2016.2962.
43. Zhang H., Han L., Chen K., Peng Y., Lin J. Diagnostic efficiency of the breast ultrasound computer-aided prediction model based on convolutional neural network in breast cancer. J. Digit. Imag. 2020;33:1218–1223. doi: 10.1007/s10278-020-00357-7.
44. Ragab M., Albukhari A., Alyami J., Mansour R.F. Ensemble deep-learning-enabled clinical decision support system for breast cancer diagnosis and classification on ultrasound images. Biology. 2022;11(3):439. doi: 10.3390/biology11030439.
45. Dosovitskiy A., et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020. pp. 1–22.
46. Lemley J., Bazrafkan S., Corcoran P. Smart augmentation learning an optimal data augmentation strategy. IEEE Access. 2017;5:5858–5869.
47. Zamri N.E., Azhar S.A., Mansor M.A., Alway A., Kasihmuddin M.S.M. Weighted random k satisfiability for k = 1, 2 (r2SAT) in discrete Hopfield neural network. Appl. Soft Comput. 2022;126.
48. Zamri N.E., et al. Multi-discrete genetic algorithm in Hopfield neural network with weighted random k satisfiability. Neural Comput. Appl. 2022;34(21):19283–19311.

Articles from Heliyon are provided here courtesy of Elsevier
