PLOS One. 2023 Feb 16;18(2):e0275194. doi: 10.1371/journal.pone.0275194

Cancer detection for small-size and ambiguous tumors based on semantic FPN and transformer

Jingzhen He 1, Jing Wang 2,*, Zeyu Han 3, Baojun Li 4, Mei Lv 5, Yunfeng Shi 2,*
Editor: Mohamed Hammad
PMCID: PMC9934456  PMID: 36795663

Abstract

Early detection of tumors is of great significance for timely diagnosis and the determination of treatment plans. However, cancer detection remains a challenging task due to the interference of diseased tissue, the diversity of mass scales, and the ambiguity of tumor boundaries. It is difficult to extract the features of small-sized tumors and tumor boundaries, so semantic information from high-level feature maps is needed to enrich the regional features and local attention features of tumors. To solve the problems of small tumor objects and the lack of contextual features, this paper proposes a novel Semantic Pyramid Network with a Transformer Self-attention, named SPN-TS, for tumor detection. Specifically, the paper first designs a new Feature Pyramid Network (FPN) in the feature extraction stage. It changes the traditional cross-layer connection scheme and focuses on enriching the features of small-sized tumor regions. Then, we introduce the transformer attention mechanism into the framework to learn the local features of tumor boundaries. Extensive experimental evaluations were performed on the publicly available CBIS-DDSM dataset, the Curated Breast Imaging Subset of the Digital Database for Screening Mammography. The proposed method outperformed existing models, achieving 93.26% sensitivity, 95.26% specificity, 96.78% accuracy, and an 87.27% Matthews Correlation Coefficient (MCC). The method achieves the best detection performance by effectively addressing the difficulties of small objects and boundary ambiguity. The algorithm can further promote the detection of other diseases in the future and also provides an algorithmic reference for the general object detection field.

Introduction

Breast cancer is one of the most deadly malignancies among women worldwide [1]. The difficulty of accurately screening for early-stage tumors has increased mortality from breast cancer [2, 3]. Therefore, early identification of breast cancer is necessary for proper treatment and recovery. With the increasing quantity of mammograms in hospitals, manual reading has become complex and time-consuming for radiologists. Computer-aided detection systems assist radiologists in diagnosis, with the goal of reducing screening time and improving early detection accuracy [4–6]. However, there are several challenges in image feature extraction and accurate detection in early breast mammograms. Firstly, although there are differences between cancerous and noncancerous breast tissue on imaging, the difference in early-stage cancer is minimal. The low signal-to-noise ratio (SNR) [7, 8] of a mass compared with the surrounding tissue makes feature extraction in the lesion region difficult. Secondly, the varying size of cancer masses is another challenge, as small masses are especially difficult to detect. The third challenge is the blurring of tumor boundaries, which may cause visual confusion and lead to inefficient bounding-box regression. Therefore, cancer detection remains a challenging task.

In small object detection, early research focused on feature extraction from small-sized regions. The classical Feature Pyramid Network (FPN) [9] achieves the extraction of multi-scale features through a top-down multi-level architecture. However, the upsampling operation used by FPN loses the position information of small objects. Recently, popular transformer attention models [10, 11] can effectively capture object regions using encoder-decoder and attention mechanisms. In addition, Wu et al. [12] disentangled the sibling head into two independent branches for classification and localization; this feature decoupling can effectively improve the performance of small-object detection. Two characteristics of early medical images are small tumor areas and blurred borders. Thus, it is necessary to design an effective method that enriches high-resolution features and local attention features with semantic information from multi-level feature maps.

To solve the problems mentioned above, this paper proposes a detection framework for medical imaging, named Semantic Pyramid Network with a Transformer Self-attention (SPN-TS). As shown in Fig 1(A), without enough semantic information, low-level feature maps struggle to detect small tumor objects. Fig 1(A) shows the traditional FPN with its top-down and horizontal connections. This connection scheme does not yield sufficient semantic information, resulting in poor feature extraction from small tumor regions. In this paper, we creatively design a new FPN feature extraction scheme, which changes the traditional connectivity and improves the semantic information. As shown in Fig 1(B), the SPN module enriches the semantic features through the three steps of lateral connection, multiple up-sampling, and feature fusion. In addition, we introduce the transformer attention mechanism for comprehensive feature learning in tumor image detection. The transformer mechanism departs from the traditional CNN and can capture the effective object regions in the image during tumor detection. Finally, we conducted extensive experimental verification on CBIS-DDSM, the Curated Breast Imaging Subset of the Digital Database for Screening Mammography. The contributions of this paper fall into the following three aspects.

Fig 1. The figure compares the original FPN with our proposed SPN connection.


Fi denotes the multi-level features from layers 1 to 5, and Bi is the output of the feature pyramid at level i. (A) The traditional FPN uses top-down and horizontal connections. (B) The proposed SPN enriches the semantic features through the three steps of lateral connection, multiple up-sampling, and feature fusion.

  1. The paper proposes an effective network called SPN-TS for cancer detection, containing semantic FPN and transformer self-attention mechanisms. It addresses the extraction of small tumor objects and the lack of contextual information.

  2. The novel transformer attention mechanism is integrated into the extraction network. The network focuses on tumor region features through the attention mechanism and positional encoding.

  3. The paper also decouples the classification and regression branches to improve the classification confidence. We conducted an extensive evaluation on the CBIS-DDSM dataset to illustrate the effectiveness of the SPN-TS method in detecting small objects.

The rest of this paper is organized as follows. The Related work section summarizes cancer diagnosis and object detection. The Materials and methods section describes the proposed approach in detail. The Results section presents the experimental design and results analysis, comparing with existing studies on breast cancer detection and general object detection, respectively. The Discussion section discusses breast disease detection and the limitations of this work. Finally, the last section presents the conclusions and future work.

Related work

Mammography is currently an effective method for breast cancer screening and the detection of its early stages [13, 14]. In earlier studies, the Support Vector Machine (SVM) algorithm showed good classification performance and was mostly used for breast tumor classification and identification tasks [15–18]. V. Jitendra et al. [19] proposed a system based on the Histogram of Oriented Gradients (HOG) descriptor to classify objects using SVM. C. Muramatsu et al. [18] used texture features to classify breast lesions into benign and malignant categories. In reference [20], multi-scale region growth and wavelet decomposition were combined to locate regions of interest, achieving 96.19% detection accuracy on pathology images of breast cancer. In recent years, convolutional neural networks (CNN) and artificial intelligence technology have been widely used in object detection [21, 22]. Deep learning-based methods have also been widely used in medical imaging [23–25]. Yang et al. [23] proposed a multi-perspective detection framework, which combines convolutional neural networks over two views of a mammogram to predict the image classification. R. Platania et al. [26] proposed a You Only Look Once (YOLO) based CAD system called BC-DROID.

Object detection algorithms are widely applied to the automatic detection of cancer images, especially breast masses [27, 28]. Depending on the processing flow, object detection algorithms are divided into two-stage and one-stage algorithms. Faster R-CNN, a representative two-stage algorithm, generates region proposals through the Region Proposal Network (RPN) module and then performs fine-grained classification and regression [9, 29–31]. The typical one-stage YOLO [26, 32] and SSD [33] detectors are designed to handle the diversity of object sizes. In feature extraction, FPN [9] first builds a top-down architecture that extracts features from multiple layers by connecting them laterally. In addition, Wu et al. [12] disentangled the sibling head into two independent branches for classification and localization. The DETR method [34] successfully integrates the transformer self-attention mechanism into an object detection framework, making attention the central building block of the detection pipeline.

Materials and methods

Overview

This paper proposes a novel Semantic Feature Pyramid network with a Transformer Self-attention module, named SPN-TS, for tumor detection. The overall process is as follows. Firstly, this paper creatively designs a semantic FPN feature extraction scheme, which changes the traditional cross-layer connection scheme. Then, we introduce the transformer attention mechanism in place of a purely convolutional extractor; the mechanism is used for comprehensive feature learning and boundary feature extraction in tumor image detection. The overall architecture of the SPN-TS method is shown in Fig 2. It is mainly divided into four parts: the transformer attention module, the semantic feature pyramid network, the region proposal network, and the classification and regression prediction module. Specifically, the transformer attention module stitches multiple attention features together using the multi-headed self-attention mechanism and positional encoding. It can effectively capture key regions and richer semantic features by combining contextual information. Then, the semantic FPN is creatively proposed to change the traditional cross-layer connection scheme. It is mainly used to enrich the semantic information of low-level feature maps and improve the detection performance for small tumor objects. In addition, the paper decouples the features of the classification and regression branches and uses fully connected layers and convolutional layers for their predictions, respectively.

Fig 2. The overall architecture of the proposed SPN-TS, including the following four parts: transformer attention feature extractor, semantic feature pyramid network module, region proposal network, and classification and regression module.


Semantic feature pyramid network

The paper proposes the semantic FPN to enrich semantic features by integrating high-level features into low-level feature maps, thereby improving the detection performance for small tumor objects. As shown in Fig 2, the rich semantic information is obtained through the three steps of lateral connection, multiple up-sampling, and feature fusion. The newly connected feature maps are fed into the Region Proposal Network (RPN) and Region of Interest Alignment (ROIAlign), respectively. Then, the outputs of ROIAlign are classified and regressed to fully integrate the spatial location and semantic information of the object, so that the detection model can better classify and localize objects in the input images.

The first operation is the lateral connection. We propose innovative lateral connections that preserve the resolution and semantic information of the current layer while facilitating the feature fusion operations in the subsequent steps. In Fig 1, Fi denotes the multi-level features from layers 1 to 5, and Mi is the feature map of the ith stage in the intermediate process produced by the transverse (lateral) connection. It is also the corresponding output feature of the original FPN without feature fusion. At the same time, a 3 × 3 convolutional layer is applied to each merged feature map in the lateral connections to reduce the aliasing effect of up-sampling and integration.

The second operation is multiple up-sampling. To integrate multi-level features and preserve semantic information, we need to up-sample the feature maps Mi to the corresponding size. For example, M2 is obtained by fusing F2 with the up-sampled features of F3–F5 at the matching scale. In short, the fusion operation in the cross-scale and dense pathway of SPN is described as:

$$
\begin{cases}
B_5 = F_5 \\
B_4 = f_4\big(F_4,\ Sp(F_4),\ up(F_5)\big) \\
B_3 = f_3\big(F_3,\ Sp(F_3),\ up(F_5),\ up(F_4)\big) \\
B_2 = f_2\big(F_2,\ Sp(F_2),\ up(F_5),\ up(F_4),\ up(F_3)\big)
\end{cases}
\qquad (1)
$$

where fi is the multi-scale feature fusion function at layer i. The high-level features are propagated through the classical nearest-neighbor interpolation function up(·) for up-sampling, and Sp denotes a skip connection.

In the feature fusion step, we integrate feature maps from different layers and of different sizes, where Bi is the output of the feature pyramid at level i. For simplicity, we take the third layer as an example: the high-level feature map (F4) is first scaled by a factor of 2 using nearest-neighbor up-sampling. It is then merged with the shallow feature map F3 and passed through a 3 × 3 convolution layer to obtain B3. Finally, this feature map is further fused into B2 through another 3 × 3 convolutional layer, and the feature maps of the other levels are obtained in the same way in turn. By completing the above steps, we achieve an effective fusion of features from low to high levels.
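To make the cross-scale fusion in Eq (1) concrete, the following PyTorch sketch shows one possible implementation; the channel widths, the 1 × 1 lateral convolutions, and the module name SemanticFPN are illustrative assumptions on our part, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticFPN(nn.Module):
    """Minimal sketch of the cross-scale fusion in Eq (1).

    Each output level B_i fuses its own lateral feature (the skip connection)
    with nearest-neighbor up-sampled versions of every higher level, followed
    by a 3x3 smoothing convolution. Channel width (256) and the 1x1 lateral
    convolutions are illustrative assumptions.
    """

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooths = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):
        # feats: [F2, F3, F4, F5], ordered from high to low spatial resolution.
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        outs = []
        for i, lat in enumerate(laterals):
            fused = lat  # skip connection keeps the current level's resolution
            for j in range(i + 1, len(laterals)):  # every higher (coarser) level
                fused = fused + F.interpolate(
                    laterals[j], size=lat.shape[-2:], mode="nearest")
            outs.append(self.smooths[i](fused))
        return outs  # [B2, B3, B4, B5]


if __name__ == "__main__":
    feats = [torch.randn(1, c, s, s)
             for c, s in zip((256, 512, 1024, 2048), (56, 28, 14, 7))]
    for b in SemanticFPN()(feats):
        print(b.shape)
```

The key design choice mirrored here is that every output level receives up-sampled contributions from all higher levels, rather than only from the adjacent level as in the original FPN.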

Transformer attention module

Despite the improvements described above, the detection of small objects is still not as good as expected. Existing approaches lack adjustments for small objects and boundaries, which is one of the reasons for the unsatisfactory performance of small object detection. Therefore, this paper integrates a transformer attention module into the network, containing a self-attention mechanism based on deconvolution and a skip connection. The structure of the complete transformer attention model is shown in Fig 3(a). The transformer attention module helps to extract clearer and richer semantic features, especially edge features, by combining contextual information.

Fig 3. The residual connectivity operation and attention module.


(a) Overall Architecture. (b) Operating details. (c) Contrast abstract representation.

As shown in Fig 3(b), the module is applied with 3 × 3 convolutions inside ResNet-50 to produce the output results. Firstly, a new feature map is obtained by applying a 1 × 1 convolution and a 3 × 3 convolution to the input feature map. Then, it is added to the feature map obtained through transformer attention, followed by a 1 × 1 convolution, and finally summed with the input feature map. The feature map obtained from these steps is the input for the next stage of the FPN.
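A minimal sketch of how such a residual attention branch could be wired is shown below; the channel width, the head count, and the use of per-pixel tokens with nn.MultiheadAttention are our assumptions for illustration rather than the exact structure used in the paper.

```python
import torch
import torch.nn as nn


class ResidualAttentionBlock(nn.Module):
    """Sketch of the Fig 3(b) wiring: a convolutional branch plus an attention
    branch, combined with two residual additions. Channel width and the
    per-pixel attention granularity are illustrative assumptions."""

    def __init__(self, channels=256, heads=8):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # 1x1 then 3x3 convolution of the input feature map
        conv_feat = self.conv3x3(self.reduce(x))
        # transformer attention over flattened spatial positions
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attn_feat, _ = self.attn(tokens, tokens, tokens)
        attn_feat = attn_feat.transpose(1, 2).reshape(b, c, h, w)
        # add the two branches, apply a 1x1 conv, then sum with the input
        return self.project(conv_feat + attn_feat) + x


if __name__ == "__main__":
    y = ResidualAttentionBlock()(torch.randn(2, 256, 14, 14))
    print(y.shape)  # torch.Size([2, 256, 14, 14])
```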

As shown in Fig 3(c), our improved transformer is connected across layers using normalization instead of regularization on the backbone network. The transformer layers are stacked transformer encoders, each with two sub-layers: multi-head self-attention and a feed-forward network (FFN). The transformer attention is composed of K stacks of identical layers. Each stack is divided into two sub-layers, a multi-headed self-attention mechanism and an MLP feed-forward network. The residual connection common in ResNet is used, with feature normalization around each sub-layer. In Fig 3, the first sub-layer of multi-headed attention integrates multiple self-attention structures. This sub-layer is designed to capture the dependencies between features regardless of their distances. Given the representation of the (l − 1)th layer, H^(l−1), and h parallel attention functions, the ith attention head is defined as:

$$head_i = \mathrm{Attention}\big(H^{l-1} W_Q^{l},\ H^{l-1} W_K^{l},\ H^{l-1} W_V^{l}\big) \qquad (2)$$

where W_Q, W_K, and W_V are projection weights. Let Q, K, and V denote the query, key, and value; the scaled dot-product attention is defined as follows,

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_h}}\right) \cdot V \qquad (3)$$

where d_h = d/h is the dimension of each head; scaling by its square root avoids extremely small gradients and produces a softer attention distribution. Then, the multi-head attention is defined as follows,

$$\mathrm{MultiHead}(H^{l-1}) = [head_1, head_2, \ldots, head_h] \cdot W_0 \qquad (4)$$

where W0 is a trainable weight.
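The following self-contained PyTorch sketch expresses Eqs (2)-(4) directly; the model width and number of heads are illustrative, and w_q, w_k, w_v, and w_o stand in for the projection weights W_Q, W_K, W_V, and W_0 defined above.

```python
import math
import torch
import torch.nn as nn


def scaled_dot_product_attention(q, k, v):
    """Eq (3): softmax(Q K^T / sqrt(d_h)) V."""
    d_h = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_h)
    return torch.softmax(scores, dim=-1) @ v


class MultiHeadSelfAttention(nn.Module):
    """Eqs (2) and (4): h parallel heads, concatenated and projected by W_0."""

    def __init__(self, d_model=256, h=8):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_h = h, d_model // h          # d_h = d / h
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, H):                           # H: (batch, seq_len, d_model)
        b, n, _ = H.shape

        def split(t):                               # (b, n, d) -> (b, h, n, d_h)
            return t.view(b, n, self.h, self.d_h).transpose(1, 2)

        q, k, v = split(self.w_q(H)), split(self.w_k(H)), split(self.w_v(H))
        heads = scaled_dot_product_attention(q, k, v)       # (b, h, n, d_h)
        concat = heads.transpose(1, 2).reshape(b, n, self.h * self.d_h)
        return self.w_o(concat)                     # Eq (4): [head_1,...,head_h] W_0


if __name__ == "__main__":
    x = torch.randn(2, 49, 256)
    print(MultiHeadSelfAttention()(x).shape)        # torch.Size([2, 49, 256])
```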

The feed-forward network consists of two linear mappings with a ReLU nonlinearity between them, applied identically to each position. Then, the method obtains H^l from the previous multi-head output H^(l−1) as follows,

$$H^{l} = \mathrm{FFN}\big(\mathrm{MultiHead}(H^{l-1})\big) \qquad (5)$$
$$\mathrm{FFN}(x) = \mathrm{ReLU}(x W_1 + b_1)\, W_2 + b_2 \qquad (6)$$

The experimental data is fed into the self-attention module and vectorized to obtain a weighted feature vector. This vector is fed into a feed-forward network that incorporates more global information; the operation, consisting of two linear transformation layers and a ReLU activation layer, is then repeated over each vector. After each sub-layer, the data and features are normalized by the Norm module to ensure the stability of the network's computational gradients.
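Putting the sub-layers together, one of the K identical stacks (Eqs (5) and (6), plus the residual connections and Norm steps just described) might look like the sketch below; the post-norm placement and the layer widths are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TransformerEncoderBlock(nn.Module):
    """One of the K identical stacks: multi-head self-attention followed by a
    two-layer ReLU feed-forward network (Eq (6)), each wrapped with a residual
    connection and normalization. Post-norm placement is an assumption."""

    def __init__(self, d_model=256, heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.ffn = nn.Sequential(                   # FFN(x) = ReLU(x W1 + b1) W2 + b2
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h):                           # h: (batch, seq_len, d_model)
        attn_out, _ = self.attn(h, h, h)
        h = self.norm1(h + attn_out)                # residual connection + Norm
        return self.norm2(h + self.ffn(h))          # Eq (5): H^l = FFN(MultiHead(H^{l-1}))


if __name__ == "__main__":
    print(TransformerEncoderBlock()(torch.randn(2, 49, 256)).shape)
```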

Loss function

Object detection mainly involves two types of loss functions, classification and regression; the classification loss usually uses the generic cross-entropy loss, while the regression loss has many improved variants that can be adopted as research progresses. When the model is trained, the loss values are calculated to measure the deviation between the predicted bounding box and the ground truth box. SPN-TS discards the original loss functions and introduces new classification and regression losses. The final loss function is as follows,

$$L = L_{cls} + L_{reg} \qquad (7)$$

where L_cls and L_reg are the classification and regression losses, respectively. The classification loss adopts the Focal Loss [30] to address the problem of sample imbalance, defined as follows.

$$L_{cls} = \mathrm{FocalLoss} =
\begin{cases}
-\alpha\, (1-\hat{y})^{\beta} \log \hat{y}, & y = 1 \\
-(1-\alpha)\, \hat{y}^{\beta} \log(1-\hat{y}), & y = 0
\end{cases} \qquad (8)$$

where $\hat{y}$ is the predicted probability and $y$ is the ground-truth label.
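A minimal sketch of Eq (8) for a single binary classification output is given below; the defaults α = 0.25 and β = 2 come from the original Focal Loss paper [30] and are assumptions here, since this paper does not report its exact settings.

```python
import torch


def focal_loss(pred_prob, target, alpha=0.25, beta=2.0, eps=1e-8):
    """Eq (8): pred_prob is the predicted probability, target is the 0/1 label.
    The alpha/beta defaults follow the original Focal Loss paper, not this work."""
    pred_prob = pred_prob.clamp(eps, 1.0 - eps)
    pos = -alpha * (1 - pred_prob) ** beta * torch.log(pred_prob)
    neg = -(1 - alpha) * pred_prob ** beta * torch.log(1 - pred_prob)
    return torch.where(target == 1, pos, neg).mean()


if __name__ == "__main__":
    p = torch.tensor([0.9, 0.2, 0.7])
    t = torch.tensor([1.0, 0.0, 1.0])
    print(focal_loss(p, t))
```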

The method adopts the CIoU loss function to address the inconsistency between the evaluation metric and bounding-box regression, and the calculation is shown in the following equations,

$$L_{reg} = 1 - IoU + R_{CIoU}(B_{pd}, B_{gt}) \qquad (9)$$
$$L_{reg} = 1 - \frac{|B_{pd} \cap B_{gt}|}{|B_{pd} \cup B_{gt}|} + \frac{\rho^{2}(b, b_{gt})}{c^{2}} + \alpha \cdot \frac{4}{\pi^{2}}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h}\right)^{2} \qquad (10)$$

where $B_{pd}$ and $B_{gt}$ are the predicted and ground-truth boxes, $\rho(b, b_{gt})$ is the distance between their centers, and $c$ is the diagonal length of the smallest box enclosing both.
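For reference, a self-contained sketch of the CIoU loss in Eqs (9)-(10) is shown below; boxes are assumed to be in (x1, y1, x2, y2) format, and the trade-off weight α is computed as v / (1 − IoU + v) following the standard CIoU formulation, which we assume the authors also use.

```python
import math
import torch


def ciou_loss(pred, gt, eps=1e-7):
    """Eqs (9)-(10); pred and gt are (N, 4) boxes in (x1, y1, x2, y2) format."""
    # intersection-over-union
    ix1, iy1 = torch.max(pred[:, 0], gt[:, 0]), torch.max(pred[:, 1], gt[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], gt[:, 2]), torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    # squared center distance rho^2 over squared enclosing-box diagonal c^2
    cpx, cpy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cgx, cgy = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    ex1, ey1 = torch.min(pred[:, 0], gt[:, 0]), torch.min(pred[:, 1], gt[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], gt[:, 2]), torch.max(pred[:, 3], gt[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # aspect-ratio consistency term, weighted by alpha = v / (1 - IoU + v)
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps))
                              - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()


if __name__ == "__main__":
    pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
    gt = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
    print(ciou_loss(pred, gt))
```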

Due to the high performance of the proposed SPN-TS method in breast cancer object detection, we strongly recommend it for the diagnosis of multiple diseases, such as lung cancer and brain cancer. Moreover, the proposed method can be easily incorporated into healthcare systems for reliable diagnosis of multiple diseases due to its reproducibility.

Results

We performed extensive experimental evaluations on the CBIS-DDSM dataset. The experimental design was compared with various state-of-the-art detection methods and medical detection methods to demonstrate the effectiveness of the proposed SPN-TS network. We further performed ablation studies and qualitative analysis to demonstrate the effectiveness of our method in detecting small objects.

Dataset

CBIS-DDSM is a publicly available mammography dataset; the original images were manually screened by experienced physicians. In this experiment, 2424 complete images of benign and malignant breast masses were selected as experimental data. It is an unbalanced dataset containing 1629 benign and 795 malignant tumors. At the same time, high-quality, clean detection box labels are provided manually for the object detection task. The proportions of the training, validation, and test sets were 70%, 20%, and 10%, respectively. Before the experiment, the lossless JPEG images were converted to PNG format using the calibration information from the DDSM website (http://www.eng.usf.edu/cvprg/Mammography/Database.html). The image size was set to 224 × 224, and the pixel values were rescaled to the range 0 to 255. Sample mammogram images are shown in Fig 4, containing tumors of varying sizes and different resolutions.
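A hedged sketch of this preprocessing (images assumed already converted to PNG, intensities rescaled to 0-255, resized to 224 × 224, and split 70/20/10 into training, validation, and test sets) is given below; the directory names are hypothetical.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image


def preprocess(src_dir="cbis_ddsm_png", dst_dir="cbis_ddsm_224", seed=0):
    """Rescale mammogram intensities to 0-255, resize to 224x224, and split the
    file list 70/20/10 into train/val/test. Paths are hypothetical examples."""
    files = sorted(Path(src_dir).glob("*.png"))
    random.Random(seed).shuffle(files)
    n = len(files)
    splits = {"train": files[: int(0.7 * n)],
              "val": files[int(0.7 * n): int(0.9 * n)],
              "test": files[int(0.9 * n):]}
    for split, items in splits.items():
        out = Path(dst_dir) / split
        out.mkdir(parents=True, exist_ok=True)
        for f in items:
            arr = np.asarray(Image.open(f), dtype=np.float32)
            arr = 255.0 * (arr - arr.min()) / max(float(arr.max() - arr.min()), 1e-8)
            Image.fromarray(arr.astype(np.uint8)).resize((224, 224)).save(out / f.name)


if __name__ == "__main__":
    preprocess()
```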

Fig 4. Some samples of mammogram images.


Experimental setup

The size of the mammogram images input to the network is 224 × 224, and the ReLU function was used as the nonlinear activation function. When these detectors were trained, the initial learning rate was set to 0.001. The learning rate in MMDetection is calculated using the linear scaling principle [35] to obtain the learning rate of the training model. The learning rate is adjusted downward to 0.0001 at 10,000 iterations to further converge the loss value. The batch size is 64 and the momentum factor is set to 0.9. Inspired by recent CNN methods [36–38], the experiments implement feature extraction with VGG16 and ResNet-50 as backbone networks. All baseline detectors were re-implemented using the same code base, the open-source MMDetection toolkit (https://github.com/open-mmlab/mmdetection), for a fair comparison of all methods. All models are trained on the same training set and validated on the validation set. This work was performed on an Nvidia GTX 1080Ti GPU with Python as the programming language on Ubuntu 14.04.
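The optimization schedule described above (initial learning rate 0.001 dropped to 0.0001 at 10,000 iterations, momentum 0.9, batch size 64) can be expressed with standard PyTorch components, as in the sketch below; the placeholder model and toy tensors are ours, since the actual experiments run SPN-TS inside the MMDetection training loop.

```python
import torch
from torch import nn, optim

# Placeholder model and toy tensors; the real experiments train SPN-TS inside
# MMDetection on 224x224 mammograms with batch size 64 for over 10,000 iterations.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)   # momentum 0.9
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10000], gamma=0.1)

for iteration in range(100):                 # shortened toy loop for illustration
    images = torch.randn(4, 3, 32, 32)       # real inputs: 64 x 3 x 224 x 224
    loss = model(images).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                         # lr: 0.001 -> 0.0001 at iteration 10,000
```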

Evaluation metrics

Different evaluation indicators were used to evaluate the performance of the proposed method. Each identified mass requires the detector to output its tumor type, which is then compared with the ground truth to yield one of four outcomes: a benign mass classified as benign (True Negative, TN), a benign mass classified as malignant (False Positive, FP), a malignant mass classified as malignant (True Positive, TP), and a malignant mass classified as benign (False Negative, FN). These four counts constitute the confusion matrix, which is one of the important tools for evaluating accuracy. In this paper, these counts are used to calculate several further indicators.

$$\mathrm{Accuracy\ (ACC)} = \frac{TN + TP}{FP + TN + FN + TP} \qquad (11)$$
$$\mathrm{Specificity\ (TNR)} = \frac{TN}{FP + TN} \qquad (12)$$
$$\mathrm{Sensitivity\ (TPR)} = \frac{TP}{FN + TP} \qquad (13)$$
$$\mathrm{Precision} = \frac{TP}{FP + TP} \qquad (14)$$
$$\mathrm{F\text{-}measure} = \frac{2 \times \mathrm{Sensitivity} \times \mathrm{Precision}}{\mathrm{Sensitivity} + \mathrm{Precision}} \qquad (15)$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \qquad (16)$$

Eqs (11)–(16) define accuracy, specificity, sensitivity, precision, F-measure, and Matthews Correlation Coefficient (MCC), respectively. In addition, we used the widely adopted mean average precision (mAP) to evaluate the network detection results. With the IoU threshold set to 0.5, the average precision of each category is the area under its precision-recall curve, and mAP is the mean of these values over all categories. The higher its value, the better the performance of the model.
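The metrics in Eqs (11)-(16) can be computed directly from the confusion-matrix counts, as in the short sketch below; the example counts are illustrative only.

```python
import math


def detection_metrics(tp, tn, fp, fn):
    """Eqs (11)-(16): accuracy, specificity, sensitivity, precision, F-measure, MCC."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(ACC=acc, Specificity=specificity, Sensitivity=sensitivity,
                Precision=precision, F_measure=f_measure, MCC=mcc)


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(detection_metrics(tp=150, tn=320, fp=16, fn=11))
```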

Comparison with detector baselines

Results

To prove the effectiveness of the proposed model for the detection of cancerous tumors, the paper first compares the SPN-TS method with various popular object detection models. These include the classical two-stage detection method Faster R-CNN [29] and the single-stage detectors SSD [33] and YOLOv3 [32]. The more recent FPN [9], Mask R-CNN [39], and Cascade R-CNN [31] detectors are also included in the experiments. All of the above detectors use ResNet-50 as the backbone network for feature extraction, which ensures a fair comparison.

Table 1 summarizes the results on CBIS-DDSM for the different detection models. Among the existing baselines such as Faster R-CNN, YOLOv3, and Mask R-CNN, the Cascade R-CNN detector achieved the best results, with 89.81% mAP and 93.61% ACC; it is therefore used as the baseline in this paper. Faster R-CNN, a popular detector, achieved only 83.20% mAP, largely because many real-world images contain numerous small tumor objects. We then compared the performance of SPN-TS with all baselines: the mAP reached 92.6%, a 9.4% improvement over Faster R-CNN. As seen in Table 1, SPN-TS achieved the best performance among these models, with 93.26% sensitivity, 95.26% specificity, 96.78% accuracy, and an 87.27% MCC value. It is worth noting that, compared with Faster R-CNN, SPN-TS improves mAP by 9.4%, sensitivity by 8.66%, specificity by 10.14%, precision by 15.44%, accuracy by 15.47%, and MCC by 14.57%. The above results demonstrate the significant improvement in detection and classification performance of our proposed method, which is beneficial for the localization and treatment of irregular and ambiguous tumor regions.

Table 1. Comparison of the method performance with existing general object detection.
Method mAP Sensitivity Specificity Precision MCC ACC
Faster-RCNN [29] 83.20 84.60 85.12 82.22 72.70 81.31
SSD [33] 83.90 84.80 85.66 86.91 76.20 86.30
YOLOV3 [32] 85.10 86.10 87.80 89.31 79.20 89.00
FPN [9] 88.70 90.00 91.00 90.92 81.00 90.50
Mask-RCNN [39] 89.31 90.86 91.37 93.57 82.09 92.50
Cascade R-CNN [31] 89.81 91.19 92.80 94.11 83.20 93.61
SPN-TS (Ours) 92.60 93.26 95.26 97.66 87.27 96.78

Analysis

In addition to the quantitative results above, Fig 5 presents some detection results of the SPN-TS method, including the predicted bounding boxes and classification confidences. The yellow boxes represent the predicted results. The SPN-TS detector clearly detects larger tumor regions, whether benign or malignant, and obtains accurate results, as shown in the first image in the second row of Fig 5. Fig 5 also shows the detection of smaller tumor regions and regions with blurred borders. For the second image in the first row, which has a blurred tumor region, the SPN-TS method detects both tumors in the image and classifies them as benign with 95% and 98% confidence, respectively. The fourth image in the third row contains a smaller tumor, and the model still returns a benign detection with 98% confidence, indicating that the method can accurately regress smaller tumor regions. The detector also shows high performance on malignant tumors with blurred and irregular edges, and is able to regress malignant tumors to more accurate regions.

Fig 5. Results of breast tumor detection and classification using SPN-TS network.


In addition, to more comprehensively analyze the detection effect of SPN-TS method, the Precision-Recall (PR) curve and Receiver Operating Characteristic (ROC) curve were drawn in Fig 6. The PR curve is a curve drawn with Precision as the vertical axis and Recall as the horizontal axis. Therefore, PR is more concerned with the classification of positive samples. The ROC curve uses the FPR values as the horizontal axis and the TPR values as the vertical axis. Therefore, the ability of the model can be accurately judged without the influence of positive and negative sample distribution. As shown in Fig 6, the detection results are better than the original Faster RCNN method in both positive and negative sample processing.

Fig 6. Comparison of PR and ROC curves between Faster RCNN and SPN-TS.

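Curves like those in Fig 6 can be drawn from per-detection scores and binary labels with scikit-learn, as in the hedged sketch below; the score and label arrays shown are placeholders, not the paper's predictions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, auc

# Placeholder scores/labels; in practice these come from the detector's outputs.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.92, 0.30, 0.75, 0.86, 0.41, 0.68, 0.12, 0.55, 0.97, 0.25])

precision, recall, _ = precision_recall_curve(y_true, y_score)   # PR curve
fpr, tpr, _ = roc_curve(y_true, y_score)                         # ROC curve

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(recall, precision)
ax1.set(xlabel="Recall", ylabel="Precision", title="PR curve")
ax2.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
ax2.set(xlabel="FPR", ylabel="TPR", title="ROC curve")
ax2.legend()
plt.tight_layout()
plt.savefig("pr_roc_curves.png")
```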

Comparison with medical baselines

Results

In addition, since this article focuses on disease diagnosis, we compare the performance of SPN-TS with strong existing CAD systems, using breast cancer detection as an example, and analyze the results in depth. The baselines are identification methods that have performed well in recent years in the field of breast tumor detection, mainly including the classical cancer detection methods LAD [40], RCNN-SVM [40], and DeepCAD [41], as well as the recent BC-DROID [26], Cascade R-CNN [31], FMSVM, and SC-FU-CS RCNN methods with good detection performance.

As shown in Table 2, the ACC and AUC indicators are reported for the breast tumor detection methods. The two evaluation indicators of the BC-DROID method reached 93.50% and 92.31%, respectively, and the newly proposed SC-FU-CS RCNN method achieved 94.06% accuracy and 94.71% AUC. Our SPN-TS method achieved a 2.72% accuracy improvement over the SC-FU-CS RCNN method, reaching the best accuracy of 96.78%. Meanwhile, the AUC reaches 97.03%, which is 2.32% higher than that of the SC-FU-CS RCNN method. Table 2 shows that the proposed method achieves state-of-the-art performance and addresses the problem of identifying multi-scale or ambiguous breast masses.

Table 2. Comparison of the method performance with existing medical methods.
Method ACC AUC
LAD 78.57 -
RCNN-SVM 87.20 94.00
DeepCAD 91.00 91.00
BC-DROID 93.50 92.31
Cascade R-CNN 92.76 92.72
FMSVM 91.65 96.00
SC-FU-CS RCNN 94.06 94.71
SPN-TS (Ours) 96.78 97.03

Analysis

In addition to the detection results shown in the previous section, Fig 7 shows the tumor visualization results for some test images. The first row shows examples of the source images, and the second row shows a simple crop of the breast tissue with sharpening of the tumor region on the original image. The third row highlights the tumor region in red, which is displayed more accurately after using the transformer attention module, demonstrating the effectiveness of our method. The proposed model effectively detects malignant lesions and focuses on the interface between the breast cancer and the surrounding tissue.

Fig 7. Gradient-weighted class activation map showing cancerous regions of breast cancer.


The figure contains the source data, the processed enhanced images, and the Grad-CAM images.

Ablation study

Table 3 shows a combined analysis of the two modules based on the Faster R-CNN baseline. First, adding each of the two modules to Faster R-CNN improved the mAP by 5.02% and 4.11%, respectively, which proves the effectiveness of the semantic feature pyramid network and the transformer attention module. Adding both modules, our method reaches a final mAP of 92.60%, the best detection result. Compared with the baseline, the mAP, sensitivity, specificity, precision, MCC, and ACC of the ablation experiment improved by 9.4%, 8.66%, 10.14%, 15.44%, 14.57%, and 15.47%, respectively. The improvement of the SPN module over the TS module is 0.91%, 1.26%, 1.38%, 2.55%, 0.66%, and 0.67% for mAP, sensitivity, specificity, precision, MCC, and ACC, respectively. The experimental results indicate that the two designed modules are effective in tumor detection and that the proposed model successfully identifies and classifies breast tumors. In addition to the table of ablation experiments, Fig 8 represents the performance increments as bar charts to show more clearly the contribution of each module.

Table 3. The detection effects of the two modules based on the Faster R-CNN baseline.

Method mAP Sensitivity Specificity Precision MCC ACC
Faster RCNN 83.20 84.60 85.12 82.22 72.70 81.31
Faster RCNN+SPN 88.22 89.94 91.35 92.82 82.56 91.00
Faster RCNN+TS 87.31 88.68 89.97 90.27 81.90 90.33
SPN-TS (Ours) 92.60 93.26 95.26 97.66 87.27 96.78

Fig 8. Histogram of the ablation experiment results.


Discussion

One of the major causes of increased mortality from breast cancer is that tumors are often found at an advanced stage, when effective treatment is difficult. Tumor regions in medical images are usually small in size and fuzzy in boundary. Although existing research on tumor detection has brought good improvements in feature extraction and detection algorithms, its performance has not achieved the expected results, especially for the detection of small-size tumors. In this paper, we propose an object detection scheme, SPN-TS, including a semantic feature pyramid network and a transformer attention module, which specifically addresses small tumor objects and the lack of contextual information. An extensive evaluation against state-of-the-art detection methods on the CBIS-DDSM dataset demonstrates the validity of the proposed SPN-TS network.

Compared with existing methods, our SPN-TS method obtains better detection performance, especially in solving small objects and fuzzy boundaries of medical images. We compared the performance of SPN-TS with all baselines, and the mAP metric achieved 92.6%, a 9.4% improvement over Faster-RCNN. Our SPN-TS method achieved a 2.72% accuracy improvement over the SC-FU-CS RCNN method to reach the best accuracy of 96.78%. The final mAP result of our method adding transformer feature extractor and semantic feature pyramid network modules is 92.60%, which achieves the best detection result. The mAP, Sensitivity, Specificity, Precision, MCC, and ACC of each index of the ablation experiment were improved by 9.4%, 8.66%, 10.14%, 15.44%, 14.57%, and 15.47%, respectively, compared with the basic baseline.

However, the method cannot achieve high detection performance in all cases, and the accuracy of small-size tumor detection in medical images still needs to be improved. During the tumor detection process, we only used medical image data as the input of the model and did not take into account factors such as the patient's age, gender, and family history, which should be incorporated in the future. In addition, the number of patient samples is a limitation of this study, and we will collect and label more data from hospitals to support and extend our work. Furthermore, we will use lightweight methods to achieve faster and more accurate tumor detection.

Conclusion

To address small-size tumors with fuzzy boundaries, this paper proposes a novel detection method for medical images that cascades a semantic feature pyramid network and a transformer attention module. The semantic FPN obtains semantic features by integrating high-level features into low-level feature maps. The transformer attention model has been used in a variety of object detection tasks and has proven able to capture effective key regions. The experimental section demonstrates the effectiveness of the proposed SPN-TS through extensive evaluations on the CBIS-DDSM dataset, including ablation studies and qualitative analysis. Our algorithm can provide more accurate suggestions for radiologists diagnosing breast cancer and reduce the number of operations for benign breast nodules. The method achieves the best detection performance and effectively addresses the difficulties of early tumor detection. In the future, the algorithm is expected to be extended to the detection of other diseases. It further provides an algorithmic reference for the general object detection field, and the proposed model can be extended to other disease diagnoses and vision-related fields.

Data Availability

All data files are published and available from the CBIS-DDSM database. (http://www.eng.usf.edu/cvprg/Mammography/Database.html).

Funding Statement

We provide repository information for our data at acceptance. All data files are published and available from the CBIS-DDSM database. “http://www.eng.usf.edu/cvprg/Mammography/Database.html”.

References

  • 1. Lu HC, Loh EW, Huang SC. The Classification of Mammogram Using Convolutional Neural Network with Specific Image Preprocessing for Breast Cancer Detection. In: 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD); 2019. p. 9-12.
  • 2. Rajput G, Agrawal S, Biyani KN, Vishvakarma SK. Early breast cancer diagnosis using cogent activation function-based deep learning implementation on screened mammograms. Int J Imaging Syst Technol. 2022; p. 1101–1118. doi: 10.1002/ima.22701
  • 3. Lee Y, Huang C, Shih C, Chang R. Axillary lymph node metastasis status prediction of early-stage breast cancer using convolutional neural networks. Comput Biol Medicine. 2021; p. 104206. doi: 10.1016/j.compbiomed.2020.104206
  • 4. Khan HN, Shahid AR, Raza B, Dar AH, Alquhayz H. Multi-View Feature Fusion based Four Views Model for Mammogram Classification using Convolutional Neural Network. IEEE Access. 2019;PP(99):1–1.
  • 5. Suh YJ, Jung J, Cho BJ. Automated breast cancer detection in digital mammograms of various densities via deep learning. Journal of Personalized Medicine. 2020;10(4):211. doi: 10.3390/jpm10040211
  • 6. Ansar W, Shahid AR, Raza B, Dar AH. Breast cancer detection and localization using mobilenet based transfer learning for mammograms. In: International Symposium on Intelligent Computing Systems. Springer; 2020. p. 11-21.
  • 7. Ismail NS, Sovuthy C. Breast Cancer Detection Based on Deep Learning Technique. In: 2019 International UNIMAS STEM 12th Engineering Conference (EnCon); 2019. p. 89-92.
  • 8. Abbas S, Jalil Z, Javed AR, Batool I, Khan MZ, Noorwali A, et al. BCD-WERT: a novel approach for breast cancer detection using whale optimization based efficient features and extremely randomized tree algorithm. PeerJ Computer Science. 2021;7:e390. doi: 10.7717/peerj-cs.390
  • 9. Lin T, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ. Feature Pyramid Networks for Object Detection. In: IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 936-944.
  • 10. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. In: Conference and Workshop on Neural Information Processing Systems; 2017.
  • 11. Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems; 2014. p. 3104-3112.
  • 12. Wu Y, Chen Y, Yuan L, Liu Z, Wang L, Li H, et al. Rethinking Classification and Localization for Object Detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 10183-10192.
  • 13. American Institute for Breast Cancer Research. Accessed: [Online]. 2020.
  • 14. Gnanasekaran VS, Joypaul S, Sundaram PM, Chairman DD. Deep learning algorithm for breast masses classification in mammograms. IET Image Processing. 2020;14(12):2860–2868. doi: 10.1049/iet-ipr.2020.0070
  • 15. Hamza Osman A. An Enhanced Breast Cancer Diagnosis Scheme based on Two-Step-SVM Technique. International Journal of Advanced Computer Science and Applications. 2017.
  • 16. Azar AT, El-Said SA. Probabilistic neural network for breast cancer classification. Neural Computing and Applications. 2013;23(6):1737–1751. doi: 10.1007/s00521-012-1134-8
  • 17. Abdel-Zaher AM, Eldeib AM. Breast cancer classification using deep belief networks. Expert Systems with Applications. 2016;46:139–144. doi: 10.1016/j.eswa.2015.10.015
  • 18. Muramatsu C, Hara T, Endo T, Fujita H. Breast mass classification on mammograms using radial local ternary patterns. Computers in Biology and Medicine. 2016;72:43–53. doi: 10.1016/j.compbiomed.2016.03.007
  • 19. Virmani J, Kriti, Dey N, Kumar V. PCA-PNN and PCA-SVM based CAD Systems for Breast Density Classification. Springer International Publishing. 2016; p. 159–180.
  • 20. Pomponiu V, Hariharan H, Zheng B, Gur D. Improving Breast Mass Detection using Histogram of Oriented Gradients. vol. 9035; 2014. p. 90351R.
  • 21. Filipczuk P, Fevens T, Krzyzak A, Monczak R. Computer-Aided Breast Cancer Diagnosis Based on the Analysis of Cytological Images of Fine Needle Biopsies. IEEE Transactions on Medical Imaging. 2013;32(12):2169–2178. doi: 10.1109/TMI.2013.2275151
  • 22. Ebrahimpour MK, Sattari-Naeini V, et al. Improving breast cancer classification by dimensional reduction on mammograms. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization. 2018;6(6):618–628.
  • 23. Yang WT, Su TY, Cheng TC, He Y, Fang YHD. Deep learning for breast cancer classification with mammography. In: Other Conferences, Computer Science, Medicine; 2019. p. 1005.
  • 24. Samala R, Chan HP, Hadjiiski L, Helvie M, Richter C, Cha K. Evolutionary pruning of transfer learned deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis. Physics in Medicine and Biology. 2018;63. doi: 10.1088/1361-6560/aabb5b
  • 25. Lotter W, Sorensen G, Cox D. A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; 2017. p. 169-177.
  • 26. Platania R, Shams S, Yang S, Zhang J, Lee K, Park SJ. Automated Breast Cancer Diagnosis Using Deep Learning and Region of Interest Detection (BC-DROID). In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; 2017. p. 536-543.
  • 27. Haq AU, Li JP, Shahid M, et al. Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson's disease using voice recordings. IEEE Access. 2019; p. 37718-37734.
  • 28. Roslidar R, Rahman A, Muharar R, Syahputra MR, Arnia F, Syukri M, et al. A review on recent progress in thermal imaging and deep learning approaches for breast cancer detection. IEEE Access. 2020;8:116176–116194. doi: 10.1109/ACCESS.2020.3004056
  • 29. Ren S, He K, Girshick RB, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In: Conference on Neural Information Processing Systems; 2015. p. 91-99.
  • 30. Lin T, Goyal P, Girshick RB, He K, Dollar P. Focal Loss for Dense Object Detection. In: IEEE International Conference on Computer Vision; 2017. p. 2999-3007.
  • 31. Cai Z, Vasconcelos N. Cascade R-CNN: Delving Into High Quality Object Detection. In: IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 6154-6162.
  • 32. Redmon J, Divvala SK, Girshick RB, Farhadi A. You Only Look Once: Unified, Real-Time Object Detection. In: IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 779-788.
  • 33. Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu C, et al. SSD: Single Shot MultiBox Detector. In: European Conference on Computer Vision; 2016. p. 21-37.
  • 34. Brungel R, Friedrich CM. DETR and YOLOv5: Exploring Performance and Self-Training for Diabetic Foot Ulcer Detection. In: IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS); 2021. p. 148-153.
  • 35. Goyal P, Dollár P, Girshick RB, Noordhuis P, Wesolowski L, Kyrola A, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR. 2017;abs/1706.02677.
  • 36. Sahoo JP, Prakash AJ, Plawiak P, Samantray S. Real-Time Hand Gesture Recognition Using Fine-Tuned Convolutional Neural Network. Sensors. 2022;22(3):706. doi: 10.3390/s22030706
  • 37. Allam JP, Samantray S, Ari S. SpEC: A system for patient specific ECG beat classification using deep residual network. Biocybernetics and Biomedical Engineering. 2020;40(4):1446–1457. doi: 10.1016/j.bbe.2020.08.001
  • 38. Allam JP, Samantray S, Behara C, Kurkute KK, Sinha VK. Customized deep learning algorithm for drowsiness detection using single-channel EEG signal. In: Artificial Intelligence-Based Brain-Computer Interface. Elsevier; 2022. p. 189-201.
  • 39. He K, Gkioxari G, Dollar P, Girshick R. Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017. p. 2980-2988.
  • 40. Al-Antari MA, Al-Masni MA, Kim T, et al. An Automatic Computer-Aided Diagnosis System for Breast Cancer in Digital Mammograms via Deep Belief Network. Journal of Medical and Biological Engineering. 2018; p. 2199–4757.
  • 41. Abbas Q. DeepCAD: A Computer-Aided Diagnosis System for Mammographic Masses Using Deep Invariant Features. Computers. 2016;5(4):28. doi: 10.3390/computers5040028

Decision Letter 0

Mohamed Hammad

22 Jul 2022

PONE-D-22-17707: Cancer Detection for Small-size and Ambiguous Tumors based on Semantic FPN and Transformer. PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Please submit your revised manuscript by Sep 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mohamed Hammad, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. 

Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services.  If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

The name of the colleague or the details of the professional service that edited your manuscript

A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

A clean copy of the edited manuscript (uploaded as the new *manuscript* file).

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Thank you for stating the following in your Competing Interests section:  

NO authors have competing interests. 

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now 

 This information should be included in your cover letter; we will change the online submission form on your behalf.

5. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

6. Please ensure that you refer to Figure 6 in your text as, if accepted, production will need this reference to link the reader to the figure.

7.  Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

When updating your manuscript, you should elaborate on your points and clarify with references, examples, data, etc. Also, note that if a reviewer suggested references, you should only add those that are relevant to your work if you feel they strengthen your article.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Summary: The authors propose a novel semantic feature pyramid network with a transformer self-attention module, named SPN-TS, for tumor detection, specifically designed to detect small-sized breast tumors with vague boundaries. For the same, authors first create a new FPN feature extraction scheme in the feature extraction stage for small-sized tumor regions, which is followed by the introduction of transformer attention mechanism to capture the features of local tumor boundaries. The technique achieves very good performance while beating all the major state-of-the-art techniques.

My Conclusion: I believe the manuscript is well structured and generally well written. The technique also seems to be sound and achieves very good performance as well. As the authors claim, I believe the technique can be used for variety of image-based prediction tasks. The only issues are that of the grammar and lack of clarity at some places. Most importantly, there is no section for related works, which needs to be introduced. Having said that, I believe the manuscript needs revision, which I classify as a minor one, for acceptance at Plos One, based on the minor comments I provide below.

Comments: My comments are organised and ordered from the start to the end of the manuscript.

1. In Abstract, the authors write, “… we novelty propose… ”. This needs to be corrected to “… we propose a novel…”. This mistake is present throughout the paper and the authors are advised to correct it wherever needed.

2. In Abstract, what are “SPN-TS”, “FPN” and “CBIS-DDSM”? Full forms are needed here.

3. In Introduction, Paragraph 1, Sentence 2, “Breast cancer”, b needs to be small here. Besides, relevant and strong references are needed here as well.

4. In Introduction, Paragraph 1, “As the large number of mammograms performed daily in hospitals, … ” . Grammar check needed here.

5. In Introduction, Paragraph 1, “Firstly, there is a huge difference between cancerous and noncancerous breast tissue … ”. How is this an issue, if there is a “huge” difference between the two? Shouldn't this difference mean that the two are easily differentiable?

6. In Introduction, Paragraph 2, full form of SVM should be provided.

7. In Introduction, Paragraph 2, “General object detection is often applied to automatic detection of cancer images, especially breast masses citeHaq2019Feature, roslidar2020review. Two-stage detectors are a series of R-CNN and RPN modules designed to address the challenges of multi-scale objects citeRen2015, Tsung2017Focal, Cai2018CascadeRCNN, Lin2017FPN. The tyoical one-stage YOLO citeRedmon2016YOLO, Platania2017Automated and SSD citeliu2016ssd detectors are also designed to detect the diversity of object sizes.” There are multiple citation issues which need correcting. Full forms of R-CNN and RPN are needed. When the authors talk about “Two-stage detectors”, this comes out of nowhere and needs some background to stay coherent with the on-going narrative of the paragraph.

8. In Introduction, Paragraph 3, Sentence 1 needs grammar check. If T and A are to be kept capital, perhaps authors could add acronym (TA).

9. In Introduction, Paragraph 3, Sentence 2. Again, what is FPN? Full form and explanation needed. The authors write “… FPN 'first constructs' …”. It is expected that FPN does something more after this phase. This is not clear in the text.

10. In Introduction, Paragraph 3, Sentence 3. Is “general FPN” different from the FPN earlier mentioned? If yes, citation needed here.

11. In Introduction, Paragraph 3. “This will lead to the lack of high-level feature map with sufficient resolution and lacking location information for detecting small objects.” Grammar check required. Perhaps, the authors mean, "This can lead to... ".

12. In Introduction, Paragraph 3. What do the authors mean by feature coupling? How does feature coupling affect the performance?

13. In Introduction, Paragraph 3. “Since it is difficult to extract the information of small tumor objects and the blurring of tumor boundaries in medical images. It is necessary to design an effective method to enrich the high-resolution features and local attention features with semantic information from high-level feature maps.” Grammar check needed. Perhaps the authors meant "... medical images, it is necessary to design...". I think the authors need to put in more background and rationale behind the question of why it is hard to capture small tumor objects and tumor boundaries.

14. In Introduction, Paragraph 4, Sentence 2. Grammar check required.

15. In Introduction, Paragraph 4, Sentence 4. “… we propose a object”. Grammar check needed.

16. Fig 1 needs a lot of explaining. What are Fs and what are Bs. Authors need to provide a brief explanation of how FPN and SPN work in context of the figure itself. As of now, the figure does not explain anything. If it is explained later, the author should mention this and still give a brief overview of the figure.

17. In Introduction, Paragraph 4, last Sentence. Check spellings.

18. Before starting the contribution points, the authors should add a few lines in the previous paragraph starting that the following are their contributions.

19. Contribution 2, why is T capital in "Transformer"? Check this throughout the manuscript.

20. Fig 2, “… transformer attention feature extractor… ”. Comma might be missing here.

21. Section Overview, Paragraph 1, Sentence 3. Grammar check.

22. Section Overview, Paragraph 1, Sentence 7. Grammar check.

23. Section Overview, Paragraph 1. “It helps extract clearer and richer semantic features through the combined contextual information.” How does it work and how is it able to extract clearer and richer semantic features?

24. Section Overview, Paragraph 1, last Sentence. Grammar check.

25. Section Semantic Feature Pyramid Network, Paragraph 1, ROIAlign and RPN need full forms.

26. Section Semantic Feature Pyramid Network, Paragraph 2. “… and is the feature map of Ith stage in the intermediate process through the transverse connection defined as Mi.” ith stage of what? The sentence is not clear.

27. Section Transformer Attention Module, Paragraph 2. What is ResNet-50. Needs full form and brief introduction.

28. Section Results, “the CBIS-DDSM dataset”. Citation required. Web link in the footnote will be good as well.

29. Section Dataset, “and contains 2[U+FF0C]424”. Textual error.

30. Section Dataset, last Sentence, “A sample image of a mammogram is shown in Fig. 4 and contains tumor areas of varying sizes and sharpness.” These are multiple images. Grammar check needed.

31. Section Experimental Setup, Sentence 2. Grammar check needed.

32. Section Experimental Setup, “The learning rate is adjusted downward to 0.0001… ”. By what factor?

33. Section Experimental Setup, what is “VGG16”?

34. Section Experimental Setup, “ImageNet” needs Citation/web link. Brief introduction will also be good.

35. Section Experimental Setup, “… the exposed MMDetection toolkit… ”. What do the authors mean by "exposed"? Citation/web link is also needed.

36. Section Comparison with detector baselines, SubSection Results, Paragraph 1, “There are also recently improved proposed FPN /citeLin2017FPN, mask-rCN /citeHe2017Mask and Cascade R-CNN /citeCai2018CascadeRCN object detectors for experiments.” Reference issues.

37. Section Comparison with detector baselines, SubSection Results, Paragraph 2, “Table refobject summarizes… ”. Missing Table reference.

38. Table 1, SSD reference missing.

39. Section Comparison with Medical Baselines, Subsection Results, Paragraph 2, AUC and ACC need full forms.

40. Full fledged Related Works Section needs to be introduced, covering all the significant research conducted in the past pertaining to the manuscript.

41. Dataset needs to be explained in more detail. For example, how many samples are benign and how many malignant, whether there is data imbalance, etc.?

Reviewer #2: Cite the following latest articles published on CNN

1. @article{sahoo2022real,

title={Real-Time Hand Gesture Recognition Using Fine-Tuned Convolutional Neural Network},

author={Sahoo, Jaya Prakash and Prakash, Allam Jaya and P{\l}awiak, Pawe{\l} and Samantray, Saunak},

journal={Sensors},

volume={22},

number={3},

pages={706},

year={2022},

publisher={Multidisciplinary Digital Publishing Institute}

}

2. @article{allam2020spec,

title={SpEC: A system for patient specific ECG beat classification using deep residual network},

author={Allam, Jaya Prakash and Samantray, Saunak and Ari, Samit},

journal={Biocybernetics and Biomedical Engineering},

volume={40},

number={4},

pages={1446--1457},

year={2020},

publisher={Elsevier}

}

3. @incollection{allam2022customized,

title={Customized deep learning algorithm for drowsiness detection using single-channel EEG signal},

author={Allam, Jaya Prakash and Samantray, Saunak and Behara, Chinmaya and Kurkute, Ketan Kishor and Sinha, Vikas Kumar},

booktitle={Artificial Intelligence-Based Brain-Computer Interface},

pages={189--201},

year={2022},

publisher={Elsevier}

}

4. @article{venkata2021deep,

title={Deep review of machine learning techniques on detection of drowsiness using EEG signal},

author={Venkata Phanikrishna, B and Jaya Prakash, Allam and Suchismitha, Chinara},

journal={IETE Journal of Research},

pages={1--16},

year={2021},

publisher={Taylor \& Francis}

}

5. @article{govinda2020review,

title={Review of the Convolution Neural Network Architectures for Deep Learning},

author={Govinda Rao Locharla , Jaya Prakash Allam , Y.V Narayana, Yellapu Anusha},

journal={International Journal of Advanced Science and Technology},

volume={29},

number={4},

pages={2251--2262},

year={2020}

}

6. @article{venkata2021brief,

title={A Brief Review on EEG Signal Pre-processing Techniques for Real-Time Brain-Computer Interface Applications},

author={Venkata Phanikrishna, B and P{\l}awiak, Pawe{\l} and Jaya Prakash, Allam},

year={2021},

publisher={TechRxiv}

}

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review_PLOS4.docx

PLoS One. 2023 Feb 16;18(2):e0275194. doi: 10.1371/journal.pone.0275194.r002

Author response to Decision Letter 0


4 Sep 2022

Response to Reviewers

Dear Reviewers,

First, we appreciate the constructive comments, helpful critiques and favorable assessments from all reviewers. Please find below our point-by-point responses to your comments.

Reviewer #1:

1. In Abstract, the authors write, “… we novelty propose… ”. This needs to be corrected to “… we propose a novel…”.

We have corrected this mistake throughout the paper, changing “we novelty propose…” to “we propose a novel…”.

2. In Abstract, what are “SPN-TS”, “FPN” and “CBIS-DDSM”? Full forms are needed here.

The full forms of the abbreviations have been added. SPN-TS stands for Semantic Pyramid Network with Transformer Self-attention, FPN stands for Feature Pyramid Network, and CBIS-DDSM is the Curated Breast Imaging Subset of the Digital Database for Screening Mammography.

3. In Introduction, Paragraph 1, Sentence 2, “Breast cancer”, b needs to be small here. Besides, relevant and strong references are needed here as well.

We have revised “Breast cancer” to “breast cancer” in the sentence. We have also added relevant and authoritative references on the diagnosis of early breast cancer to this paragraph.

[1] Rajput G, Agrawal S, Biyani KN, Vishvakarma SK. Early breast cancer diagnosis using cogent activation function-based deep learning implementation on screened mammograms. Int J Imaging Syst Technol. 2022; p. 1101–1118.

[2] Lee Y, Huang C, Shih C, Chang R. Axillary lymph node metastasis status prediction of early-stage breast cancer using convolutional neural networks. Comput Biol Medicine. 2021; p. 104206.

4. In Introduction, Paragraph 1, “As the large number of mammograms performed daily in hospitals, … ” . Grammar check needed here.

The grammar has been checked and the sentence revised to: “With the increasing quantity of mammograms in hospitals, manual reading has become complex and time-consuming for radiologists.”

5. In Introduction, Paragraph 1, “Firstly, there is a huge difference between cancerous and noncancerous breast tissue … ”. How is this an issue, if there is a “huge” difference between the two? Shouldn't this difference mean that the two are easily differentiable?

We have corrected the expression of the first challenge: “There are some differences between cancerous and noncancerous breast tissue on imaging, but the difference in early cancer diagnosis is minimal.” In general, lesion regions and normal tissue regions differ widely; however, in early-stage cancer imaging the difference is very small, which makes it difficult to extract tumor region features.

6. In Introduction, Paragraph 2, full form of SVM should be provided.

The full form of SVM is Support Vector Machine; it has been provided in the Introduction section.

7. In Introduction, Paragraph 2, “General object detection is often applied to automatic detection of cancer images, especially breast masses citeHaq2019Feature, roslidar2020review.”There are multiple citation issues which need correcting. Full forms of RPN are needed. When the authors talk about “Two-stage detectors”, this comes out of nowhere and needs some background.

We corrected the multiple citation issues: “Object detection algorithms are generally applied to the automatic detection of cancer images, especially breast masses [28, 29]....” The full form of RPN is Region Proposal Network. Most importantly, we have introduced a related work section that provides background on two-stage detectors: Faster R-CNN generates region proposals through the Region Proposal Network (RPN) module and then performs fine-grained classification and regression [9, 30–32].

8. In Introduction, Paragraph 3, Sentence 1 needs grammar check. If T and A are to be kept capital, perhaps authors could add acronym (TA).

The sentence grammar has been checked: “To solve the problems mentioned above, the paper proposes a detection framework for medical imaging, named Semantic Pyramid Network with a Transformer Self-attention (SPN-TS).” We changed the initial letters of the method name to upper case uniformly.

9. In Introduction, Paragraph 3, Sentence 2. Again, what is FPN? Full form and explanation needed. The authors write “… FPN 'first constructs' …”. It is expected that FPN does something more after this phase. This is not clear in the text.

The full form of FPN is Feature Pyramid Network. The classical Feature Pyramid Network (FPN) algorithm extracts multi-scale features through a top-down multi-level architecture. However, the up-sampling operation used by FPN loses the position information of small objects.

10. In Introduction, Paragraph 3, Sentence 3. Is “general FPN” different from the FPN earlier mentioned? If yes, citation needed here.

Both “general FPN” and FPN refer to the same algorithm, designed for object detection in natural images, whereas the semantic FPN designed in this paper targets the detection of medical images.

11. In Introduction, Paragraph 3. “This will lead to the lack of high-level feature map with sufficient resolution and lacking location information for detecting small objects.” Grammar check required. Perhaps, the authors mean, "This can lead to... ".

“This can lead to feature maps lacking sufficient resolution and position information for detecting small objects.”

12. In Introduction, Paragraph 3. What do the authors mean by feature coupling? How does feature coupling affect the performance?

Double-Head RCNN [1] disentangles the sibling head into two independent branches for classification and localization. For the classification branch, fully connected layers are employed to extract features and obtain the confidence for tumor classification. For the regression branch, convolutional layers are used to learn the representations. By applying different operations to the two branches, this feature decoupling preserves the performance of the regression branch under boundary ambiguity, and it can effectively improve the performance of small-size detection.

[1] Wu Y, Chen Y, Yuan L, Liu Z, Wang L, Li H, et al. Rethinking Classification and Localization for Object Detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 10183–10192.
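To make the decoupling concrete, a minimal PyTorch-style sketch of the two independent branches is given below. This is an illustrative reconstruction rather than the authors' exact implementation; the layer widths, ROI size, and class count are assumptions.

import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    # Illustrative sketch of a Double-Head-style sibling head: fully connected
    # layers for classification, convolutional layers for box regression.
    # All sizes are assumptions, not values from the paper.
    def __init__(self, in_channels=256, roi_size=7, num_classes=2):
        super().__init__()
        # Classification branch: flatten the ROI feature, then two FC layers.
        self.fc_branch = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * roi_size * roi_size, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
        )
        self.cls_score = nn.Linear(1024, num_classes)
        # Regression branch: keep the spatial layout, use convolutions, then pool.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.bbox_pred = nn.Linear(in_channels, 4)

    def forward(self, roi_feat):          # roi_feat: (N, C, roi_size, roi_size)
        cls_logits = self.cls_score(self.fc_branch(roi_feat))
        box_deltas = self.bbox_pred(self.conv_branch(roi_feat).flatten(1))
        return cls_logits, box_deltas

# Example: 8 ROI features of shape (256, 7, 7)
head = DecoupledHead()
logits, deltas = head(torch.randn(8, 256, 7, 7))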

13. In Introduction, Paragraph 3. “Since it is difficult to extract the information of small tumor objects and the blurring of tumor boundaries in medical images...” Grammar check needed. Perhaps the authors meant "... medical images, it is necessary to design...". Why it is hard to capture small tumor objects and tumor boundaries?

The grammar has been checked. The traditional Feature Pyramid Network (FPN) was designed for multi-scale feature extraction. Small objects occupy few pixels in medical images, so extracting their features is particularly difficult. FPN's top-down and horizontal connection structure leaves the low-level feature maps lacking high-level semantic information, which prevents effective detection of small objects. In addition, malignant tumors in medical images have blurred boundaries, which leads to ineffective discrimination in the regression stage of detection, so a separate convolutional branch is required to extract features for regression.

14. In Introduction, Paragraph 4, Sentence 2. Grammar check required.

Figure 1(A) shows the traditional FPN with top-down and horizontal connections. This design does not yield sufficient semantic information, resulting in poor feature extraction from small tumor regions.

15. In Introduction, Paragraph 4, Sentence 4. “… we propose a object”. Grammar check needed.

The paper designs a novel FPN feature extraction scheme, which changes the traditional connectivity and enriches the semantic information.

16. Fig 1 needs a lot of explaining. What are Fs and what are Bs. Authors need to provide a brief explanation of how FPN and SPN work in context of the figure itself.

We have referred to Fig 1 in the text and given a brief overview of the figure. Fi and Bi are both used for feature extraction. The figure compares the original FPN with our proposed SPN connections: Fi denotes the multi-level features from layers 1 to 5, and Bi is the output of the feature pyramid at level i. (A) The traditional FPN uses top-down and horizontal connections. (B) The proposed SPN enriches the semantic features through three steps: lateral connection, multiple up-sampling, and feature fusion.

17. In Introduction, Paragraph 4, last Sentence. Check spellings.

Finally, we conducted extensive experimental verification on the CBIS-DDSM, which is the Curated Breast Imaging Subset of the Digital Database for Screening Mammography.

18. Before starting the contribution points, the authors should add a few lines in the previous paragraph starting that the following are their contributions.

The sentence “The contribution of this paper is mainly in the following three aspects.” was added before the contribution points.

19. Contribution 2, why is T capital in "Transformer"? Check this throughout the manuscript.

We thoroughly checked the manuscript and corrected any unnecessary capitalization. “The novel transformer attention mechanism is integrated into the extraction network. The network focuses on tumor region features by attention mechanism and location encoding.”

20. Fig 2, “… transformer attention feature extractor… ”. Comma might be missing here.

The image is input to a feature extractor that incorporates the transformer attention module. This module is used as the first module of the SPN-TS method and is called the transformer attention feature extractor.

21. Section Overview, Paragraph 1, Sentence 3. Grammar check.

Firstly, this paper designs a novel semantic FPN feature extraction scheme, which changes the traditional cross-layer connection scheme.

22. Section Overview, Paragraph 1, Sentence 7. Grammar check.

The overall architecture of the SPN-TS method is shown in Fig 2. It is mainly divided into four parts: the transformer attention module, the semantic feature pyramid network, the region proposal network, and the classification and regression prediction module.

23. Section Overview, Paragraph 1. “It helps extract clearer and richer semantic features through the combined contextual information.” How does it work and how is it able to extract clearer and richer semantic features?

The proposed SPN module enriches the semantic features through three steps: lateral connection, multiple up-sampling, and feature fusion. The first operation, the lateral connection, preserves the resolution and semantic information of the current layer while facilitating the feature fusion in the subsequent steps. The second operation is multiple up-sampling. In the feature fusion step, we integrate feature maps from different layers and of different sizes. The semantic FPN thus enriches the semantic features by integrating high-level features into low-level feature maps, which improves the detection performance for small tumor objects.
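As an illustration of these three steps, a minimal PyTorch sketch is given below. The channel widths, nearest-neighbour up-sampling, and summation-based fusion are assumptions made for the example, not necessarily the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticFusionSketch(nn.Module):
    # Minimal sketch: laterally project each backbone level to a common channel
    # width, up-sample every level to the finest resolution, and fuse by summation
    # so that low-level maps receive high-level semantic information.
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):                 # feats: list of (N, Ci, Hi, Wi), fine to coarse
        target_size = feats[0].shape[-2:]     # resolution of the finest level
        fused = 0
        for f, lat in zip(feats, self.lateral):
            x = lat(f)                                               # lateral connection
            x = F.interpolate(x, size=target_size, mode="nearest")   # multiple up-sampling
            fused = fused + x                                        # feature fusion
        return fused

# Example with dummy multi-level features
feats = [torch.randn(1, c, s, s) for c, s in zip((256, 512, 1024, 2048), (64, 32, 16, 8))]
out = SemanticFusionSketch()(feats)           # shape: (1, 256, 64, 64)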

24. Section Overview, Paragraph 1, last Sentence. Grammar check.

In addition, the paper decouples the features for classification and regression branches. The module uses fully connected layers and convolutional layers for classification and regression branches, respectively.

25. Section Semantic Feature Pyramid Network, Paragraph 1, ROIAlign and RPN need full forms.

The connected feature maps are input into Region Proposal Network (RPN) and Region of Interest Alignment (ROIAlign) respectively.

26. Section Semantic Feature Pyramid Network, Paragraph 2. “… and is the feature map of Ith stage in the intermediate process through the transverse connection defined as Mi.” ith stage of what? The sentence is not clear.

In Fig 1, Fi denotes the multi-level features from layers 1 to 5, and Mi denotes the feature map of the ith stage (i = 1–5) produced in the intermediate process through the lateral (transverse) connection.

27. Section Transformer Attention Module, Paragraph 2. What is ResNet-50. Needs full form and brief introduction.

ResNet is a feature extraction network with a residual structure; the 50-layer version of ResNet is abbreviated as ResNet-50.
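For reference, a ResNet-50 backbone pre-trained on ImageNet can be loaded with torchvision as in the short sketch below (assuming a recent torchvision version); this is a generic usage example, not the paper's exact configuration.

import torch
from torchvision.models import resnet50

# Load a 50-layer ResNet pre-trained on ImageNet and drop its classifier head
# so that it serves as a feature extractor (generic example only).
backbone = resnet50(weights="IMAGENET1K_V1")
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

features = feature_extractor(torch.randn(1, 3, 224, 224))   # -> (1, 2048, 7, 7)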

28. Section Results, “the CBIS-DDSM dataset”. Citation required. Web link in the footnote will be good as well.

Before the experiments, the lossless JPG images were converted to PNG format using the calibration feature of the DDSM website. A footnote for the CBIS-DDSM dataset was added to the experimental description; the download link is "http://www.eng.usf.edu/cvprg/Mammography/Database.html".

29. Section Dataset, “and contains 2[U+FF0C]424”. Textual error.

We revised the textual error: “In this experiment, 2424 complete photographs of benign and malignant breast masses were selected from this dataset as experimental data.”

30. Section Dataset, last Sentence, “A sample image of a mammogram is shown in Fig. 4 and contains tumor areas of varying sizes and sharpness.” These are multiple images. Grammar check needed.

The grammar was checked: “Fig 4 shows some mammogram samples, which contain tumors of varying sizes and resolutions.”

31. Section Experimental Setup, Sentence 2. Grammar check needed.

The grammar was checked: “The ReLU function was set as the nonlinear activation function.”

32. Section Experimental Setup, “The learning rate is adjusted downward to 0.0001… ”. By what factor?

When these detectors were trained, the initial learning rate was set to 0.001. The learning rate in MMDetection is calculated using the linear scaling principle to obtain the learning rate of the training model. The learning rate is reduced by a factor of 10, to 0.0001, after 10,000 iterations, to further converge the loss.
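A minimal sketch of such a step schedule (0.001 dropped tenfold to 0.0001 after 10,000 iterations) is shown below; the optimizer choice and momentum value are assumptions for illustration.

import torch

model = torch.nn.Linear(10, 2)    # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Multiply the learning rate by 0.1 once 10,000 iterations are reached,
# i.e. 0.001 -> 0.0001 (a factor of 10).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10000], gamma=0.1)

for iteration in range(20000):
    optimizer.step()      # the actual forward/backward pass is omitted here
    scheduler.step()      # advance the schedule once per iteration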

33. Section Experimental Setup, what is “VGG16”?

VGGNet is a representative deep network that explores the relationship between the depth of a CNN and its performance. VGG16 denotes the 16-layer version of the network.

34. Section Experimental Setup, “ImageNet” needs Citation/web link. Brief introduction will also be good.

ImageNet is one of the largest image recognition databases in the world, and models pre-trained on it are widely used. The link is “https://image-net.org/”.

35. Section Experimental Setup, “… the exposed MMDetection toolkit… ”. What do the authors mean by "exposed"? Citation/web link is also need.

A footnote for the MMDetection toolkit was added to the experimental description; the download link is "https://github.com/open-mmlab/mmdetection". By “exposed” we meant “publicly available (published)”.

36. Section Comparison with detector baselines, SubSection Results, Paragraph 1, “There are also recently improved proposed FPN /citeLin2017FPN, mask-rCN /citeHe2017Mask and Cascade R-CNN /citeCai2018CascadeRCN object detectors for experiments.” Reference issues.

We revised the compilation issues in the references: “Recently proposed improved object detectors, including FPN [9], Mask R-CNN [40] and Cascade R-CNN [32], are also used in the experiments.”

37. Section Comparison with detector baselines, SubSection Results, Paragraph 2, “Table refobject summarizes… ”. Missing Table reference.

We added the missing Table reference. “Table 1 summarizes the results of CBIS-DDSM in different detection models.”

38. Table 1, SSD reference missing.

We corrected the missing SSD reference in Table 1.

39. Section Comparison with Medical Baselines, Subsection Results, Paragraph 2, AUC and ACC need full forms.

ACC is the abbreviation for accuracy. AUC (Area Under Curve) is defined as the area enclosed between the ROC curve and the coordinate axis.
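For completeness, both metrics can be computed with scikit-learn as in the short sketch below (the labels and scores are placeholder values, not results from the paper).

from sklearn.metrics import accuracy_score, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0]                    # ground-truth labels (illustrative only)
y_score = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]        # predicted probabilities for the positive class
y_pred  = [int(s >= 0.5) for s in y_score]      # thresholded class predictions

acc = accuracy_score(y_true, y_pred)            # ACC: fraction of correct predictions
auc = roc_auc_score(y_true, y_score)            # AUC: area under the ROC curve
print(f"ACC={acc:.3f}, AUC={auc:.3f}")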

40. Full fledged Related Works Section needs to be introduced, covering all the significant research conducted in the past pertaining to the manuscript.

We have added a complete related work section, which covers the following four aspects. First, the current state of research in medical image analysis is summarized. Second, example studies from the early and recent stages of disease detection are given. Third, we provide more background on two-stage and one-stage detectors and describe the progression of improvements in small object detection and accuracy. Fourth, recent feature decoupling and transformer-based object detection algorithms are listed as algorithmic inspirations.

41. Dataset needs to be explained in more detail. For example, how many samples are benign and how many malignant, whether there is data imbalance, etc.?

The original dataset was manually screened by experienced physicians. In this experiment, 2424 complete photographs of benign and malignant breast masses were selected from these data as experimental data. It is an imbalanced dataset containing 1629 benign and 795 malignant tumors. The proportions of training, validation and test data were set to 70%, 20% and 10%, respectively.
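Given the class imbalance (1629 benign vs. 795 malignant), a stratified 70/20/10 split can be produced along the lines of the sketch below; the use of scikit-learn's train_test_split with a fixed random seed is an assumption for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

labels = np.array([0] * 1629 + [1] * 795)        # 0 = benign, 1 = malignant
indices = np.arange(len(labels))

# First carve off the 70% training portion, then split the remaining 30%
# into validation (20% overall) and test (10% overall), preserving class ratios.
train_idx, rest_idx = train_test_split(
    indices, test_size=0.30, stratify=labels, random_state=42)
val_idx, test_idx = train_test_split(
    rest_idx, test_size=1/3, stratify=labels[rest_idx], random_state=42)

print(len(train_idx), len(val_idx), len(test_idx))   # roughly 1696, 485, 243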

42. The only issues are that of the grammar and lack of clarity at some places.

We have checked the grammar throughout the paper and corrected it as advised.

Reviewer #2:

1. Cite the following latest articles published on CNN.

The article now cites the latest references related to deep learning CNNs in the appropriate positions, as listed below.

[1] Sahoo JP, Prakash AJ, Pławiak P, Samantray S. Real-Time Hand Gesture Recognition Using Fine-Tuned Convolutional Neural Network. Sensors. 2022;22(3):706.

[2] Allam JP, Samantray S, Ari S. SpEC: A system for patient specific ECG beat classification using deep residual network. Biocybernetics and Biomedical Engineering. 2020;40(4):1446–1457.

[3] Allam JP, Samantray S, Behara C, Kurkute KK, Sinha VK. Customized deep learning algorithm for drowsiness detection using single-channel EEG signal. In: Artificial Intelligence-Based Brain-Computer Interface. Elsevier; 2022. p. 189–201.

2. Kindly write the abstract is a concise and succinct manner

In the abstract, we tightened the logic, made the writing more concise, and corrected the grammatical errors.

3. In abstract SPN-TS, CBIS-DDSM full form missed

The full forms of the abbreviations have been added. SPN-TS stands for Semantic Pyramid Network with Transformer Self-attention, and CBIS-DDSM is the Curated Breast Imaging Subset of the Digital Database for Screening Mammography.

4. In Table 1 citation missed

We corrected the citation errors in Table 1 and several textual errors in the references, as pointed out by the reviewers.

5. et al. should be in italics

We checked all the citations and set every instance of et al. in italics.

6. Please avoid the word “we” throughout the script

We checked the whole text and avoided the word "we", replacing it with "the paper", "this method", etc.

7. MCC full form in the abstract?

The full form of the abbreviation has been added: MCC stands for Matthews Correlation Coefficient.

8. Exact motivation of the work in the introduction missed

In the introduction section, we have reorganized the motivation of this work. Cancer detection remains a challenging task due to the interference of diseased tissue, the diversity of mass scales, and the ambiguity of tumor boundaries. In particular, the varying size of cancer masses is one of the challenges of detection, especially as small masses are difficult to detect, and the blurring of tumor boundaries may cause visual confusion. Thus, it is necessary to design an effective method that enriches the high-resolution features and local attention features with semantic information from multi-level feature maps.

9.Learning rate of the network?

When these detectors were trained, the initial learning rate was set to 0.001.

10.Tuning of the network was missed

The learning rate in MMDetection is calculated using the linear scaling principle to obtain the learning rate of the training model. The learning rate is reduced to 0.0001 after 10,000 iterations to further converge the loss.

11.Please remove unnecessary capitalizations in the manuscript.

We checked for grammatical and spelling errors throughout the paper and removed unnecessary capitalization.

12.Please mention important major contributions only

In the introduction section, we added “The contribution of this paper is mainly in the following three aspects.”

(1) The paper proposes an effective network called SPN-TS for cancer detection, containing Semantic FPN and transformer self-attention mechanisms. It addresses the extraction of small tumor objects and the lack of contextual information.

(2) The novel transformer attention mechanism is integrated into the extraction network. The network focuses on tumor region features by attention mechanism and location encoding.

(3) The paper also decouples the classification and regression branches to improve the classification confidence. The experiment conducted an extensive evaluation on CBIS-DDSM dataset to illustrate the effectiveness of the SPN-TS method in detecting small objects.

13.Quality of the images are very poor, and not visible also

We rechecked the figures in the paper and enhanced their quality; in particular, the methodological framework figure was redrawn to make it clearly visible. We used the PACE tool at “https://pacev2.apexcovantage.com/Upload” to quality-check all the figures in this paper.

14.Needs to include ROC curves, and precision-recall curves

To analyze the detection performance of the SPN-TS method more comprehensively, the Precision-Recall (PR) curve and the Receiver Operating Characteristic (ROC) curve were drawn in this paper. The PR curve plots precision on the vertical axis against recall on the horizontal axis, so it focuses on the classification of positive samples. The ROC curve plots the true positive rate (TPR) on the vertical axis against the false positive rate (FPR) on the horizontal axis, so the capability of the model can be judged without being influenced by the distribution of positive and negative samples. In the ROC figure, the detection results are better than those of the original Faster R-CNN method for both positive and negative samples.
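As a reference for how such curves can be generated, a minimal scikit-learn/matplotlib sketch is shown below (using placeholder labels and scores, not the paper's actual predictions).

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, precision_recall_curve

# Placeholder ground truth and scores; replace with real model outputs.
y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.2, 0.4, 0.7, 0.9, 0.6, 0.1, 0.8, 0.3]

fpr, tpr, _ = roc_curve(y_true, y_score)                  # ROC: FPR vs. TPR
prec, rec, _ = precision_recall_curve(y_true, y_score)    # PR: precision vs. recall

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(fpr, tpr); ax1.set_xlabel("FPR"); ax1.set_ylabel("TPR"); ax1.set_title("ROC curve")
ax2.plot(rec, prec); ax2.set_xlabel("Recall"); ax2.set_ylabel("Precision"); ax2.set_title("PR curve")
plt.tight_layout()
plt.show()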

15.Grammatically needs to recheck again.

We have checked the grammar throughout the paper and corrected it as advised.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Mohamed Hammad

12 Sep 2022

Cancer Detection for Small-size and Ambiguous Tumors based on Semantic FPN and Transformer

PONE-D-22-17707R1

Dear Dr. Wang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Mohamed Hammad, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed all my comments. I have no further comments and I consider the manuscript fit for acceptance and publication.

Reviewer #2: The authors have incorporated all the suggested comments. The quality of the script is now up to the mark.

Please include the following recently published script before the publication:

1. Hammad, M., Chelloug, S.A., Alkanhel, R., Prakash, A.J., Muthanna, A., Elgendy, I.A. and Pławiak, P., 2022. Automated Detection of Myocardial Infarction and Heart Conduction Disorders Based on Feature Selection and a Deep Learning Model. Sensors, 22(17), p.6503.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Allam Jaya Prakash

**********

Acceptance letter

Mohamed Hammad

29 Sep 2022

PONE-D-22-17707R1

Cancer Detection for Small-size and Ambiguous Tumors based on Semantic FPN and Transformer

Dear Dr. Wang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Mohamed Hammad

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: Review_PLOS4.docx

Attachment

Submitted filename: Response to Reviewers.docx

Data Availability Statement

All data files are published and available from the CBIS-DDSM database (http://www.eng.usf.edu/cvprg/Mammography/Database.html).

