PLoS One. 2025 Feb 5;20(2):e0318670. doi: 10.1371/journal.pone.0318670

Cross-ViT based benign and malignant classification of pulmonary nodules

Qinfang Zhu 1,*, Liangyan Fei 2
Editor: Mohammad Amin Fraiwan
PMCID: PMC11798455  PMID: 39908279

Abstract

Discriminating benign from malignant pulmonary nodules plays a very important role in diagnosing the extent of lung cancer lesions. Many methods use convolutional neural networks (CNNs) to classify pulmonary nodules as benign or malignant, but traditional CNN models focus on local features of pulmonary nodules and lack extraction of their global features. To solve this problem, a Cross-fusion attention ViT (Cross-ViT) network is proposed that fuses local features extracted by a CNN with global features extracted by a Transformer. The network first extracts the two kinds of features independently through two branches and then fuses them through the Cross-fusion attention module. Cross-ViT can effectively capture and process both local and global information of lung nodules, which improves the accuracy of classifying pulmonary nodules as benign or malignant. Experimental validation was performed on the LUNA16 dataset: accuracy, precision, recall and F1 score reached 91.04%, 91.42%, 92.45% and 91.92%, respectively, and with SENet as the CNN branch they reached 92.43%, 94.27%, 91.68% and 92.96%, respectively. The results show that the accuracy, precision, recall and F1 score of the proposed method are 0.3%, 0.11%, 4.52% and 3.03% higher, respectively, than those of the best-performing compared methods on average, and that the Cross-ViT network classifies benign and malignant nodules better than most existing methods.

Introduction

Lung cancer is a multifactorial, granulomatous disease with inconspicuous early symptoms that usually manifests as lung nodules. The International Agency for Research on Cancer has compiled cancer data from 185 countries, showing that lung cancer accounts for 11.4% of new cancer cases, the second highest incidence rate, and for 18% of all cancer deaths, the highest cancer mortality rate [1]. The pathological grade of a lung nodule (benign or malignant) is of vital importance to the physician in confirming the diagnosis. If the benign or malignant nature of a lung nodule is determined, early interventions can be made, more targeted treatment plans can be developed, and survival rates can be improved [2].

Pulmonary nodules are round or irregularly shaped lesions that proliferate in the lungs and usually appear as densely shaded, well-defined or ill-defined masses on computed tomography (CT) images [3]. Clinically, physicians use lung CT images to observe the shape and size of nodules. Depending on slice thickness, a single CT scan contains dozens to hundreds of slices. Interpreting this volume of CT data relies on the doctor's professional knowledge, but such manual reading is inefficient, its results depend on the doctor's individual experience, and it is prone to omissions and misdetections that delay treatment. A Computer-Aided Diagnosis (CAD) system that can efficiently diagnose lung nodules is therefore needed to assist doctors, which not only greatly reduces their workload but also reduces missed diagnoses and misdiagnoses due to inexperience.

As early as the 1980s, researchers developed CAD programs focused on the detection and identification of pulmonary nodules in CT scans [4]. Traditional machine learning methods for pulmonary nodule classification first extract information about the texture, shape and edges of the nodule, and then train classifiers to make predictions. Wu et al. [5] developed a clustered random forest algorithm that combines clustering with random forests to distinguish benign and malignant nodules via class decomposition and weight adjustment. Han et al. [6] compared three 2D texture features of pulmonary nodules on CT images, found that Haralick features were the most favorable for benign/malignant classification, and classified the features using support vector machines. Farahani et al. [7] proposed a computer-aided classification method for lung CT images integrating three classifiers: morphological features of roundness, denseness, ellipticity and eccentricity were used by each classifier independently, and the final result was output by majority voting. Traditional machine learning methods have shown significant results in the pulmonary nodule classification task, but they rely heavily on manual feature selection and cannot extract deep-level features of pulmonary nodules.

Later, deep learning techniques developed rapidly and were widely used across medical image processing [8]. Among them, Convolutional Neural Networks (CNNs) have shown superior performance in image processing, especially in extracting local features. More and more researchers [9–11] have developed new CNN models to discriminate the benign or malignant nature of pulmonary nodules in lung CT images. However, CNNs are relatively weak at processing global information. In particular, lung nodules in CT images have fuzzy boundaries and diverse shapes (as shown in Fig 1), so many CNNs extract insufficient global features during benign/malignant classification, leading to poor accuracy. In contrast, the Transformer network [12] from natural language processing excels at extracting long-distance contextual information and has been applied to image tasks in recent years; it processes global information well but is relatively weak at local information. Since lung nodules have complex shapes and are prone to adhesion with surrounding lung tissue, this paper fuses a CNN with the Transformer-based ViT network [13] and proposes a cross-fusion attention ViT (Cross-ViT) network for more accurate benign/malignant classification. The Cross-ViT network introduces the Cross-fusion Attention (CFA) module, which fuses the features extracted by the CNN and ViT to effectively capture and process both local and global information of pulmonary nodules in CT images, improving classification accuracy. The primary contributions of this paper are as follows:

Fig 1. Examples of lung nodules with various shapes in CT images.


(a) juxtapleural nodule. (b) cavitary nodule. (c) calcific nodule. (d) ground-glass opacity nodule.

  1. Aiming at the problem of insufficient feature extraction in the existing classification models for benign and malignant pulmonary nodules, a Cross-ViT network that can fuse local and global information is proposed.

  2. The Feature Coupling (FC) module is proposed to solve the problem that the CNN-branch and Transformer-branch feature maps cannot be fused directly because their scales do not match.

  3. The CFA module is proposed to fuse local and global features in both directions.

The paper is structured as follows: the Material and methods chapter describes the dataset and the proposed network structure; the Results chapter describes the experiments and discusses the results in detail; the Conclusion chapter summarizes the method and outlines future research. Source code is available at https://github.com/tipyan/benign-and-malignant-classification.

Material and methods

Dataset

The experiments in this paper use CT images of lung nodules from the LUNA16 (LUng Nodule Analysis 2016) dataset to train and evaluate the proposed network. The dataset was introduced in 2016 to support the development of CAD systems that automatically detect lung nodules in CT scans, and it is a subset of the largest open-source pulmonary nodule dataset, LIDC-IDRI [14], which was collected by the U.S. National Cancer Institute and contains 1,018 study instances. Nodules less than 3 mm in diameter are called micronodules; if the slices of a CT scan are too thick, a slice may contain no pixels of such a nodule at all. Moreover, the malignancy rate of micronodules is extremely low, so categorizing them is not meaningful. The LUNA16 dataset was therefore screened from the LIDC-IDRI dataset, and the final dataset contains 1,186 nodules.

The benign and malignant grades of pulmonary nodules are annotated in 5 categories: category 1 is benign, category 2 suspected benign, category 3 uncertain, category 4 suspected malignant, and category 5 malignant. In this paper, we removed the uncertain nodules labeled as category 3 in the LUNA16 dataset, recorded the nodules labeled as malignant or suspected malignant by all 4 physicians as malignant (label 1), and recorded the remaining samples as benign (label 0). Class imbalance would affect the effectiveness of the classification network: with unbalanced data, model training tends to favor the majority class, so data balance is very important for the pulmonary nodule classification task. The dataset after screening and relabeling contains a total of 1,004 pulmonary nodules, of which 450 are malignant and 554 benign, a positive-to-negative ratio close to 1:1.

Before the CT images are input into the network, appropriate cropping reduces the proportion of the image occupied by non-nodule background and enhances the classification network's attention to the nodule. The size statistics of the 1,004 selected lung nodules are shown in Fig 2, from which it can be seen that a crop of 35×35 pixels basically covers all nodule information. To include all nodule information and facilitate network input, a CT data block of size 3×48×48 centered on the nodule is finally cropped, where 3 denotes 3 CT slices containing the nodule: the center slice and the slices immediately above and below it.
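
A minimal sketch of this cropping step, assuming the CT scan is held as a NumPy array and the nodule center is given in voxel coordinates (the function and variable names here are hypothetical, not from the released code):

```python
import numpy as np

def crop_nodule_block(volume, z, y, x, size=48):
    """Crop a 3-slice, size x size block centered on a nodule.

    volume: CT scan as a (slices, H, W) NumPy array.
    z, y, x: nodule center in voxel coordinates.
    Returns a (3, size, size) block: the center slice plus the
    slices immediately above and below it.
    """
    half = size // 2
    # Pad with edge values so crops near the volume border stay in bounds.
    padded = np.pad(volume, ((1, 1), (half, half), (half, half)), mode="edge")
    z, y, x = z + 1, y + half, x + half  # shift the center into padded coordinates
    return padded[z - 1:z + 2, y - half:y + half, x - half:x + half]
```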

Fig 2. Diameter distribution of pulmonary nodules after screening in the LUNA16 dataset.


Since the cropped image is only 48×48×3, each pixel carries a relatively large share of the information, which is not conducive to more complex preprocessing. Therefore, this paper only applies simple normalization to the CT images, plus an online random-flipping data augmentation method.
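
A sketch of this preprocessing, with the caveat that the paper does not state its normalization constants; the Hounsfield-unit window below is a common lung-CT choice, not the authors':

```python
import torch

def normalize_hu(block, hu_min=-1000.0, hu_max=400.0):
    """Clip CT intensities to a lung window and scale to [0, 1]."""
    block = block.clamp(hu_min, hu_max)
    return (block - hu_min) / (hu_max - hu_min)

def random_flip(block):
    """Online augmentation: flip the 3x48x48 block horizontally
    and/or vertically, each with probability 0.5."""
    if torch.rand(1) < 0.5:
        block = torch.flip(block, dims=[-1])  # left-right flip
    if torch.rand(1) < 0.5:
        block = torch.flip(block, dims=[-2])  # up-down flip
    return block
```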

Proposed method overview

Two key concepts arise when extracting visual features: local features and global features. Local features are vector representations of small regions of an image and play a key role in many computer vision algorithms; extracting them helps to better understand the local information of an image. Global representations capture holistic information such as contours, shapes and object types. Traditional CNNs learn local information hierarchically through convolutional operations, while Transformer-based ViT networks integrate global information through cascaded self-attention modules.

Fig 3 shows examples of lung nodules of different sizes and shapes. When classifying these nodules, feature extraction for size and morphology must consider the whole image, while small details require attention to the local image. Integrating the local features of a CNN with the global features of a Transformer is therefore particularly important for the pulmonary nodule classification task.

Fig 3. Sections of lung nodules of different sizes and shapes.


To synthesize the advantages of the two networks, this paper proposes the Cross-ViT network with cross-fusion attention for lung nodule classification; its structure is shown in Fig 4. The Cross-ViT network extracts features with a CNN branch and a ViT branch separately and then performs complementary fusion of the information, fully combining the advantages of the two networks in extracting different features.

Fig 4. Cross-fusion attention ViT network diagram.


The Cross-ViT network mainly consists of two branches: a CNN branch for extracting local features and a Transformer branch for capturing global features. CT images of pulmonary nodules are input into each branch separately for feature extraction; this parallel structure maximizes the preservation of local and global features. However, the feature maps output by the two branches usually have mismatched sizes, so an FC module reshapes them so that the CFA module can fuse their features. For the feature maps output by the two branches, a fully connected layer serves as the classifier, and both branches are trained with a binary cross-entropy loss function. Finally, the mean of the two branches' classification probabilities is taken as the final classification result. The structure of each component of the Cross-ViT network is detailed in the following subsections.
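
The following skeleton sketches how these pieces might be wired together; the module interfaces, the pooling before the classifiers, and the default dimensions are our illustrative assumptions, not the authors' exact implementation (the individual modules are sketched in the subsections below):

```python
import torch
import torch.nn as nn

class CrossViT(nn.Module):
    """Two-branch skeleton: CNN branch, Transformer branch, FC module,
    one CFA module per branch, and a fully connected classifier each."""
    def __init__(self, cnn_branch, vit_branch, fc_module, cfa_cnn, cfa_vit,
                 cnn_dim=256, vit_dim=768):
        super().__init__()
        self.cnn, self.vit, self.fc = cnn_branch, vit_branch, fc_module
        self.cfa_cnn, self.cfa_vit = cfa_cnn, cfa_vit
        self.head_cnn = nn.Linear(cnn_dim, 1)
        self.head_vit = nn.Linear(vit_dim, 1)

    def forward(self, x):
        f_cnn = self.cnn(x)               # local features, (B, C, H/4, W/4)
        f_vit = self.vit(x)               # global features, (B, E, N)
        c2t, t2c = self.fc(f_cnn, f_vit)  # reshape each branch for the other
        z_cnn = self.cfa_cnn(f_cnn, t2c)  # fuse global info into local map
        z_vit = self.cfa_vit(f_vit, c2t)  # fuse local info into global seq
        p_cnn = torch.sigmoid(self.head_cnn(z_cnn.mean(dim=(-2, -1))))
        p_vit = torch.sigmoid(self.head_vit(z_vit.mean(dim=-1)))
        return p_cnn, p_vit, (p_cnn + p_vit) / 2  # mean probability
```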

CNN branch and Transformer branch of the proposed method

CNN branch

ResNet50 [15] introduces the residual structure, which makes network training smoother and more stable, so the CNN branch uses ResNet50 to extract local features of pulmonary nodules. ResNet50 has five stages. In Stage 0, the image is downsampled with a large 7×7 convolution, reducing the feature map size while preserving as much of the original image information as possible. Each of the latter four stages consists of an unequal number of bottleneck layers, where each bottleneck layer comprises a 1×1 convolution, a 3×3 convolution, a 1×1 convolution, and a residual connection between its input and output. To reduce the dimensionality of the ResNet50 output, only Stage 1 to Stage 3 are used as the feature extraction network of the CNN branch.

As shown in Fig 4, we take the outputs of Stage 1 to Stage 3 as the features extracted by the CNN part; these outputs contain information at different scales. On this basis, this paper adds the decoder part of U-Net to fuse the features of different scales: the output of Stage 3 is upsampled and combined with the output of Stage 2, passed through two convolution layers and upsampled again, combined with the output of Stage 1, and finally passed through two more convolution layers. Since the input first goes through a convolution with kernel size 7×7 and stride 2 and a max pooling with stride 2, the input of Stage 1 is 1/4 of the original size, and the output of the CNN part matches the output size of Stage 1, i.e., also 1/4 of the original size. This makes the output size of the CNN branch the same as that of the subsequent Transformer branch.
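
A sketch of this branch built on torchvision's ResNet50 (the stage channel widths 256/512/1024 follow torchvision; the decoder widths are assumptions, since the paper does not state them):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CNNBranch(nn.Module):
    """ResNet50 Stage 0-3 plus a small U-Net-style decoder that fuses
    the multi-scale stage outputs back to 1/4 of the input resolution."""
    def __init__(self, out_ch=256):
        super().__init__()
        r = resnet50(weights=None)
        self.stage0 = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stage1, self.stage2, self.stage3 = r.layer1, r.layer2, r.layer3
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec2 = nn.Sequential(  # fuse upsampled stage3 (1024ch) with stage2 (512ch)
            nn.Conv2d(1024 + 512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(  # fuse with stage1 (256ch)
            nn.Conv2d(512 + 256, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):                  # x: (B, 3, 48, 48)
        s1 = self.stage1(self.stage0(x))   # (B, 256, 12, 12) -- 1/4 size
        s2 = self.stage2(s1)               # (B, 512, 6, 6)
        s3 = self.stage3(s2)               # (B, 1024, 3, 3)
        d2 = self.dec2(torch.cat([self.up(s3), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), s1], dim=1))
        return d1                          # same 1/4 resolution as stage1
```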

Transformer branch

The Transformer branch uses the ViT network to extract the global features of the pulmonary nodules. ViT is an image classification network proposed by the Google team whose structure largely retains the original Transformer structure. As shown in Fig 5, it comprises five parts: Linear Projection of Flattened Patches, class token, Position Embedding, Transformer Encoder and MLP Head.

Fig 5. Illustration of ViT network.


The Transformer requires a sequence of vectors as input, while images are three-channel pixel matrices, which do not meet this requirement. Therefore, the images are first cropped into a number of image patches, which are then linearly projected and flattened into one-dimensional vectors; with the classification token and embedded position encoding added, these form the Embedding Patches.

The ViT network uses several Transformer encoders to extract features. As can be seen in Fig 5, each Transformer encoder consists of Layer Normalization (LN) layers, a Multi-Head Attention block and an MLP block. LN is a normalization method introduced for natural language processing that helps avoid the vanishing-gradient phenomenon. After normalization, the data enters the Multi-Head Attention mechanism. This mechanism integrates the features learned by different heads and is the most important structure in the whole ViT network; its structure is shown in Fig 6.

Fig 6. Chart of multi-head attention mechanism structure.


As can be seen in Fig 6, the multiple heads of the multi-head attention mechanism enable the model to attend to different aspects of the information. The mechanism runs the Scaled Dot-Product Attention process h times in parallel and combines the outputs. In scaled dot-product attention, three matrices representing the query (Q), key (K) and value (V) are first extracted, where K has dimension dk and V has dimension dv. Q is multiplied with K by dot product, the result is scaled by the square root of dk, and a Softmax is applied to obtain the weights on V. The overall calculation of single-head attention is shown in Eq (1).

$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V \quad (1)$

A single attention head attends to a limited portion of the representation and extracts information poorly. The basic idea of the multi-head attention mechanism, in contrast, is to attend simultaneously to information from different representation subspaces at every position, producing order-independent contextual attention over all positions, i.e., a global attention mechanism. The multi-head attention calculation is shown in Eq (2).

$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\mathrm{head}_2,\ldots,\mathrm{head}_h)\,W^{O} \quad (2)$

where Concat denotes matrix concatenation, $\mathrm{head}_i=\mathrm{Attention}(QW_i^{Q},KW_i^{K},VW_i^{V})$ denotes the $i$-th single-head attention, and $W_i^{Q}\in\mathbb{R}^{d_{model}\times d_k}$, $W_i^{K}\in\mathbb{R}^{d_{model}\times d_k}$, $W_i^{V}\in\mathbb{R}^{d_{model}\times d_v}$, $W^{O}\in\mathbb{R}^{hd_v\times d_{model}}$.
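
For concreteness, a minimal PyTorch rendering of Eqs (1) and (2):

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v):
    """Eq (1): softmax(Q K^T / sqrt(d_k)) V for one head."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    return scores.softmax(dim=-1) @ v

class MultiHeadAttention(nn.Module):
    """Eq (2): h parallel heads, concatenated and projected by W^O."""
    def __init__(self, d_model=768, h=12):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        self.wo = nn.Linear(d_model, d_model)

    def forward(self, x):                  # x: (B, N, d_model)
        B, N, _ = x.shape
        split = lambda t: t.view(B, N, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.wq(x)), split(self.wk(x)), split(self.wv(x))
        out = attention(q, k, v)           # (B, h, N, d_k)
        out = out.transpose(1, 2).reshape(B, N, self.h * self.d_k)
        return self.wo(out)                # Concat(head_1..head_h) W^O
```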

After the multi-head attention mechanism, the data proceeds to an MLP block, in which a fully connected layer quadruples the vector length and the output layer restores the original length, so that deeper information can be learned while the vector size is preserved. At this point one Transformer encoder computation is finished; the operation is repeated N times. Finally, the MLP head integrates the extracted information and produces the classification result from the class token.

As shown in Fig 4, the input size of the Transformer branch is 3×H×W (H and W are the height and width of the input image). The input is first sliced into 3×4×4 patches, giving H/4×W/4 image blocks for subsequent operations; as in ViT, this paper performs the slicing with a convolution. The output channel size E of this convolution is 768, so the input to the Transformer branch has size E×(H/4×W/4). Since the Transformer branch here is used only for feature extraction, the class token is removed. After position embedding, multiple consecutive Transformer encoder layers are applied; in this experiment the number of layers N is 12, and each encoder layer contains 12 heads. The final output of the Transformer branch has the same size as its input, i.e., E×(H/4×W/4).
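
A sketch of this branch configuration, using PyTorch's stock encoder layer in place of the custom encoder (the pre-LN layout and the 4x MLP ratio match the ViT description above; other details are assumptions):

```python
import torch
import torch.nn as nn

class TransformerBranch(nn.Module):
    """Convolutional 4x4 patch slicing, learnable position embedding,
    no class token, and 12 encoder layers with 12 heads each."""
    def __init__(self, img_size=48, embed_dim=768, depth=12, heads=12):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=4, stride=4)
        n_patches = (img_size // 4) ** 2      # H/4 x W/4 = 144 for 48x48 input
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)  # pre-LN, as in ViT
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                     # x: (B, 3, 48, 48)
        t = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, E)
        t = self.encoder(t + self.pos_embed)
        return t.transpose(1, 2)              # (B, E, N): E x N per image
```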

Feature Coupling module

The CNN branch and the Transformer branch output feature maps with different dimensions, so to fuse the two branches' features, the first step is to eliminate the difference in size between them. To address this mismatch, this paper uses a two-way interactive FC module to match the feature map sizes; its structure is shown in Fig 7(a).

Fig 7. The structure diagram of the two modules proposed in this paper.


(a) Feature Coupling module; (b) Cross-fusion attention module.

The output of the CNN branch is a two-dimensional feature map, while the output of the Transformer branch is a one-dimensional sequence, so their dimensions differ. As described in the previous section, because the CNN branch uses the U-Net decoder structure, its output is 1/4 of the original size, written as C×H/4×W/4. For the Transformer branch, the output shape equals the input shape: the input is divided from the original image into patches of size E×H/4×W/4 and reshaped into E×N (where N = HW/16). To integrate the features extracted by the two branches, the Feature Coupling module is designed. As shown in Fig 7(a), the module processes the two outputs along different paths. The top-to-bottom path processes the CNN-branch output: it passes through a 1×1 convolution, which makes its channel count consistent with the Transformer branch, and is then reshaped to the same size as the Transformer-branch output. The bottom-to-top path processes the Transformer-branch output analogously: it is reshaped into the same shape as the CNN-branch output and integrated by a 1×1 convolution. The results of both paths serve as the outputs of the FC module, which thus aligns the two branches and provides the basis for the subsequent feature fusion by the CFA module.
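
A sketch of the FC module as described, with assumed channel sizes C = 256 and E = 768:

```python
import torch.nn as nn

class FeatureCoupling(nn.Module):
    """Each path uses a 1x1 convolution to match channels and a reshape
    to match layout; H and W are 1/4 of the input size, N = H * W."""
    def __init__(self, c_cnn=256, e_trans=768):
        super().__init__()
        self.cnn_to_seq = nn.Conv2d(c_cnn, e_trans, kernel_size=1)
        self.seq_to_cnn = nn.Conv2d(e_trans, c_cnn, kernel_size=1)

    def forward(self, f_cnn, f_trans):
        # f_cnn: (B, C, H, W); f_trans: (B, E, N) with N = H*W
        B, _, H, W = f_cnn.shape
        # Top path: CNN map -> Transformer-shaped sequence (B, E, N)
        c2t = self.cnn_to_seq(f_cnn).flatten(2)
        # Bottom path: sequence -> CNN-shaped map, integrated by 1x1 conv
        t2c = self.seq_to_cnn(f_trans.view(B, -1, H, W))
        return c2t, t2c
```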

Cross-fusion attention module

Non-local attention modules [16] are widely used owing to their ability to capture long-range dependencies between positions. To fuse the feature maps output by the two branches, a cross-fusion attention module is proposed based on the non-local attention module; its structure is shown in Fig 7(b).

CFA takes two inputs: the current branch's feature map and a cross-branch feature map produced by the Feature Coupling module. When feature fusion is performed at the CNN branch, the feature map Oin of the CNN branch has size H×W×C, and the feature map Cin integrated from the Transformer branch by the Feature Coupling module also has size H×W×C. Oin is first passed through two 1×1 convolutions that reduce the channels to C/2, and the result is flattened along the H×W dimensions to obtain a feature map of size HW×C/2; Cin is passed through one 1×1 convolution for the same purpose. Next, the feature map from ϕ is multiplied with the feature map from θ to obtain a matrix of size HW×HW, and a softmax operation turns it into a similarity matrix. Finally, the feature map from g is multiplied with the similarity matrix and the result is reshaped back to H×W×C/2 to obtain Y. The calculation is shown in Eq (3).

$y_i=\mathrm{softmax}\big(\theta(x_i)^{T}\phi(x_j)\big)\,g(x_j) \quad (3)$

where xi denotes the feature of Oin at position i, xj denotes the feature of Cin at position j, g(·), ϕ(·) and θ(·) denote linear mappings (a 1×1 convolution can be used instead), and yi denotes the output Y at the corresponding position.

The output Y passes through a 1×1 convolution that restores the channels to C, and an elementwise summation with the original input Oin gives the output Z of the cross-fusion attention module. When feature fusion is performed in the Transformer branch, the feature map of the Transformer branch has size E×N, and the feature map integrated from the CNN branch by the Feature Coupling module also has size E×N (E is the dimensionality of the feature vectors and N the number of positions). To fuse these two features, the 1×1 convolutions in the CFA module are replaced with 1D convolutions.
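
A sketch of the CFA module for the CNN branch. The prose and Eq (3) differ slightly on which input feeds each 1×1 convolution; this sketch follows Eq (3) and the notation above, with θ applied to the current branch (query) and ϕ, g applied to the cross branch (key/value):

```python
import torch
import torch.nn as nn

class CrossFusionAttention(nn.Module):
    """Non-local-style cross attention between Oin (current branch) and
    Cin (cross branch), with a residual elementwise sum giving Z.
    For the Transformer branch, 1D convolutions would replace Conv2d."""
    def __init__(self, channels=256):
        super().__init__()
        c2 = channels // 2
        self.theta = nn.Conv2d(channels, c2, 1)
        self.phi = nn.Conv2d(channels, c2, 1)
        self.g = nn.Conv2d(channels, c2, 1)
        self.out = nn.Conv2d(c2, channels, 1)

    def forward(self, o_in, c_in):                        # both (B, C, H, W)
        B, C, H, W = o_in.shape
        q = self.theta(o_in).flatten(2).transpose(1, 2)   # (B, HW, C/2)
        k = self.phi(c_in).flatten(2)                     # (B, C/2, HW)
        v = self.g(c_in).flatten(2).transpose(1, 2)       # (B, HW, C/2)
        attn = (q @ k).softmax(dim=-1)                    # (B, HW, HW) similarity
        y = (attn @ v).transpose(1, 2).view(B, C // 2, H, W)
        return o_in + self.out(y)                         # elementwise residual sum Z
```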

The two branches fuse each other's features and then produce separate outputs; this design exploits the respective advantages of CNN and Transformer feature extraction and strengthens the classification capacity. The output of the proposed network therefore contains the outputs of both branches.

$\mathrm{loss}_{\mathrm{diff}}=\begin{cases}\mathrm{BCE}(\mathrm{output}_{\mathrm{CNN}}) & \text{if } |\mathrm{output}_{\mathrm{CNN}}-0.5|>|\mathrm{output}_{\mathrm{Trans}}-0.5| \\ \mathrm{BCE}(\mathrm{output}_{\mathrm{Trans}}) & \text{otherwise}\end{cases} \quad (4)$
$\mathrm{loss}=\mathrm{loss}_{\mathrm{diff}}+\mathrm{BCE}_{\mathrm{CNN}}+\mathrm{BCE}_{\mathrm{Trans}} \quad (5)$

As shown in Eqs (4) and (5), BCE denotes the binary cross-entropy loss, outputCNN the output of the CNN branch, and outputTrans the output of the Transformer branch. In the training stage, the branch output that lies farther from 0.5 is retained, its cross-entropy loss against the label is computed, and this term is summed with the cross-entropy losses of both branch outputs; the sum of the three terms is the final loss of the method. In the test phase, the branch output farthest from 0.5 is taken as the final classification result.
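
A sketch of Eqs (4) and (5) together with the test-time rule (tensor names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def cross_vit_loss(p_cnn, p_trans, target):
    """Eqs (4)-(5): keep the branch output farther from 0.5 and add its
    BCE to the two per-branch BCE terms."""
    bce_cnn = F.binary_cross_entropy(p_cnn, target)
    bce_trans = F.binary_cross_entropy(p_trans, target)
    keep_cnn = (p_cnn - 0.5).abs() > (p_trans - 0.5).abs()
    p_diff = torch.where(keep_cnn, p_cnn, p_trans)
    loss_diff = F.binary_cross_entropy(p_diff, target)
    return loss_diff + bce_cnn + bce_trans

def predict(p_cnn, p_trans):
    """Test-time rule: the branch output farthest from 0.5 decides."""
    keep_cnn = (p_cnn - 0.5).abs() > (p_trans - 0.5).abs()
    return torch.where(keep_cnn, p_cnn, p_trans) > 0.5
```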

Results

Experimental details

The experiments are conducted on a Windows 10 system with an Intel Core i7-12700KF processor, 32 GB of RAM, and an NVIDIA GeForce RTX 4080 graphics card with 16 GB of video memory, using the PyTorch 2.5 framework. The dataset is randomly split 8:2 into training and test sets. During training, the batch size is 32, optimization uses the AdamW algorithm with an initial learning rate of 0.00002 and betas of (0.9, 0.999), and the total number of training epochs is 120.

This article also uses the ExponentialLR method to update the learning rate, which decreases it exponentially after each training epoch; the gamma parameter is set to 0.9. Fig 8 shows the loss curves with and without ExponentialLR: the loss updates are visibly more stable with ExponentialLR, and reducing the volatility of the loss also helps to avoid overfitting.
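
The corresponding training configuration might look as follows; `model`, `train_loader`, and `cross_vit_loss` are assumed from the sketches above rather than taken from the released code:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(120):
    for images, labels in train_loader:
        optimizer.zero_grad()
        p_cnn, p_trans, _ = model(images)
        loss = cross_vit_loss(p_cnn.squeeze(1), p_trans.squeeze(1), labels.float())
        loss.backward()
        optimizer.step()
    scheduler.step()  # lr <- lr * 0.9 after each epoch
```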

Fig 8. Loss curves over training epochs with and without ExponentialLR.


To evaluate the performance of the proposed network, a 5-fold cross-validation strategy is used, and the average classification performance is taken as the final result.

Computation time is a constant concern in deep learning work. The proposed Cross-ViT takes the cropped image as input, and since the crop is only 48×48×3, the amount of computation is very small: in our experiments, the average computation time per image is 6.34 ms.

Evaluation metrics

The task of classifying pulmonary nodules as benign or malignant is a binary classification problem, in which four cases occur: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Common metrics for evaluating classification performance are Accuracy (ACC), Precision (PRE), Recall (REC) and F1-Score, all of which can be calculated from TP, FP, TN and FN.

The formulas for calculating ACC, PRE, REC and F1-Score are given below:

$\mathrm{ACC}=\dfrac{TP+TN}{TP+TN+FP+FN} \quad (6)$
$\mathrm{PRE}=\dfrac{TP}{TP+FP} \quad (7)$
$\mathrm{REC}=\dfrac{TP}{TP+FN} \quad (8)$
$\mathrm{F1\text{-}Score}=\dfrac{2\times \mathrm{PRE}\times \mathrm{REC}}{\mathrm{PRE}+\mathrm{REC}} \quad (9)$
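
As a worked example, the four metrics follow directly from the confusion-matrix counts (the counts below are invented for illustration, not taken from the experiments):

```python
def classification_metrics(tp, fp, tn, fn):
    """Eqs (6)-(9) computed from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

print(classification_metrics(tp=104, fp=8, tn=78, fn=10))
```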

5-Fold cross-validation

To make effective use of the dataset, in this section the Cross-ViT network is trained and validated using the 5-fold cross-validation method; the experimental results are presented in Table 1. The average ACC, PRE, REC and F1-score over the 5 folds reach 91.04%, 91.42%, 92.45% and 91.92%, respectively.

Table 1. Cross-validation results of the Cross-ViT network for benign and malignant pulmonary nodules.

Folds ACC(%) PRE(%) REC(%) F1-Score (%)
Fold1 91.05 92.86 91.23 92.04
Fold2 89.05 91.07 89.47 90.26
Fold3 92.51 91.89 94.44 93.15
Fold4 89.55 89.30 91.74 90.51
Fold5 93.03 91.96 95.37 93.63
Average 91.04 91.42 92.45 91.92

Ablation experimental validation

To verify the effectiveness of the FC and CFA modules proposed in this paper, we ran an ablation experiment without these two modules: the output features of the CNN branch and Transformer branch are used directly without fusion, with all other settings unchanged.

Table 2 lists the results with and without the FC and CFA modules. Except for ACC, the PRE, REC and F1-Score with the FC and CFA modules are 0.96%, 3.12% and 2.03% higher, respectively, while ACC is only 0.31% lower than without them. A Kappa consistency test was also carried out: both settings show high agreement, with the Kappa of the method with FC and CFA slightly lower but comparable. Taking PRE and REC together, the proposed method recognizes positive samples more accurately and fuses the features extracted by the two branches effectively.

Table 2. Results of ablation experiments without FC and CFA modules.

Methods ACC(%) PRE(%) REC(%) F1-Score (%) Kappa
Without FC and CFA 91.35 90.46 89.33 89.89 0.8252
With FC and CFA 91.04 91.42 92.45 91.92 0.8186

Comparative experimental validation

To further validate the performance of the proposed Cross-ViT network, this section compares the performance of the proposed method with some current mainstream classification networks and lung nodule classification networks, and the results are displayed in Table 3 and Fig 9.

Table 3. Comparison of pulmonary nodule classification networks.

Methods ACC(%) PRE(%) REC(%) F1-Score (%)
VGG16 [17] 86.43 89.41 87.23 88.31
GoogleNet [18] 87.78 90.89 89.65 90.27
ResNet50 87.54 90.23 88.47 89.34
SENet [19] 88.24 90.77 88.56 89.65
ViT 88.23 90.35 90.27 90.31
DeepLung [20] 90.44 - 95.80 -
3D-SE-CDNet [21] 89.93 - 83.34 -
DFOF [22] 92.13 94.16 87.16 89.93
Tang et al. [23] 89.35 - 87.31 -
Lin et al. [24] 92.80 - - 92.16
Cross-ViT 91.04 91.42 92.45 91.92
Cross-ViT (SENet) 92.43 94.27 91.68 92.96

Fig 9. Histograms of each metric compared to state of the art methods.


(a) ACC; (b) PRE; (c) REC; (d) F1-Score.

It can be seen that VGG16 [17] performs worst in all respects; its redundant parameters easily lead to overfitting when training on small datasets. GoogleNet [18] and ResNet50 improve on VGG16, with accuracies of 87.78% and 87.54%, respectively: they have fewer parameters and deeper networks, achieving better results with fewer training parameters, but they still fail to capture important information from complex CT images. SENet [19] adds the SE attention module to ResNet50, paying more attention to the lung nodule, and raises accuracy to 88.24%. ViT lacks the layer-by-layer feature modeling structure of CNNs for capturing local features of lung nodules, so its classification ability holds no great advantage over CNNs. None of these networks has a feature extraction structure tailored to lung nodule images, so this section also compares against network models from the pulmonary nodule classification field.

Notably, the proposed Cross-ViT network outperforms most of the compared methods because it fully combines the CNN's ability to extract local features with the ViT's ability to extract context-dependent information; however, some of its metrics fall short of DFOF. Considering that the ResNet50 used in the CNN branch may be insufficient for extracting information from complex images such as lung nodules, we re-tested the model with SENet as the CNN branch and found its performance superior, exceeding that of DFOF. Compared with Lin et al., the proposed method has a lower ACC but a higher F1-Score; overall the two methods perform similarly.

Limitations

Although the method in this paper integrates the CNN branch and the Transformer branch at the feature level, the output of each branch is still retained in the final results. Between these two outputs, the method selects the more confident one (outputs greater than 0.5 are treated as the probability of a positive sample, and outputs less than 0.5 as the probability of a negative sample). In future work, a more intelligent result selection method could be studied to achieve more accurate threshold division and result output.

Conclusion

In this paper, to tackle insufficient feature extraction in models for benign and malignant lung nodule classification, a Cross-ViT network is proposed that fuses local and global features of pulmonary nodules. The Cross-ViT network uses a CNN branch and a Transformer branch to extract the local and global features of lung nodules, respectively, and fuses the two kinds of features to discriminate benign from malignant nodules. The results show that the proposed Cross-ViT network performs well in classifying lung nodules as benign or malignant. In future work, we intend to assess the network on additional medical image datasets and refine the fusion of local and global features according to classification outcomes across different datasets, to improve generality and accuracy.

Data Availability

The data supporting the findings in this paper are available from the LIDC-IDRI database (https://www.cancerimagingarchive.net/collection/lidc-idri).

Funding Statement

The author(s) received no specific funding for this work.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians. 2021;71(3):209–249. doi: 10.3322/caac.21660
2. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine. 2011;365(5):395–409. doi: 10.1056/NEJMoa1102873
3. Swensen SJ, Jett JR, Hartman TE, Midthun DE, Sloan JA, Sykes AM, et al. Lung cancer screening with CT: Mayo Clinic experience. Radiology. 2003;226(3):756–761. doi: 10.1148/radiol.2263020036
4. Doi K. Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Computerized Medical Imaging and Graphics. 2007;31(4–5):198–211. doi: 10.1016/j.compmedimag.2007.02.002
5. Wu W, Hu H, Gong J, Li X, Huang G, Nie S, et al. Malignant-benign classification of pulmonary nodules based on random forest aided by clustering analysis. Physics in Medicine & Biology. 2019;64(3):035017. doi: 10.1088/1361-6560/aafab0
6. Han F, Wang H, Zhang G, Han H, Song B, Li L, et al. Texture feature analysis for computer-aided diagnosis on pulmonary nodules. Journal of Digital Imaging. 2015;28:99–115. doi: 10.1007/s10278-014-9718-8
7. Farahani FV, Ahmadi A, Zarandi MHF, et al. Lung nodule diagnosis from CT images based on ensemble learning. In: 2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE; 2015. pp. 1–7.
8. Chen X, Wang X, Zhang K, Fung KM, Thai TC, Moore K, et al. Recent advances and clinical applications of deep learning in medical image analysis. Medical Image Analysis. 2022:102444. doi: 10.1016/j.media.2022.102444
9. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929; 2020.
10. Xie Y, Xia Y, Zhang J, Song Y, Feng D, Fulham M, et al. Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE Transactions on Medical Imaging. 2018;38(4):991–1004. doi: 10.1109/TMI.2018.2876510
11. Al-Shabi M, Shak K, Tan M. ProCAN: Progressive growing channel attentive non-local network for lung nodule classification. Pattern Recognition. 2022;122:108309.
12. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.
13. Wang H, Zhu H, Ding L. Accurate classification of lung nodules on CT images using the TransUnet. Frontiers in Public Health. 2022;10:1060798. doi: 10.3389/fpubh.2022.1060798
14. Armato SG III, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics. 2011;38(2):915–931. doi: 10.1118/1.3528204
15. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. pp. 770–778.
16. Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2018. pp. 7794–7803.
17. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556; 2014.
18. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2015. pp. 1–9.
19. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2018. pp. 7132–7141.
20. Zhu W, Liu C, Fan W, Xie X. DeepLung: Deep 3D dual path nets for automated pulmonary nodule detection and classification. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE; 2018. pp. 673–681.
21. Fu J. Application of modified Inception-ResNet and CondenseNet in lung nodule classification. In: 3rd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2019). Atlantis Press; 2019. pp. 186–194.
22. Huang H, Li Y, Wu R, Li Z, Zhang J. Benign-malignant classification of pulmonary nodule with deep feature optimization framework. Biomedical Signal Processing and Control. 2022;76:103701.
23. Tang W, Yang Z, Song Y. Disease-grading networks with ordinal regularization for medical imaging. Neurocomputing. 2023;545:126245.
24. Lin C-Y, Guo S-M, Lien J-JJ, Lin W-T, Liu Y-S, Lai C-H, et al. Combined model integrating deep learning, radiomics, and clinical data to classify lung nodules at chest CT. Radiologia Medica. 2023;129:56–69. doi: 10.1007/s11547-023-01730-6

Decision Letter 0

Sarada Prasad Dakua

4 Oct 2024

PONE-D-24-34082
Cross-ViT based benign and malignant classification of pulmonary nodules
PLOS ONE

Dear Dr. Zhu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 18 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sarada Prasad Dakua

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

Additional Editor Comments:

The authors are advised to carefully address the reviewers' comments.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper has addressed a nice research problem; however, I have the below comments: when applying the proposed Cross-ViT network for benign and malignant classification of pulmonary nodules, there seems several limitations that the authors need to discuss in their revision:

1. The performance of Cross-ViT is highly reliant on the quality and size of the training dataset. If the dataset lacks sufficient examples of either benign or malignant nodules, the model may not generalize well.

2. Medical datasets often suffer from class imbalance, which can lead to biased predictions favoring the more prevalent class. This is critical in nodule classification, where benign cases might outnumber malignant ones. I would suggest the authors to discuss by citing the below papers

“Perspectives on the technological aspects and biomedical applications of Virus-like-particles/ Nanoparticles in reproductive biology: Insights on the medicinal and toxicological outlook,” Advanced NanoBiomed Research, Wiley, 2:2200010, 2022.

"Synergistic and Additive Effects of Menadione in Combination with Antibiotics on Multidrug-Resistant Staphylococcus aureus: Insights from Structure-Function Analysis of Naphthoquinones," ChemMedChem, Chemistry Eurpoe vol. 18, no. 24, 2023.

"Leveraging hallmark Alzheimer’s molecular targets using phytoconstituents: Current perspective and emerging trends,"Biomedicine & Pharmacotherapy, Elsevier, vol. 139, no. 111634, 2021

3. The integration of local features from CNNs with global features from Transformers may not always align well in the context of medical images, potentially leading to suboptimal feature fusion and classification performance.

4. The complex architecture can lead to overfitting, especially when trained on limited datasets, which is common in medical imaging.

5. The model's architecture requires significant computational resources, which may not be feasible in clinical settings with limited hardware. Complexity remains an issue, the authors can refer the below papers while discussing this: “Real-time Automated Image Segmentation Technique for Cerebral Aneurysm on Reconfigurable System-On-Chip,” Journal of Computational Science, Elsevier, vol. 27, pp 35-45, 2018.

“Lattice-Boltzmann Interactive Blood Flow Simulation Pipeline,” International Journal of Computer Assisted Radiology and Surgery, Springer, vol.15, pp. 629-639, 2020.

“Zynq SoC based Acceleration of the Lattice Boltzmann Method,” Concurrency and Computation: Practice and Experience, Wiley, vol. 31, issue 17, 2019.

“Heterogeneous System-on-Chip based Lattice- Boltzmann Visual Simulation System,” Systems Journal, IEEE, vol. 14, no. 2, pp. 1592-1601, 2020

6. Medical images can contain artifacts or noise, and the attention mechanisms may not always effectively filter out irrelevant information, impacting classification accuracy.

7. Effective feature extraction relies on the quality of preprocessing steps (e.g., normalization, augmentation). Poor preprocessing can adversely affect both local and global feature extraction. The authors could cite the below papers and discuss if pre-processing could be of help: “A Lightweight Neural Network with Multiscale Feature Enhancement for Liver CT Segmentation,” Scientific Reports, Nature, vol. 12, no. 14153, pp. 1-12, 2022.

“Re- routing drugs to blood brain barrier: A comprehensive analysis of Machine Learning approaches with fingerprint amalgamation and data balancing,” IEEE Access, vol. 11, pp. 9890-9906, 2023.

“Dense-PSP-UNet: A Neural Network for Fast Inference Liver Ultrasound Segmentation,” Computers in Biology and Medicine, ScienceDirect, vol. 153, pp. 106478, 2023.

8. Please include the potential limitations of the paper.

Reviewer #2: 1. Many methods have used CNNs for preliminary feature extraction and then Vision Transformers (ViTs) for capturing long-range dependencies in the feature maps. How is this work different from the other works that have been proposed for medical image segmentation?

2. Authors have mentioned performance on the LUNA 16 dataset at the end of the abstract, but how much is the improvement over the best-performing state-of-the-art model compared to the work proposed in this paper? This should be highlighted numerically at the end of the abstract.

3. Lines 50 to 54 claim CNNs have been used. More references specifically for applications should be cited here. Some of them that should be included are as follows:

a. A lightweight neural network with multiscale feature enhancement for liver CT segmentation

b. A lightweight neural network with multiscale feature enhancement for liver CT segmentation

c. Enhancing ECG-based heart age: impact of acquisition parameters and generalization strategies for varying signal morphologies and corruptions

d. Advancements in Deep Learning for B-Mode Ultrasound Segmentation: A Comprehensive Review

e. Unveiling the future of breast cancer assessment: a critical review on generative adversarial networks in elastography ultrasound

f. Neural network-based fast liver ultrasound image segmentation

4. Add the structure of the paper at the end of the introduction.

5. Explanation of local and global features in lines 137 to 142 should be improved by explaining them with specific examples in the medical images that the work is based on.

6. Lines 142 to 150 reiterate what has already been stated in the introduction regarding CNN transformers. This section of the network overview should cover the overall architecture of the CrossVIT model, not reintroduce the contributions of the paper.

7. The size difference between the CNN encoder and the transformer encoder is not the only problem. These two encoders are capturing features at different scales. How is the feature coupling block handling the feature maps that are generated at completely different scales?

8. How can the probability generated from the two branches be summed? This can ruin the probability distribution, as the result may not remain between 0 and 1. This approach needs to be clarified or corrected.

9. How many transformer blocks were used? What is the configuration of the transformer block in terms of the number of heads and other hyperparameters? More details regarding the hyperparameters and the configuration of both the CNN and transformer branches need to be provided.

10. The textual description of the feature coupling module does not correspond to the diagram shown. Authors talk about feedback between the branches. What do they mean by this? Please rewrite the entire description of the feature coupling module for better clarity.

11. According to the CFA diagram, the cross-branch feature map serves as the query, and the current branch provides the key and the value. Is that correct? If so, please clarify why this design choice was made for the cross-fusion attention module.

12. Add a GitHub link with proper documentation explaining the network. It should contain all the necessary files to reproduce the results on the LUNA 16 dataset for this network.

13. The paper lacks ablation studies to show the performance impact of cross-vision attention and feature coupling, and the role of individual encoders. Ablation studies are necessary to show the performance contribution of each component.

14. The network has been compared with fairly old architectures like VGG-16, GoogleNet, ResNet, and ViTs. Please compare it with more state-of-the-art architectures from the literature for a fairer comparison.

15. There is no discussion on the limitations of this work. Please include a limitations section in the paper, discussing any potential shortcomings or areas where the method could be improved.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Feb 5;20(2):e0318670. doi: 10.1371/journal.pone.0318670.r002

Author response to Decision Letter 0


8 Nov 2024

Response to the comments

Dear Editor and Reviewer:

Thank you for your letter and the reviewers' comments concerning our manuscript entitled "Cross-ViT based benign and malignant classification of pulmonary nodules". Those comments are all valuable and very helpful for revising and improving our paper; we have studied them carefully and made corrections which we hope meet with approval. Following the reviewers' suggestions, we added a new table (Table 2) and new figures (Fig 2, 4, 7, 8, 11) and replaced some tables and figures (Fig 8, Table 3). In addition, we have carefully checked and revised parts of the paper (marked red in the text) to make the article more rigorous. The responses to the reviewers' comments are as follows.

Reviewer #1: The paper has addressed a nice research problem; however, I have the below comments: when applying the proposed Cross-ViT network for benign and malignant classification of pulmonary nodules, there seems several limitations that the authors need to discuss in their revision:

1. The performance of Cross-ViT is highly reliant on the quality and size of the training dataset. If the dataset lacks sufficient examples of either benign or malignant nodules, the model may not generalize well.

[Response]: Thank you so much for your suggestion. The method proposed in this paper is based on deep learning, and deep learning inevitably relies on large amounts of data. Many of today's mature deep learning models, such as ChatGPT, are supported by very large datasets. How to obtain good results with less data and smaller models is itself a focus of current deep learning research. In this paper, we validate our results on a publicly available dataset; to address further generalization concerns, a larger dataset should be assembled in real applications and used to retrain the model.

2. Medical datasets often suffer from class imbalance, which can lead to biased predictions favoring the more prevalent class. This is critical in nodule classification, where benign cases might outnumber malignant ones. I would suggest the authors discuss this by citing the papers below:

“Perspectives on the technological aspects and biomedical applications of Virus-like-particles/Nanoparticles in reproductive biology: Insights on the medicinal and toxicological outlook,” Advanced NanoBiomed Research, Wiley, 2:2200010, 2022.

"Synergistic and Additive Effects of Menadione in Combination with Antibiotics on Multidrug-Resistant Staphylococcus aureus: Insights from Structure-Function Analysis of Naphthoquinones," ChemMedChem, Chemistry Eurpoe vol. 18, no. 24, 2023.

"Leveraging hallmark Alzheimer’s molecular targets using phytoconstituents: Current perspective and emerging trends,"Biomedicine & Pharmacotherapy, Elsevier, vol. 139, no. 111634, 2021

[Response]: Thank you so much for your suggestion. Our dataset contains a total of 1004 nodules, of which 450 are malignant and 554 are benign, so the classes are nearly balanced. We have added the relevant explanation and references in the Dataset section.

3. The integration of local features from CNNs with global features from Transformers may not always align well in the context of medical images, potentially leading to suboptimal feature fusion and classification performance.

[Response]: Thank you so much for your suggestion. The CNN branch in this paper uses ResNet as the backbone and takes the outputs of stages 1-3 as the extracted features. A U-Net-style decoder is then used to fuse the features of the different scales from stages 1-3. Since the input to ResNet first passes through a 7×7 convolution with stride 2 and a max-pooling layer with stride 2, the CNN output in this paper is 1/4 of the original image size, and the input and output of the Transformer branch are also at 1/4 scale. The input scales of the CNN branch and the Transformer branch in the Feature Coupling module are therefore the same. We have made some changes to the description in the method section.

4. The complex architecture can lead to overfitting, especially when trained on limited datasets, which is common in medical imaging.

[Response]: Thank you so much for your suggestion. Because the information contained in limited data is not comprehensive, overfitting is unavoidable when training neural networks. To mitigate it, we applied ExponentialLR in our training, which decays the learning rate exponentially after each epoch. We have added the training-loss curve to the Implementation details section; it shows that ExponentialLR effectively reduces the fluctuation of the training loss.
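For illustration, a minimal PyTorch sketch of an ExponentialLR schedule as described above; the model, optimizer and gamma value are placeholders, not the authors' exact configuration:

    import torch

    # Placeholder model and optimizer; gamma=0.95 is an assumed value.
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # ExponentialLR multiplies the learning rate by gamma after every epoch,
    # i.e. lr(epoch) = lr0 * gamma ** epoch.
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

    for epoch in range(10):
        # ... one training pass over the data would go here ...
        optimizer.step()
        scheduler.step()  # decay the learning rate once per epoch
        print(epoch, scheduler.get_last_lr())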

5. The model's architecture requires significant computational resources, which may not be feasible in clinical settings with limited hardware. Complexity remains an issue; the authors can refer to the papers below while discussing this: “Real-time Automated Image Segmentation Technique for Cerebral Aneurysm on Reconfigurable System-On-Chip,” Journal of Computational Science, Elsevier, vol. 27, pp. 35-45, 2018.

“Lattice-Boltzmann Interactive Blood Flow Simulation Pipeline,” International Journal of Computer Assisted Radiology and Surgery, Springer, vol.15, pp. 629-639, 2020.

“Zynq SoC based Acceleration of the Lattice Boltzmann Method,” Concurrency and Computation: Practice and Experience, Wiley, vol. 31, issue 17, 2019.

“Heterogeneous System-on-Chip based Lattice- Boltzmann Visual Simulation System,” Systems Journal, IEEE, vol. 14, no. 2, pp. 1592-1601, 2020

[Response]: Thank you so much for your suggestion. The main task of this paper is the classification of pulmonary nodules. The input to our method is a cropped pulmonary nodule image of only 48×48×3, which is quite small compared with typical inputs, so the computational load of a network operating on such images is very low. In our experiments, the average time for processing one image is 6.34 ms. We have added references and explanations in the Implementation details section.
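As a rough sketch of how such per-image timing can be measured (the model here is a small stand-in, not Cross-ViT itself):

    import time
    import torch

    model = torch.nn.Conv2d(3, 8, 3)   # stand-in for the actual network
    model.eval()
    x = torch.randn(1, 3, 48, 48)      # one 48x48x3 nodule slice

    with torch.no_grad():
        for _ in range(10):            # warm-up iterations
            model(x)
        start = time.perf_counter()
        runs = 100
        for _ in range(runs):
            model(x)
    elapsed_ms = (time.perf_counter() - start) / runs * 1000
    print(f"average per-image time: {elapsed_ms:.2f} ms")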

6. Medical images can contain artifacts or noise, and the attention mechanisms may not always effectively filter out irrelevant information, impacting classification accuracy.

[Response]: Thank you so much for your suggestion. The network can also learn to cope with artifacts and noise during training. The method in this paper integrates a CNN and a Transformer, so both local and global features are considered simultaneously; likewise, artifacts and noise have both global and local components, and our method can therefore account for them more fully. We have revised the method section of this paper to make it more accurate.

7. Effective feature extraction relies on the quality of preprocessing steps (e.g., normalization, augmentation). Poor preprocessing can adversely affect both local and global feature extraction. The authors could cite the papers below and discuss whether pre-processing could help: “A Lightweight Neural Network with Multiscale Feature Enhancement for Liver CT Segmentation,” Scientific Reports, Nature, vol. 12, no. 14153, pp. 1-12, 2022.

“Re-routing drugs to blood brain barrier: A comprehensive analysis of Machine Learning approaches with fingerprint amalgamation and data balancing,” IEEE Access, vol. 11, pp. 9890-9906, 2023.

“Dense-PSP-UNet: A Neural Network for Fast Inference Liver Ultrasound Segmentation,” Computers in Biology and Medicine, ScienceDirect, vol. 153, pp. 106478, 2023.

[Response]: Thank you so much for your suggestion. The image used in this paper is a cropped slice of the lung nodule, whose size is only 48×48×3. At such a small resolution, each pixel carries more information than in a larger image. Overly complex preprocessing would destroy the structure between pixels, so more elaborate preprocessing methods are not suitable when the information density is this high. Therefore, in the proposed method, we simply use normalization of the CT images and online random flipping as data augmentation. We have added the relevant explanation in the Dataset section.
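A minimal sketch of the preprocessing described above, i.e. normalization plus online random flipping; the HU window used here is an assumed choice, since the response only states that normalization and flipping are used:

    import numpy as np
    import torch

    def preprocess(ct_patch, flip_prob=0.5):
        """Normalize a CT nodule patch and apply online random flipping."""
        lo, hi = -1000.0, 400.0                  # assumed HU window
        x = np.clip(ct_patch, lo, hi)
        x = (x - lo) / (hi - lo)                 # scale to [0, 1]
        if np.random.rand() < flip_prob:
            x = x[:, ::-1]                       # random horizontal flip
        if np.random.rand() < flip_prob:
            x = x[::-1, :]                       # random vertical flip
        return torch.from_numpy(np.ascontiguousarray(x)).float()

    # Dummy 48x48 slice (single channel for simplicity).
    patch = np.random.uniform(-1200, 600, size=(48, 48))
    tensor = preprocess(patch)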

8. Please include the potential limitations of the paper.

[Response]: Thank you so much for your suggestion. We have added limitations in the Experiments and discussion section.

Reviewer #2:

1. Many methods have used CNNs for preliminary feature extraction and then Vision Transformers (ViTs) for capturing long-range dependencies on the feature maps. How is this work different from the other works that have been proposed for medical image segmentation?

[Response]: Thank you so much for your suggestion. This paper proposes a Cross-ViT network that integrates a CNN and a Transformer. The fusion network in this paper fuses features extracted by the CNN and the Transformer, which has not been studied in previous pulmonary nodule classification work. At the start of the network, the input is fed into the CNN branch and the Transformer branch separately. After each branch extracts its features, a dedicated Feature Coupling module and a Cross-fusion attention module fuse the features of the two branches.
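To make the two-branch design concrete, a high-level structural sketch follows; every submodule here is a hypothetical stand-in, not the authors' implementation:

    import torch
    import torch.nn as nn

    class CrossViTSketch(nn.Module):
        """Two independent encoders, a Feature Coupling step, then
        cross-branch fusion; all submodules are placeholders."""
        def __init__(self, cnn, vit, coupling, cfa_cnn, cfa_vit, head_cnn, head_vit):
            super().__init__()
            self.cnn, self.vit = cnn, vit
            self.coupling = coupling
            self.cfa_cnn, self.cfa_vit = cfa_cnn, cfa_vit
            self.head_cnn, self.head_vit = head_cnn, head_vit

        def forward(self, x):
            f_cnn = self.cnn(x)                             # local features
            f_vit = self.vit(x)                             # global features
            for_cnn, for_vit = self.coupling(f_cnn, f_vit)  # align the streams
            f_cnn = self.cfa_cnn(f_cnn, for_cnn)            # fuse into CNN branch
            f_vit = self.cfa_vit(f_vit, for_vit)            # fuse into ViT branch
            return self.head_cnn(f_cnn), self.head_vit(f_vit)

    # Toy instantiation just to show the data flow end to end.
    model = CrossViTSketch(
        cnn=nn.Identity(), vit=nn.Identity(),
        coupling=lambda a, b: (b, a),
        cfa_cnn=lambda cur, cross: cur + cross,
        cfa_vit=lambda cur, cross: cur + cross,
        head_cnn=nn.Linear(8, 2), head_vit=nn.Linear(8, 2),
    )
    out_cnn, out_vit = model(torch.randn(4, 8))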

2. Authors have mentioned performance on the LUNA 16 dataset at the end of the abstract, but how much improvement does the work proposed in this paper achieve over the best-performing state-of-the-art model? This should be highlighted numerically at the end of the abstract.

[Response]: Thank you so much for your suggestion. We have revised the abstract as required.

3. Lines 50 to 54 claim CNNs have been used. More references specifically for applications should be cited here. Some of them that should be included are as follows:

a. A lightweight neural network with multiscale feature enhancement for liver CT segmentation

b. Enhancing ECG-based heart age: impact of acquisition parameters and generalization strategies for varying signal morphologies and corruptions

c. Advancements in Deep Learning for B-Mode Ultrasound Segmentation: A Comprehensive Review

d. Unveiling the future of breast cancer assessment: a critical review on generative adversarial networks in elastography ultrasound

e. Neural network-based fast liver ultrasound image segmentation

[Response]: Thank you so much for your suggestion. We have added more literature on the application of CNNs in the medical field in lines 54 to 62.

4. Add the structure of the paper at the end of the introduction.

[Response]: Thank you so much for your suggestion. We have added the structure of this article at the end of the Introduction chapter.

5. Explanation of local and global features in lines 137 to 142 should be improved by explaining them with specific examples in the medical images that the work is based on.

[Response]: Thank you so much for your suggestion. We have added a new example figure of lung nodules to the Network overview section, along with an explanation of the global and local features of lung nodules.

7. The size difference between the CNN encoder and the transformer encoder is not the only problem. These two encoders are capturing features at different scales. How is the feature coupling block handling the feature maps that are generated at completely different scales?

[Response]: Thank you so much for your suggestion. The outputs of stages 1-3 of the ResNet in the CNN branch are used as the features in this paper. The output of stage 3 is upsampled and combined with the output of stage 2; the combined result is then convolved, upsampled and combined with the output of stage 1; finally, the result is output through a convolution. This structure is similar to U-Net. Since the input to ResNet first undergoes a 7×7 convolution with stride 2 and a max-pooling layer, the CNN output is 1/4 of the original size, and the input to the Transformer branch is also at 1/4 scale. In other words, the output scales of the CNN and the Transformer are the same. We have added new explanations in the CNN branch section.
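A minimal sketch of this U-Net-like fusion of the stage 1-3 features, assuming ResNet-18/34-style channel counts (64/128/256), which the response does not specify:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StageFusionDecoder(nn.Module):
        """Fuse ResNet stage 1-3 outputs back to 1/4 of the input size."""
        def __init__(self, c1=64, c2=128, c3=256, out_c=64):
            super().__init__()
            self.fuse32 = nn.Conv2d(c3 + c2, c2, 3, padding=1)
            self.fuse21 = nn.Conv2d(c2 + c1, out_c, 3, padding=1)

        def forward(self, s1, s2, s3):
            # s1: 1/4 scale, s2: 1/8 scale, s3: 1/16 scale
            x = F.interpolate(s3, scale_factor=2, mode="bilinear", align_corners=False)
            x = self.fuse32(torch.cat([x, s2], dim=1))   # combine with stage 2
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            x = self.fuse21(torch.cat([x, s1], dim=1))   # combine with stage 1
            return x                                     # 1/4 of the input size

    # With a 48x48 input: stage 1 -> 12x12, stage 2 -> 6x6, stage 3 -> 3x3.
    dec = StageFusionDecoder()
    out = dec(torch.randn(1, 64, 12, 12),
              torch.randn(1, 128, 6, 6),
              torch.randn(1, 256, 3, 3))
    print(out.shape)  # torch.Size([1, 64, 12, 12]), matching the ViT scale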

8. How can the probability generated from the two branches be summed? This can ruin the probability distribution, as the result may not remain between 0 and 1. This approach needs to be clarified or corrected.

[Response]: Thank you so much for your suggestion. The output of the network presented in this article contains the outputs of the two branches. During training, each branch's output is compared with the label; the prediction that deviates furthest from the label is retained, its cross-entropy loss is computed, and this is summed with the cross-entropy losses of the two branch outputs, so the sum of the three terms is the final loss of our method. In the test phase, the branch output farthest from 0.5 is taken as the final classification result. We have added new explanations and formulas at the end of the Cross-fusion attention module section.
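Since this description is easy to misread, here is one possible reading of it as a sketch (hypothetical helpers, not the authors' code): two per-branch cross-entropy terms plus a third term on whichever branch output deviates most from the label, and at test time the branch probability farthest from 0.5 decides the class.

    import torch
    import torch.nn.functional as F

    def cross_vit_loss(p_cnn, p_vit, target):
        """p_cnn, p_vit: (N, 2) branch logits; target: (N,) class labels."""
        loss_cnn = F.cross_entropy(p_cnn, target)
        loss_vit = F.cross_entropy(p_vit, target)
        # Malignancy probability from each branch.
        q_cnn = F.softmax(p_cnn, dim=1)[:, 1]
        q_vit = F.softmax(p_vit, dim=1)[:, 1]
        t = target.float()
        # Per sample, retain the branch output farthest from the label.
        cnn_worse = (q_cnn - t).abs() > (q_vit - t).abs()
        retained = torch.where(cnn_worse.unsqueeze(1), p_cnn, p_vit)
        loss_retained = F.cross_entropy(retained, target)
        return loss_cnn + loss_vit + loss_retained    # "the sum of the three"

    def cross_vit_predict(p_cnn, p_vit):
        """Test time: the branch probability farthest from 0.5 wins."""
        q_cnn = F.softmax(p_cnn, dim=1)[:, 1]
        q_vit = F.softmax(p_vit, dim=1)[:, 1]
        use_cnn = (q_cnn - 0.5).abs() > (q_vit - 0.5).abs()
        q = torch.where(use_cnn, q_cnn, q_vit)
        return (q > 0.5).long()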

9. How many transformer blocks were used? What is the configuration of the transformer block in terms of the number of heads and other hyperparameters? More details regarding the hyperparameters and the configuration of both the CNN and transformer branches need to be provided.

[Response]: Thank you so much for your suggestion. This paper uses 12 consecutive Transformer encoder layers, each containing 12 attention heads. We have revised the explanation in the Transformer branch section.
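For concreteness, the stated configuration (12 encoder layers with 12 heads each) could be instantiated as below; the embedding and feed-forward sizes are assumptions borrowed from ViT-Base, not values confirmed by the paper:

    import torch.nn as nn

    embed_dim = 768  # assumed (ViT-Base style); must be divisible by 12 heads
    layer = nn.TransformerEncoderLayer(
        d_model=embed_dim,
        nhead=12,                        # 12 attention heads per layer
        dim_feedforward=4 * embed_dim,   # assumed MLP ratio of 4
        batch_first=True,
    )
    transformer_branch = nn.TransformerEncoder(layer, num_layers=12)  # 12 layers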

10. The textual description of the feature coupling module does not correspond to the diagram shown. Authors talk about feedback between the branches. What do they mean by this? Please rewrite the entire description of the feature coupling module for better clarity.

[Response]: Thank you so much for your suggestion. The Feature Coupling module is designed to fuse the output of the CNN branch into the Transformer branch and the output of the Transformer branch into the CNN branch. The outputs of the two branches are therefore processed in a similar way in this module, so that each branch's output can be merged into the opposite branch in later operations. We have rewritten parts of the Feature Coupling module section.
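A minimal sketch of this mutual reshaping, assuming the CNN features are an (N, C, H, W) map and the Transformer features an (N, H·W, C) token sequence; the linear projections are illustrative choices:

    import torch
    import torch.nn as nn

    class FeatureCoupling(nn.Module):
        """Project each branch's features into the form expected by the
        opposite branch so they can be merged there later."""
        def __init__(self, channels):
            super().__init__()
            self.to_tokens = nn.Linear(channels, channels)  # CNN map -> tokens
            self.to_map = nn.Linear(channels, channels)     # tokens -> CNN map

        def forward(self, cnn_feat, vit_tokens):
            n, c, h, w = cnn_feat.shape
            # CNN output flattened into tokens for the Transformer branch.
            for_vit = self.to_tokens(cnn_feat.flatten(2).transpose(1, 2))
            # Transformer output reshaped into a map for the CNN branch.
            for_cnn = self.to_map(vit_tokens).transpose(1, 2).reshape(n, c, h, w)
            return for_cnn, for_vit

    fc = FeatureCoupling(channels=64)
    for_cnn, for_vit = fc(torch.randn(2, 64, 12, 12), torch.randn(2, 144, 64))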

11. According to the CFA diagram, the cross-branch feature map serves as the query, and the current branch provides the key and the value. Is that correct? If so, please clarify why this design choice was made for the cross-fusion attention module.

[Response]: Thank you so much for your suggestion. The Cross-fusion attention module proposed in this paper is applied to the CNN branch and the Transformer branch respectively. Its main function is to fuse the opposite branch's information into each branch. That information is obtained from the output of the Feature Coupling module, as shown in Fig 6. In the Cross-fusion attention module of each branch, the current branch acts as the main body, while the output of the Feature Coupling module acts as supplementary information. Therefore, the current branch provides k and v, and the output of the Feature Coupling module provides q.
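A sketch of that q/k/v assignment as cross-attention, where the coupled cross-branch features supply the query and the current branch supplies the key and value; the dimensions and the residual merge are illustrative assumptions:

    import torch
    import torch.nn as nn

    class CrossFusionAttention(nn.Module):
        """Cross attention: q from the coupled (relative) branch,
        k and v from the current branch, as described above."""
        def __init__(self, dim, heads=12):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, current, coupled):
            # current: (N, L, D) tokens of the branch being updated (k, v)
            # coupled: (N, L, D) output of the Feature Coupling module (q)
            fused, _ = self.attn(query=coupled, key=current, value=current)
            return current + fused  # residual merge; an assumed detail

    cfa = CrossFusionAttention(dim=768)
    out = cfa(torch.randn(2, 144, 768), torch.randn(2, 144, 768))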

12. Add a GitHub link with proper documentation explaining the network. It should contain all the necessary files to reproduce the results on the LUNA 16 dataset for this network.

[Response]: Thank you so much for your suggestion. We provide a link to the code at the end of the Introduction.

13. The paper lacks ablation studies to show the performance impact of cross-vision attention and feature coupling, and the role of individual encoders. Ablation studies are necessary to show the performance contribution of each component.

[Response]: Thank you so much for your suggestion. The results of ablation experiments without the FC and CFA modules have been added to the Experiments and discussion section. Since the FC and CFA modules are bound to each other, only one set of ablation experiments was conducted.

14. The network has been compared with fairly old architectures like VGG-16, GoogleNet, ResNet, and ViTs. Please compare it with more state-of-the-art architectures from the literature for a fairer comparison.

Attachment

Submitted filename: Response to the comments.docx

pone.0318670.s001.docx (27.1KB, docx)

Decision Letter 1

Mohammad Amin Fraiwan

6 Dec 2024

PONE-D-24-34082R1

Cross-ViT based benign and malignant classification of pulmonary nodules

PLOS ONE

Dear Dr. Zhu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 20 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Amin Fraiwan

Academic Editor

PLOS ONE

Additional Editor Comments:

As noted previously by the Journal Office, in the previous round of review the reviewers recommended that you cite specific previously published works. Members of the editorial team have determined that the works referenced are not directly related to the submitted manuscript. As such, please note that it is not necessary or expected to cite the works requested by the reviewers, and we suggest that you remove references 6-9 and 22-30, unless you feel they are particularly relevant to your manuscript. 


PLoS One. 2025 Feb 5;20(2):e0318670. doi: 10.1371/journal.pone.0318670.r004

Author response to Decision Letter 1


8 Dec 2024

Response: Thank you for your letter and the comment concerning our manuscript entitled “Cross-ViT based benign and malignant classification of pulmonary nodules”. Your comment is valuable and very helpful for revising and improving our paper. We have carefully considered your concerns about references and removed some of the references added in the last revision; specifically, we deleted the references with the original labels [6-10] and [22-30].

Decision Letter 2

Mohammad Amin Fraiwan

12 Jan 2025

PONE-D-24-34082R2

Cross-ViT based benign and malignant classification of pulmonary nodules

PLOS ONE

Dear Dr. Zhu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 26 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Amin Fraiwan

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: Major:

- The writing does not adhere to conventional scientific writing format, making it difficult and unpleasant to read. I would suggest structuring the article as introduction, methods, results, discussion. There are also many subheadings that do not really have a core message. Additionally, the article consists of 11 figures, each with only 1 panel. Can you condense these into fewer figures with more panels each?

- Can you describe more about the LUNA16 dataset? In particular, what makes this dataset the gold standard for malignant vs benign pulmonary nodule discrimination? This is crucial for the study, as it is what you are using for benchmarking.

Minor:

- Table 3 should be presented as a proper figure with graphs. Benchmarking your method against other existing methods is a crucial part of the paper, and it should be highlighted accordingly.

- Can you also perform leave-one-out cross-validation in addition to 5-fold cross-validation?

- Lengthy descriptions of the basic statistical methods you used are unnecessary (e.g., five-fold cross-validation); they detract from the core message of your research and further contribute to the paper's unpleasant readability.

Reviewer #4: The author presents a good manuscript. However, I feel the article can be improved by the use of a professional English language editor; for example, the last sentence of lines 47 and 48 is not grammatically correct.

Secondly, would the addition of sensitivity, specificity, positive predictive value and negative predictive value provide better results for comparison of the two methods than as currently presented? Would a Kappa test be useful?

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********


PLoS One. 2025 Feb 5;20(2):e0318670. doi: 10.1371/journal.pone.0318670.r006

Author response to Decision Letter 2


18 Jan 2025

Response to the comments

Dear Editor and Reviewer:

Thank you for your letter and the reviewer’s comments concerning our manuscript entitled “Cross-ViT based benign and malignant classification of pulmonary nodules”. The comments are all valuable and very helpful for revising and improving our paper. We have studied them carefully and made corrections that we hope will meet with approval. We changed the structure of the article and adjusted the order and layout of some figures. The responses to the reviewers’ comments are as follows.

Reviewer #3: Major:

- The writing does not adhere to conventional scientific writing format, making it difficult and unpleasant to read. I would suggest structuring the article as introduction, methods, results, discussion. There are also many subheadings that do not really have a core message. Additionally, the article consists of 11 figures, each with only 1 panel. Can you condense these into fewer figures with more panels each?

[Response]: Thank you so much for your suggestion. We merged Chapters 1 and 2 and placed the Dataset section at the beginning of Methods. In addition, we optimized some titles and reduced the number of figures by merging some of them. The rearranged sections are: 'Introduction', 'Material and methods', 'Results', and 'Conclusion'.

- Can you describe more about the LUNA16 dataset? In particular, what makes this dataset the gold standard for malignant vs benign pulmonary nodule discrimination? This is crucial for the study, as it is what you are using for benchmarking.

[Response]: Thank you so much for your suggestion. We have modified the description of the LUNA16 dataset. LUNA16 stands for LUng Nodule Analysis 2016; the dataset was introduced in 2016 to support the development of CAD systems that can automatically detect lung nodules in CT scans.

Minor:

- Table 3 should be presented as a proper figure with graphs. Benchmarking your method against other existing methods is a crucial part of the paper, and it should be highlighted accordingly.

[Response]: Thank you so much for your suggestion. We plotted Table 3 as a histogram and added Fig 9.

- Can you also perform leave-one-out cross validation in addition to 5 fold cross validation?

[Response]: Thank you so much for your suggestion. The processed data in this paper contain 1004 lung nodules. If leave-one-out cross-validation were used, 1004 models would need to be trained, which would take a very long time for image data. We believe that 5-fold cross-validation is sufficient to validate the results of this paper, while leave-one-out cross-validation would be impractical to carry out.

- Lengthy descriptions of the basic statistical methods you used are unnecessary (e.g., five-fold cross-validation); they detract from the core message of your research and further contribute to the paper's unpleasant readability.

[Response]: Thank you so much for your suggestion. We removed the descriptions of some common methods in ‘Evaluation metrics’ and ‘5-Fold cross-validation’.

Reviewer #4: The author presents a good manuscript. However, I feel the article can be improved by the use of a professional English language editor; for example, the last sentence of lines 47 and 48 is not grammatically correct.

[Response]: Thank you so much for your suggestion. We have corrected the grammar in the places you pointed out and checked the grammar throughout the paper.

Secondly, would the addition of sensitivity, specificity, positive predictive value and negative predictive value provide better results for comparison of the two methods than as currently presented? Would a Kappa test be useful?

[Response]: Thank you so much for your suggestion. Sensitivity, specificity, positive predictive value and negative predictive value are generally not used together with ACC, PRE and REC, because the two groups of metrics convey similar information. We have added the kappa index to Table 2; the results show good consistency, which also reflects the proposed method's strong recognition of positive samples.
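For reference, Cohen's kappa can be computed with scikit-learn as below; the labels here are dummy values, not the paper's results:

    from sklearn.metrics import cohen_kappa_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # dummy ground truth
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]   # dummy predictions
    kappa = cohen_kappa_score(y_true, y_pred)
    print(f"Cohen's kappa: {kappa:.3f}")  # ~0.6+ is often read as substantial agreement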

Decision Letter 3

Mohammad Amin Fraiwan

21 Jan 2025

Cross-ViT based benign and malignant classification of pulmonary nodules

PONE-D-24-34082R3

Dear Dr. Zhu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Mohammad Amin Fraiwan

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Mohammad Amin Fraiwan

25 Jan 2025

PONE-D-24-34082R3

PLOS ONE

Dear Dr. Zhu,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Mohammad Amin Fraiwan

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to the comments.docx

    pone.0318670.s001.docx (27.1KB, docx)

    Data Availability Statement

    The data supporting the findings in this paper are available from the LIDC-IDRI database (https://www.cancerimagingarchive.net/collection/lidc-idri).

