Abstract
The accurate categorization of lung nodules in CT scans is essential for the prompt detection and diagnosis of lung cancer. Categorizing the grade and texture of nodules is particularly significant, since it can help radiologists and clinicians make better-informed decisions about nodule management. However, existing nodule classification techniques perform only a single classification task and rely on extensive amounts of high-quality annotated data, which does not meet the requirements of clinical practice. To address this issue, we developed an anthropomorphic diagnosis system for pulmonary nodules (PN) based on deep learning (DL) that is trained on weakly annotated data yet performs comparably to diagnosis systems built on full annotations. The proposed system uses DL models to classify PNs (benign vs. malignant) from weak annotations, eliminating the need for time-consuming and labor-intensive manual annotation of PNs. Moreover, the PN classification networks, augmented with handcrafted shape features acquired through the ball-scale transform, can differentiate PNs with diverse labels, including pure ground-glass opacities, part-solid nodules, and solid nodules. Through 5-fold cross-validation on two datasets, the system achieved the following results: (1) an Area Under the Curve (AUC) of 0.938 for PN localization and an AUC of 0.912 for PN differential diagnosis on the LIDC-IDRI dataset of 793 testing cases, and (2) an AUC of 0.943 for PN localization and an AUC of 0.815 for PN differential diagnosis on the in-house dataset of 822 testing cases. In summary, our system provides efficient localization and differential diagnosis of PNs in a resource-limited environment and could thus be translated into clinical use in the future.
Keywords: Pulmonary Nodules, Detection, Classification, Weak Annotation
1. Introduction
Pulmonary nodules (PN) are opacities with a diameter of less than 30 mm in the lungs (Setio et al., 2017). In the United States, the incidence of solitary pulmonary nodules (SPN) is ~1.5 million per year (Gould et al., 2015; Tan et al., 2003). PN can be classified by their radiologic morphology into three subtypes: pure ground-glass opacity (GGO), part-solid nodules (PSN), and solid nodules (SN) (Chen et al., 2021). PN appear in ~50% of lung computed tomography (CT) scans and can manifest in both benign and malignant diseases (Khan et al., 2019), and the incidence of malignant PN (MPN) is ~40%. Notably, in the best-case scenario (<20 mm, GGO), the 5-year overall survival is ~95% (Saji et al., 2022); otherwise, the overall survival rate drops to ~45% when MPN progresses to a typical stage-I invasive lung cancer (Shaw et al., 2011), which indicates that a definitive and timely diagnosis is crucial for curing MPN (Chen et al., 2021). However, differentiation between MPN and benign PN (BPN) remains challenging. While some MPN exhibit characteristic radiological features, such as spiculated margins and inhomogeneous density, certain BPN closely resemble MPN. Therefore, the misdiagnosis rate for MPN remains alarmingly high, ranging from ~30% to ~70% depending on the nodule's diameter and subtype (Del Ciello et al., 2017). To improve diagnostic accuracy, surveillance imaging at intervals of 3, 6, or 12 months is recommended until a definitive diagnosis is reached. However, prolonged intervals and extended follow-up can delay surgery and increase the risk of metastasis. Although the Brock University, Mayo Clinic, and ADELAIDE models can assess the malignancy risk of PN (Al-Ameri et al., 2015; Gould et al., 2013; Swensen et al., 1997), their accuracy is only fair: the Brock University model focuses on clinical data, while the other two models suffer from inconsistencies in radiologists' interpretation of imaging features due to variations in expertise. Moreover, manual diagnosis is prone to cognitive variability, making it less efficient and reliable. Although tissue biopsy can aid imaging-based diagnosis, its use remains controversial due to a lack of consensus on optimal size thresholds and sampling techniques. Limitations such as small nodule size, multiplicity, localization challenges, and sampling biases further restrict the clinical utility of tissue biopsy. Given these limitations, there is a strong demand for a computer-aided diagnostic system that can assist radiologists in the clinical diagnosis of MPN. Such a system should automatically detect potential PNs in CT scans, extract salient features, and subsequently provide accurate diagnoses for MPN (Gu et al., 2021). However, existing systems rarely encompass all three modules. The challenge arises from two aspects: the model and the data. In the absence of an advanced model or a substantial amount of training data, computer-aided systems may struggle to provide accurate diagnoses.
Depending on the feature extraction technique used, PN classification systems can be categorized as using either traditional models or deep learning (DL) models. Various traditional models have been developed based on image processing methods, handcrafted features (Arai et al., 2012; El-Baz et al., 2013; Xu et al., 2013), and feature-based classifiers (Chen et al., 2010; Gurcan et al., 2002; Suzuki et al., 2005), which require manual delineation of nodules in CT images as the initial input (Way et al., 2006; Zhang et al., 2013). These PN classification systems then extract shape and texture features from the region of interest (ROI) and classify the nodule label using classifiers. Notably, manual labeling cannot efficiently annotate massive amounts of data for training purposes. While methods have been developed to address this limitation (Lee et al., 2010; Madero Orozco et al., 2015), these systems still face the challenge of limited robustness when processing background noise and variations in PN sizes. In brief, traditional models suffer from a lack of automation and poor computing efficiency, resulting in suboptimal accuracy that fails to meet the demands of clinical practice (Mehta et al., 2021; Nibali et al., 2017).
Recently, DL, known for its powerful feature learning capability, has been used to enhance automatic PN classification (Gu et al., 2021; Mastouri et al., 2020). By leveraging an end-to-end learning strategy, DL-based models can automatically extract discriminative features that are highly correlated with the PN classification task. Moreover, DL-based models exhibit high computational efficiency thanks to the efficient forward propagation algorithm and the parallel computing framework inherent to DL. These properties allow faster and more efficient processing of PN classification tasks, enabling real-time analysis of large volumes of imaging data. Among DL models, Convolutional Neural Networks (CNN) and Generative Adversarial Networks are widely used (Kang et al., 2017; Nishio et al., 2018; Onishi et al., 2019; Tyagi and Talbar, 2022). The more recent Vision Transformer (ViT) has also gained popularity and is increasingly utilized in PN diagnosis (Dosovitskiy et al., 2020; Khademi et al., 2023; Mkindu et al., 2023; Wang et al., 2022). These models typically involve three stages: (1) a nodule detection model for 3D CT scans using object detection CNNs and precise nodule labels, (2) a false-positive reduction model to enhance the accuracy of nodule detection, and (3) a nodule classification model for classifying the nodule grade and subtypes (El-Regaily et al., 2020; Ferreira et al., 2018; Ozdemir et al., 2019). This three-stage architecture has shown remarkable improvements in PN detection and classification accuracy. However, the impressive performance of these DL-based models still relies on the availability of large amounts of high-quality, fully annotated data (e.g., Fig. 1a and 1b). Such annotations, which closely align with the nodule shape, necessitate labor-intensive manual labeling on each 2D slice of the nodule. In contrast, weakly annotated data (e.g., Fig. 1c and 1d), which are easier to produce and more user-friendly for radiologists, can be annotated on only a few 2D slices of the nodule. However, weakly annotated data have rarely been used to train DL-based PN detection models due to the absence of accurate location and size information for PN. Moreover, existing DL-based systems primarily focus on a single classification task and lack the ability to hierarchically classify nodule grades and subtypes. Therefore, these DL-based systems encounter limitations in their clinical translation.
Fig. 1.

Examples of nodule annotation methods. (a) Contour-based annotation. (b) Box-based annotation. (c) Circle-based annotation. (d) Point-based annotation.
To address these limitations, we developed a CNN-based PN classification system that leverages weak annotations for automated differential diagnosis of PN using axial CT scans. Our dataset consists of CT scans from the public LIDC-IDRI dataset, comprising 1,018 patients, and another cohort of 2,740 patients with PNs; it is used for training, validation, and testing of the proposed system (Fig. 2). Each nodule was annotated with a location label drawn on only a single slice located at the center of the nodule, a grade label (benign or malignant), and a morphological subtype label (GGO, PSN, or SN) (Fig. 2a). CT images were processed using thresholding and morphological segmentation to detect ROIs and standardize the size of the 3D lung tissue (Fig. 2b). A nodule detection network, i.e., the Nodule Center-point based U-Net (NC-UNet), was developed by combining the corner-point concept with the U-Net to accurately localize nodules within the lung ROI (Fig. 2c) (Ronneberger et al., 2015). Subsequently, to classify the grade and subtype of the nodule in the ROI, two multi-scale deep CNN models with identical architectures, i.e., the Ball-scale transform based Multi-scale CNN (BM-CNN), were trained. Notably, the Ball-scale (B-scale) transform, or filter, was used to extract significant shape features from the nodule ROI (Bagci et al., 2010). These hand-crafted features were then integrated with DL features to enhance the accuracy of nodule classification (Fig. 2d). Overall, our system demonstrated ~97% accuracy in nodule localization and over 80% accuracy in the differential diagnosis of PN. These promising results suggest that our system has the potential for future clinical implementation.
Fig. 2.

The pipeline of the proposed PN classification system. (a) Weak data annotation: the nodule is represented by a red circle. (b) Pre-processing method for the CT scans. (c) Nodule detection network for automatically locating the PN in the CT scans. (d) Nodule classification networks for classifying the grade and subtype labels of the nodule.
The initial report of this work was published in the proceedings of the 11th European Workshop on Visual Information Processing. The current paper presents significant enhancements over the conference version, including: (i) a comprehensive literature review focused on nodule classification in CT scans; (ii) a full exposition of our approach, encompassing weak annotation of nodules, pre- and post-processing methods applied to CT images, insights into nodule detection results, and a detailed description of the design of both the nodule detection and classification networks; and (iii) a complete summary of our experimental findings, featuring illustrative examples of nodule detection, quantitative evaluation metrics, results from ablation studies, and a careful comparison with other approaches documented in the literature.
The remaining sections are arranged as follows. In Section 2, we review the relevant literature on weakly supervised DL approaches in medical image processing, as well as feature fusion models that combine DL and hand-crafted features to improve performance. In Section 3, we present a comprehensive overview of our approach, encompassing data acquisition, the PN detection model, and the PN classification network design. In Section 4, we showcase our results, including quantitative metrics for nodule detection and classification performance, as well as a comparative analysis between our approach and related studies. Finally, in Section 5, we delve into an in-depth examination of the experimental data, discuss both the strengths and limitations of our approach, and summarize our conclusions.
2. Related Works
2.1. Weakly supervised deep learning
Weakly supervised DL methods have gained popularity for training machine learning models under resource-limited conditions. These methods allow models to be trained with noisy or incomplete annotations obtained from various weak sources, reducing the need for costly and time-consuming fully annotated training data (Shen et al., 2023; Zhang et al., 2021). Weakly supervised learning has been successfully applied in various domains, including medical imaging and microscopy, where pixel-wise and bounding-box annotations are burdensome and require expert resources (Kandemir and Hamprecht, 2015; Rony et al., 2023). These approaches have shown promising results, achieving recognition accuracy competitive with or surpassing state-of-the-art models trained under full supervision. For example, Anoshina et al. (Anoshina and Sorokin, 2022) proposed a weakly supervised approach for cell segmentation that uses tracking annotations and image registration to extend the segmentation training data. Guo et al. (Guo et al., 2022) presented a DL system that uses weak annotations from diagnosis reports to accurately detect multiple head disorders from CT scans. Zhang et al. (Zhang et al., 2023) constructed a contrast-based variational model using point annotations to generate segmentation results, which serve as reliable complementary supervision to train a deep segmentation model for histopathology images. Xie et al. (Zhang et al., 2023) utilized bounding-box annotations to develop end-to-end medical image segmentation models, outperforming some state-of-the-art supervised models. These approaches have the potential to be practical alternatives to fully supervised methods, as they can achieve accurate and generalizable performance without large amounts of fully annotated training data.
2.2. Fusion of deep learning features and handcrafted features
The fusion of DL features and handcrafted features has been explored in various domains to improve the performance of DL models. For example, in breast cancer diagnosis, a novel CAD system combines DL features extracted by a CNN with handcrafted features such as the Histogram of Oriented Gradients (HOG) and the Uniform Local Binary Pattern (ULBP), achieving improved performance in mammography and ultrasound studies (Cruz-Ramos et al., 2023). In the automated diagnosis of chest diseases, a fusion model of handcrafted features and deep CNNs achieves high accuracy in classifying different chest diseases from chest X-rays (Malik et al., 2023). In another study (Mendes and Krohling, 2022), the authors investigated the combination of CNN features, handcrafted features, and patient clinical information for skin cancer diagnosis, efficiently improving classification accuracy. Similarly, a hybrid feature extraction approach was proposed for an efficient COVID-19 classification system using two types of features, i.e., DL and handcrafted features, which performed better in terms of accuracy and turnaround time (Habib et al., 2022). Notably, global (high-level) features are more easily acquired from human experts (natural intelligence) and tend to have better generalization performance, which can be very challenging for DL as it predominantly focuses on local patches (Bento et al., 2022; Udupa et al., 2022). In brief, combining local and global image information can enhance the representation and performance of image analysis tasks (Kabbai et al., 2019). Motivated by the above studies, we integrated B-scale features into the DL models to improve PN grade and subtype classification accuracy.
3. Method
3.1. Dataset and Weak Annotation
In this study, we utilized two lung CT datasets to evaluate the performance of our method:
(1). LIDC-IDRI dataset:
Clinical chest CT scans with lung nodules from 1,018 patients were annotated by experienced radiologists, including the location of lung nodules, a nodule malignancy score ranging from 1 to 5, a nodule texture score ranging from 1 to 5, and nodule sizes ranging from 3 mm to 30 mm. The dataset contains 2,632 nodule cases, and each nodule is represented by a contour and a bounding-box (Table 1). To generate the circle and point annotations for nodules, we used the coordinates of the center point of the bounding-box in the middle slice of the nodule as the point annotation, and drew a circle around that center point with a diameter equal to the length of the diagonal of the bounding-box. To simulate relaxed annotation behavior, we added a random value ranging from 0 to 5 mm to the diameter of the circle, as sketched below. Following previous research (Riquelme and Akhloufi, 2020; Shen et al., 2019; Xie et al., 2018), we computed the average malignancy score (AMS) of each nodule and labeled a nodule with AMS < 3 as benign, a nodule with AMS > 3 as malignant, and a nodule with AMS = 3 as uncertain. Note that nodule cases of the uncertain class were removed from the dataset when training the nodule grade classification models. Similarly, we calculated the average texture score (ATS) of each nodule and labeled a nodule with ATS < 3 as GGO, a nodule with ATS > 3 as SN, and a nodule with ATS = 3 as PSN.
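The following Python sketch illustrates this label-generation procedure; the function names and the uniform sampling of the 0-5 mm jitter are our illustrative choices rather than the released code.

```python
import math
import random

def weak_annotations_from_bbox(x_min, y_min, x_max, y_max):
    """Derive point and circle weak annotations from a 2D bounding box on
    the middle slice of a nodule (coordinates in a common unit, e.g. mm)."""
    # Point annotation: center of the bounding box.
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    # Circle annotation: diameter equals the box diagonal, plus a random
    # 0-5 mm offset to simulate relaxed manual drawing.
    diameter = math.hypot(x_max - x_min, y_max - y_min) + random.uniform(0.0, 5.0)
    return (cx, cy), diameter

def grade_label(avg_malignancy_score):
    """Map the average malignancy score (AMS, 1-5) to a grade label."""
    if avg_malignancy_score < 3:
        return "benign"
    if avg_malignancy_score > 3:
        return "malignant"
    return "uncertain"  # removed before training the grade classifier

def subtype_label(avg_texture_score):
    """Map the average texture score (ATS, 1-5) to a morphological subtype."""
    if avg_texture_score < 3:
        return "GGO"
    if avg_texture_score > 3:
        return "SN"
    return "PSN"
```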
Table 1.
Summary of the public LIDC-IDRI dataset
| Set | Benign GGO | Benign PSN | Benign SN | Malignant GGO | Malignant PSN | Malignant SN | Uncertain GGO | Uncertain PSN | Uncertain SN | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Training set | 62 | 744 | 0 | 44 | 0 | 348 | 68 | 311 | 0 | 1577 |
| Validation set | 10 | 124 | 0 | 7 | 0 | 58 | 11 | 52 | 0 | 262 |
| Testing set | 32 | 374 | 0 | 22 | 0 | 174 | 35 | 156 | 0 | 793 |
| Total | 104 | 1242 | 0 | 73 | 0 | 580 | 114 | 519 | 0 | 2632 |
(2). In-house dataset:
Axial CT images (695×695 pixels, 5-46 slices per scan) from 2,740 subjects were collected and weakly annotated (Table 2). We utilized a straightforward, effective, and user-friendly annotation approach in which each nodule in the CT image was marked with a 2D red circle on the middle slice of the nodule, without strictly delineating the exact boundary or bounding-box (a tight rectangle in 2D space) of the nodule. Additionally, we did not impose strict rules governing the annotation behavior of radiologists. This strategy enhances the efficiency of data annotation and facilitates the creation of a more authentic and varied experimental dataset. In addition, we labeled each nodule with a grade class (benign or malignant) and a morphological subtype class (GGO, PSN, or SN).
Table 2.
Summary of the in-house CT dataset
| Set | Benign GGO | Benign PSN | Benign SN | Malignant GGO | Malignant PSN | Malignant SN | Total |
|---|---|---|---|---|---|---|---|
| Training set | 61 | 63 | 240 | 470 | 517 | 293 | 1644 |
| Validation set | 10 | 11 | 40 | 78 | 86 | 49 | 274 |
| Testing set | 30 | 31 | 120 | 235 | 259 | 147 | 822 |
| Total | 101 | 105 | 400 | 783 | 862 | 489 | 2740 |
The nodule cases in the above datasets were divided into training, validation, and testing sets at a ratio of 6:1:3. We applied four data augmentation approaches (annotation position drift, contrast adjustment, image rotation, and image resizing) to the training sets, as sketched below, which resulted in training sets of 25,208 CT images for the LIDC-IDRI dataset and 26,304 CT images for the In-house dataset.
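A minimal sketch of these four augmentations follows, assuming 2D slices with intensities normalized to [0, 1]; the parameter ranges are illustrative, not the values used in the paper.

```python
import numpy as np
from scipy import ndimage

def augment(image, center, rng=np.random.default_rng()):
    """Apply the four augmentations to a 2D slice and its nodule center
    annotation (`center` is an integer array [row, col])."""
    # 1) Annotation position drift: jitter the weak annotation center.
    center = center + rng.integers(-3, 4, size=2)
    # 2) Contrast adjustment: simple linear intensity scaling.
    image = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)
    # 3) Rotation about the image center (a full implementation would also
    #    rotate the annotation; omitted here for brevity).
    image = ndimage.rotate(image, angle=rng.uniform(-10, 10), reshape=False)
    # 4) Resizing by a random zoom factor.
    image = ndimage.zoom(image, zoom=rng.uniform(0.9, 1.1))
    return image, center
```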
3.2. Pre-processing for Lung CT Images
The tissue surrounding the lungs produces abundant unnecessary computation. To address this problem, we designed pre-processing operations that include: i) segmenting the low-intensity regions using simple thresholding for lung tissue, which separates background pixels from lung object pixels, ii) searching the connected regions, excluding the background region, and retaining the remaining lung regions as the lung mask, and finally iii) calculating the rectangular boundaries of the lung masks and extracting the ROIs of the lung images, which were then resized to 448×448×16 via linear interpolation and used as the input of the deep CNN model for nodule localization (Fig. 2b).
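A minimal NumPy/SciPy sketch of this pipeline follows, assuming the volume is in Hounsfield units; the -320 HU threshold and the border-based background removal are common heuristics rather than the paper's exact settings.

```python
import numpy as np
from scipy import ndimage

def extract_lung_roi(ct_volume, threshold=-320):
    """Segment the lung region from a CT volume (Z, Y, X) in HU, crop its
    bounding box, and resize to the fixed network input size."""
    # i) Thresholding: low-intensity voxels (air/lung) vs. body tissue.
    binary = ct_volume < threshold
    # ii) Label connected regions; drop components touching the in-plane
    # borders (outside-body air) and keep the rest as the lung mask.
    labels, _ = ndimage.label(binary)
    border = set(np.unique(labels[:, 0, :])) | set(np.unique(labels[:, -1, :])) \
        | set(np.unique(labels[:, :, 0])) | set(np.unique(labels[:, :, -1]))
    keep = set(np.unique(labels)) - border - {0}
    lung_mask = np.isin(labels, list(keep))
    # iii) Crop the rectangular bounding box of the lung mask ...
    zs, ys, xs = np.nonzero(lung_mask)
    roi = ct_volume[zs.min():zs.max() + 1, ys.min():ys.max() + 1,
                    xs.min():xs.max() + 1]
    # ... and resize to 16 x 448 x 448 via linear interpolation (order=1).
    target = np.array([16.0, 448.0, 448.0])
    return ndimage.zoom(roi, target / np.array(roi.shape), order=1)
```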
3.3. Nodule Detection Network
In this study, we only had a rough location for each nodule from the weak annotations: a single red circle of suitable size drawn on the center 2D slice of the nodule (Fig. 2a), representing the coarse spatial information of the nodule, a choice made to save annotation time in practice. In this data, the red circle annotations were often larger than the nodule, and the center of the circle was often not the accurate center-point of the nodule. Thus, common bounding-box based object detection methods, such as Faster R-CNN and YOLOv3 (Liu et al., 2020; Su et al., 2021), are not directly suitable for this dataset. To address this problem, instead of detecting only the unique center of a nodule, we designed a center-point region based object detection network, for which we used the center of the red circle, together with the horizontal and vertical distance offsets between the top-left and bottom-right points of the circle annotation and the center region, as the ground truth labels. The architecture of the proposed CNN (Fig. 3) is based on U-Net and VGG-19 (Ronneberger et al., 2015; Simonyan and Zisserman, 2015).
Fig. 3.

The structure of the proposed PN detection network.
The proposed network includes four modules: i) the backbone network (a modified VGG-19 (Simonyan and Zisserman, 2015)) for learning multi-scale, multi-level features from the input image, including 16 convolutional layers with a kernel size of 3×3, stride of 2, and ReLU activation, 4 max-pooling layers with a kernel size of 2×2 and stride of 2, 5 batch-normalization layers, and 3 dropout layers with a dropout ratio of 0.6; ii) the feature fusion network for integrating the multi-scale feature maps of sizes 448×448×32, 224×224×32, 112×112×32, 56×56×32, and 28×28×32 into a feature pyramid, including 4 up-sampling layers based on linear interpolation with a scale factor of 2, 8 convolutional layers with a kernel size of 3×3 and stride of 1, 4 convolutional layers with a kernel size of 1×1, stride of 2, and 32 output channels, and 4 concatenation layers based on channel stacking; iii) the pixel-wise classifier for predicting the probability of each pixel belonging to the nodule center-point category, including 1 channel attention module (CAM) (Hu et al., 2018), 2 convolutional layers with a kernel size of 3×3 and stride of 1, 2 dropout layers with a ratio of 0.6, and 1 soft-max function; and iv) the pixel-wise regressor for predicting the distance between each pixel and the top-left and bottom-right points, with the same structure as the pixel-wise classifier but without the soft-max function.
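To make the design concrete, the following TensorFlow/Keras sketch reproduces the overall NC-UNet topology (VGG-style encoder, pyramid fusion decoder, and the two pixel-wise heads); layer counts, channel widths, and head activations are simplified and do not match the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, n=2):
    """n convolutions followed by batch normalization, VGG-style."""
    for _ in range(n):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.BatchNormalization()(x)

def build_nc_unet(input_shape=(448, 448, 1)):
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: five stages producing 448, 224, 112, 56, and 28 px feature maps,
    # each projected to 32 channels for the pyramid.
    feats, x = [], inputs
    for filters in (32, 64, 128, 256, 256):
        x = conv_block(x, filters)
        feats.append(layers.Conv2D(32, 1)(x))
        x = layers.MaxPooling2D(2)(x)
    # Decoder: upsample and concatenate the pyramid back to full resolution.
    y = feats[-1]
    for skip in reversed(feats[:-1]):
        y = layers.UpSampling2D(2, interpolation="bilinear")(y)
        y = layers.Concatenate()([y, skip])
        y = layers.Conv2D(32, 3, padding="same", activation="relu")(y)
    # Head 1: per-pixel probability of being a nodule center point.
    center = layers.Conv2D(1, 3, padding="same", activation="sigmoid",
                           name="center")(y)
    # Head 2: per-pixel offsets to the top-left and bottom-right corners.
    offsets = layers.Conv2D(4, 3, padding="same", name="offsets")(y)
    return tf.keras.Model(inputs, [center, offsets])
```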
To train this model, we defined the loss function by combining the cross-entropy and L2-norm loss functions as shown below:

$$\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i \in \Omega}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right] + \frac{\lambda}{N}\sum_{i \in \Omega}\left\|d_i - \hat{d}_i\right\|_2^2 + \mu\,\|\theta\|_1 \tag{1}$$

where $y_i$ and $d_i$ represent the ground truth of the center-point label (with 1 for being a center point and 0 otherwise) and the distance offset value (in pixels) at pixel $i$, $p_i$ represents the predicted probability of pixel $i$ belonging to the center-point class, $\hat{d}_i$ denotes the predicted distance offset at pixel $i$, $\theta$ and $\Omega$ represent the parameters of the network and the image domain, $N$ is the total number of pixels, $\|\cdot\|_1$ and $\|\cdot\|_2$ are the L1-norm and L2-norm, and $\lambda$ and $\mu$ serve as trade-off parameters among the three terms. In our experiments, we calculated the center coordinate of the manual annotation (red circle) and created a circular region with a radius of 5 pixels around it as the ground truth for center-points. In addition, we computed the Euclidean distances between the coordinates of the top-left and bottom-right points of the manual annotation and each pixel in the image as the ground truth for the distance offset values.
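For concreteness, a minimal TensorFlow sketch of this combined loss follows. The reduction over pixels and the default trade-off values are our assumptions, and `theta` is assumed to be the list of trainable weight tensors.

```python
import tensorflow as tf

def detection_loss(y_cls, p_cls, y_off, p_off, theta, lam=1.0, mu=1e-4):
    """Sketch of Eq. (1): pixel-wise cross-entropy on the center-point map,
    an L2 penalty on the corner-offset regression, and an L1 penalty on the
    network weights. `lam` and `mu` are placeholder trade-off values."""
    n = tf.cast(tf.size(y_cls), tf.float32)
    # Cross-entropy term over all pixels of the center-point probability map.
    ce = tf.reduce_sum(tf.keras.losses.binary_crossentropy(y_cls, p_cls)) / n
    # L2-norm regression term on the distance offsets.
    l2 = tf.reduce_sum(tf.square(y_off - p_off)) / n
    # L1-norm regularization on the network parameters `theta`.
    l1 = tf.add_n([tf.reduce_sum(tf.abs(w)) for w in theta])
    return ce + lam * l2 + mu * l1
```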
3.4. Post-processing for Nodule Detection
The nodule center-point pixel-wise classification result produced by our method can be used to calculate the coarse coordinate of the nodule. However, it is difficult to estimate the size of the nodule from the center-point segmentation result alone. To improve nodule detection performance, we propose a post-processing method that fuses the center-point and distance offset information. Given the nodule center-point pixel-wise classification probability map $P$ and the distance offset map $D$, our model searches for the local-maximum points in $P$ as nodule center-points and computes the coordinates of the top-left and bottom-right points of the nodule ROI by combining the center-point coordinate with the offset values as follows:

$$(x_{tl},\, y_{tl}) = (x_c - s \cdot d_x^{tl},\; y_c - s \cdot d_y^{tl}), \qquad (x_{br},\, y_{br}) = (x_c + s \cdot d_x^{br},\; y_c + s \cdot d_y^{br}) \tag{2}$$

where $s$ represents the distance scale factor with a constant value, $(x_c, y_c)$ is the detected center-point, and $(d_x^{tl}, d_y^{tl})$ and $(d_x^{br}, d_y^{br})$ are the predicted horizontal and vertical offsets to the top-left and bottom-right points taken from $D$. We then employed the coordinates of the top-left and bottom-right points to extract the nodule ROI from the lung ROI images. However, the probability map often presents numerous local-maximum points, which escalates the false positive rate of the nodule detection results. To address this issue, we used a high threshold value of 0.9 to filter out candidate nodules with low probability, thereby improving the overall accuracy and reliability of the nodule detection process.
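A minimal NumPy/SciPy sketch of this post-processing step follows; it assumes `offset_map` stores the four per-pixel corner offsets in its last dimension, and the 5-pixel neighborhood used for local-maximum detection is an illustrative choice.

```python
import numpy as np
from scipy import ndimage

def decode_detections(prob_map, offset_map, scale=1.0, threshold=0.9):
    """Fuse the center-point probability map (H, W) and the offset map
    (H, W, 4) into nodule ROIs, following Eq. (2); `scale` corresponds to
    the constant distance scale factor."""
    # Keep local maxima of the probability map above the 0.9 threshold
    # to suppress low-confidence candidates.
    local_max = prob_map == ndimage.maximum_filter(prob_map, size=5)
    peaks = np.argwhere(local_max & (prob_map > threshold))
    boxes = []
    for y, x in peaks:
        dx1, dy1, dx2, dy2 = offset_map[y, x]  # offsets to the two corners
        boxes.append((x - scale * dx1, y - scale * dy1,   # top-left
                      x + scale * dx2, y + scale * dy2))  # bottom-right
    return boxes
```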
3.5. Nodule Classification Network
We propose a nodule classification network based on the VGG-19 network (Simonyan and Zisserman, 2015) and the B-scale feature map (Fig. 4), with three modules: i) the backbone network for extracting DL features from the multi-scale nodule ROIs (obtained from the nodule detection result) and the B-scale feature map of the ROI, including 18 convolutional layers with a kernel size of 3×3, stride of 2, and ReLU activation, 6 max-pooling layers with a kernel size of 2×2 and stride of 2, 6 batch-normalization layers, 6 dropout layers with a dropout ratio of 0.6, 1 concatenation layer, 2 CAM modules, and 1 flatten layer; ii) the nodule grade classifier using 2 fully connected layers, 1 dropout layer, and 1 soft-max function; and iii) the nodule subtype classifier using 2 fully connected layers, 1 dropout layer, and 1 soft-max function. The nodule classification network was built on the VGG-19 network by adding additional batch-normalization layers and channel attention modules to improve the feature learning ability. We used the fully connected layers and soft-max function to predict the probability of the input nodule ROI belonging to the different nodule grade and subtype labels. To address the class-imbalance problem in the training data, we used the focal loss (Lin et al., 2017) to define the loss function of the nodule classification process as below:
$$\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \alpha_{y_i} \left(1 - p_i\right)^{\gamma} \log p_i \tag{3}$$

where $y_i$ represents the ground truth label of image $i$, $p_i$ denotes the predicted probability of image $i$ for the ground truth label, $\alpha$ and $\gamma$ are the category weight value and the modulating factor, $N$ is the total number of samples, and $\theta$ represents the parameters of the nodule classification network.
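The following sketch shows Eq. (3) in TensorFlow; `alpha` is a per-class weight vector, and the value of the modulating factor is the common default rather than the paper's tuned setting.

```python
import tensorflow as tf

def focal_loss(y_true, p_pred, alpha, gamma=2.0):
    """Focal loss of Eq. (3) (Lin et al., 2017). `y_true` is one-hot
    (N, C), `p_pred` holds soft-max probabilities (N, C), and `alpha` is a
    per-class weight vector (C,) countering class imbalance."""
    # Probability assigned to the ground-truth class of each sample.
    p_t = tf.reduce_sum(y_true * p_pred, axis=-1)
    # Weight of the ground-truth class.
    alpha_t = tf.reduce_sum(y_true * alpha, axis=-1)
    # Down-weight easy examples by (1 - p_t)^gamma.
    return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma)
                           * tf.math.log(p_t + 1e-7))
```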
Fig. 4.

The structure of the proposed PN classification network.
By analyzing the experimental results, we found that the nodule ROI size has an obvious effect on nodule classification accuracy. In general, the predicted classification probability of a nodule varies with the ROI size. To improve the nodule classification performance, we trained 5 nodule classification networks using 5 different ROI sizes (192×192×16, 160×160×16, 128×128×16, 96×96×16, and 64×64×16) and fused the predicted classification probabilities with the soft-voting method as shown below:

$$P(x) = \sum_{k=1}^{K} w_k \, p_k(x) \tag{4}$$

where $p_k(x)$ and $P(x)$ represent the output classification probability of the $k$-th classifier and the multi-scale classifier for image $x$, and $w_k$ denotes the weight value of the $k$-th classifier. The weight value can be calculated as follows:

$$w_k = \frac{A_k}{\sum_{j=1}^{K} A_j} \tag{5}$$

where $A_k$ represents the AUC (Area Under the Curve) value of the $k$-th classifier on the validation set.
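A small NumPy sketch of this AUC-weighted soft voting follows; the example probabilities and validation AUCs are made-up values for illustration.

```python
import numpy as np

def soft_vote(probs, val_aucs):
    """Soft-voting fusion of Eqs. (4)-(5): the multi-scale prediction is a
    weighted sum of the per-scale classifier outputs, with each weight
    proportional to that classifier's AUC on the validation set."""
    weights = np.asarray(val_aucs) / np.sum(val_aucs)        # Eq. (5)
    return np.tensordot(weights, np.asarray(probs), axes=1)  # Eq. (4)

# Example: five classifiers trained on ROI sizes 192, 160, 128, 96, and 64,
# each emitting [p_benign, p_malignant] for one nodule.
per_scale_probs = [np.array([0.8, 0.2]), np.array([0.7, 0.3]),
                   np.array([0.9, 0.1]), np.array([0.6, 0.4]),
                   np.array([0.75, 0.25])]
fused = soft_vote(per_scale_probs, val_aucs=[0.81, 0.80, 0.79, 0.76, 0.74])
```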
3.6. Model training
In this study, we utilized the open-source library TensorFlow to implement the proposed nodule detection and classification models, and used the training data in Tables 1 and 2 to train the models. The hyper-parameters for optimizing the models were as follows: learning rate of 0.00001, training batch size of 50, and 500 iterations. In addition, our proposed approach is flexible and allows the use of different deep learning tools, including U-Net (Ronneberger et al., 2015), Nodulenet (Tang et al., 2019), SANet (Mei et al., 2021), MSANet (Guo et al., 2021), AlexNet (Krizhevsky et al., 2017), VGG-19 (Simonyan and Zisserman, 2015), ResNet-18 (He et al., 2016), and ViT (Dosovitskiy et al., 2020). To compare the performance of PN detection methods using different annotations, we used the contour annotations from the LIDC-IDRI dataset to train U-Net, the bounding-box annotations from the LIDC-IDRI dataset to train Nodulenet, SANet, and MSANet, the circle annotations from both datasets to train Nodulenet, SANet, MSANet, and our model, and the point annotations from both datasets to train our model. When using circle annotations to train Nodulenet, SANet, and MSANet, we calculated the coordinates and size of each circle to generate synthetic bounding-box annotations. In addition, we conducted ablation experiments to demonstrate the effectiveness of the proposed method.
4. Experiments and Results
4.1. Evaluation metrics
We employed our model and the comparison algorithms to detect and classify nodules on the independent testing sets to evaluate the performance of all models. The metrics used to quantitatively analyze nodule detection performance are as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$
where TP, FP, and FN represent the true positive, false positive, and false negative results, respectively. In our experiments, when the center-point of a detected nodule lies within the red circle, it is counted as a true positive; otherwise, it is counted as a false positive. Similarly, when a manually annotated nodule is not detected by the automatic method, it is counted as a false negative.
To quantitatively analyze the nodule classification performance, we computed the classification accuracy (Acc) as below:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

where TN represents the true negative results.
Besides, we varied the classification threshold from 0.05 to 0.95 and computed the true positive rate (TPR) and false positive rate (FPR). In this study, detection performance is illustrated with precision-recall (P-R) curves (Recall on the horizontal axis and Precision on the vertical axis), and classification performance with ROC curves (FPR on the horizontal axis and TPR on the vertical axis). In addition, we computed the area under the curve (AUC) of each curve as an evaluation metric, as sketched below.
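A short sketch of these metrics follows, including the threshold sweep from 0.05 to 0.95 used to trace the ROC curve; the function names are ours.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Detection metrics of Eqs. (6)-(7)."""
    return tp / (tp + fp), tp / (tp + fn)

def roc_points(scores, labels, thresholds=np.arange(0.05, 1.0, 0.05)):
    """Sweep the classification threshold from 0.05 to 0.95 and return the
    (FPR, TPR) pairs tracing the ROC curve; the AUC can then be estimated
    by trapezoidal integration (e.g. np.trapz) over the sorted points."""
    scores = np.asarray(scores)
    labels = np.asarray(labels).astype(bool)
    points = []
    for t in thresholds:
        pred = scores >= t
        tpr = np.sum(pred & labels) / np.sum(labels)    # true positive rate
        fpr = np.sum(pred & ~labels) / np.sum(~labels)  # false positive rate
        points.append((fpr, tpr))
    return points
```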
4.2. PN Detection results
Our NC-UNet model was compared with the Nodulenet (Tang et al., 2019), SANet (Mei et al., 2021), and MSANet (Guo et al., 2021) models to evaluate their respective nodule detection performance on the LIDC-IDRI and In-house datasets, as shown in Table 3. Our model trained with circle annotations obtained high precision values of 0.929 and 0.936, recall values of 0.932 and 0.959, and AUC values of 0.938 and 0.943 on the LIDC-IDRI and In-house datasets, respectively, which are comparable to those of the SANet and MSANet models trained with bounding-box annotations. The low standard deviations of these metrics indicate that the proposed approach has good stability and robustness. The Precision-Recall (P-R) curves showed that our model achieved significantly larger AUC values of 0.932 and 0.964 on the LIDC-IDRI and In-house datasets, respectively, compared with the other three models using circle annotations, indicating that our model has lower false-positive and false-negative rates in PN detection (Fig. 5). In addition, it is difficult for the U-Net model trained with contour annotations to detect nodules with weak edges and small sizes. Although the NC-UNet model trained with point annotations achieved good precision and recall values, the detected points exhibit an obvious distance offset from the ground truth and fail to provide the size information of the nodules, which makes them unsuitable for the nodule classification task.
Table 3.
Nodule detection performance on the LIDC-IDRI (793 testing cases) and In-house (822 testing cases) datasets. Each value is represented as "mean±standard deviation".

| Annotation | Method | Precision (LIDC-IDRI) | Recall (LIDC-IDRI) | PR-AUC (LIDC-IDRI) | Precision (In-house) | Recall (In-house) | PR-AUC (In-house) |
|---|---|---|---|---|---|---|---|
| Contour annotation | U-Net | 0.893±0.058 | 0.907±0.041 | 0.915±0.033 | - | - | - |
| Bounding-box annotation | Nodulenet | 0.945±0.049 | 0.965±0.045 | 0.945±0.037 | - | - | - |
| Bounding-box annotation | SANet | 0.974±0.042 | 0.951±0.053 | 0.937±0.039 | - | - | - |
| Bounding-box annotation | MSANet | 0.956±0.045 | 0.956±0.037 | 0.952±0.036 | - | - | - |
| Circle annotation | Nodulenet | 0.786±0.085 | 0.824±0.083 | 0.833±0.057 | 0.545±0.089 | 0.775±0.065 | 0.687±0.072 |
| Circle annotation | SANet | 0.823±0.090 | 0.887±0.084 | 0.862±0.059 | 0.652±0.095 | 0.864±0.084 | 0.726±0.067 |
| Circle annotation | MSANet | 0.854±0.067 | 0.868±0.058 | 0.875±0.048 | 0.760±0.083 | 0.894±0.068 | 0.804±0.058 |
| Circle annotation | NC-UNet | 0.929±0.058 | 0.932±0.049 | 0.938±0.046 | 0.936±0.046 | 0.959±0.041 | 0.943±0.035 |
| Point annotation | NC-UNet | 0.855±0.057 | 0.914±0.054 | 0.884±0.065 | 0.837±0.064 | 0.853±0.053 | 0.834±0.042 |
Fig. 5.

P-R curves of nodule detection performance for the proposed and comparative models. (a) LIDC-IDRI dataset. (b) In-house dataset.
The visualized nodule detection results on the LIDC-IDRI and In-house datasets show excellent agreement between our model and the manual ground truth (Figs. 6 and 7). Observing the results on the LIDC-IDRI dataset, we found that the location and size information of nodules detected by the supervised models trained with bounding-box annotations is more accurate than that of the models trained with circle annotations, although the difference is small. In particular, the U-Net failed to delineate the contours of nodules with blurred boundaries. In addition, the nodule detection results on the In-house dataset demonstrate that our model outperforms the other supervised models when weak annotations are used for training. Conversely, the Nodulenet, SANet, and MSANet models, which depend on high-quality data, still need to address concerns regarding false positives.
Fig. 6.

Examples of nodule detection on the LIDC-IDRI dataset. (a) Bounding-box annotations. (b) Nodule segmentation results using U-Net. (c)-(e) Nodule detection results using Nodulenet, SANet, and MSANet trained with bounding-box annotations. (f)-(i) Nodule detection results using Nodulenet, SANet, MSANet, and NC-UNet trained with circle annotations. (j) Nodule detection results using NC-UNet trained with point annotations.
Fig. 7.

Examples of nodule detection on the In-house dataset. (a) Circle annotations. (b)-(e) Nodule detection results using Nodulenet, SANet, MSANet, and NC-UNet trained with circle annotations. (f) Nodule detection results using NC-UNet trained with point annotations.
4.3. PN Classification Results
We compared the PN grade classification performance of our BM-CNN model with that of other models, including AlexNet (Krizhevsky et al., 2017), VGG-19 (Simonyan and Zisserman, 2015), ResNet-18 (He et al., 2016), and ViT (Dosovitskiy et al., 2020). The ROC curves for nodule classification (Fig. 8a and 8b) on the LIDC-IDRI and In-house datasets demonstrate that our model achieves better nodule classification performance on both datasets than the other supervised models. To analyze the differences in PN grade classification performance among subgroups, we conducted a comparative experiment in which we used the trained nodule grading model to predict the PN grade within the GGO, PSN, and SN subtypes (Fig. 8c and 8d). We found that the accuracy of PN grade classification on the PSN and SN subtypes of the LIDC-IDRI dataset is better than on the GGO subtype, while the difference on the In-house dataset is small. The main reason is that PSN and SN nodules exist separately in the benign and malignant groups of the LIDC-IDRI dataset (Table 1), leading to a strong association between PSN and benign nodules and between SN and malignant nodules. In contrast, the benign and malignant nodules of the In-house dataset are distributed across all three subtypes without strong specificity. Therefore, differences in data annotation among training datasets have a substantial effect on the performance evaluation of models.
Fig. 8.

Nodule grade classification performance of the proposed system. (a) ROC curve for nodule grade classification on the LIDC-IDRI dataset. (b) ROC curve for nodule grade classification on the In-house dataset. (c) ROC curves for nodule grade classification in the GGO, PSN, and SN subgroups on the LIDC-IDRI dataset. (d) ROC curves for nodule grade classification in the GGO, PSN, and SN subgroups on the In-house dataset.
As shown in Table 4, our model achieved the highest accuracy values of 0.927 and 0.768 and the highest AUC values of 0.912 and 0.815 on the LIDC-IDRI and In-house datasets, indicating a low false positive rate and a high true positive rate in nodule grade classification on both datasets. While other models have reported higher accuracy and AUC values (Al-Shabi et al., 2022; Ferreira et al., 2018; Fu et al., 2022; Harsono et al., 2022; Jiang et al., 2021; Ozdemir et al., 2019; Xie et al., 2018), these reported values need to be interpreted cautiously (Table 5). Factors including dataset size, class imbalance, and the quality and variability of manual annotations all affect the final classification results. Briefly, the previous studies we reviewed used relatively smaller datasets, especially testing datasets, which were significantly smaller than ours (Al-Shabi et al., 2022; Ferreira et al., 2018; Fu et al., 2022; Harsono et al., 2022; Jiang et al., 2021; Xie et al., 2018). In most of these studies, the training datasets were 8-10 times larger than the testing datasets (Al-Shabi et al., 2022; Ferreira et al., 2018; Fu et al., 2022; Jiang et al., 2021; Xie et al., 2018), whereas our training datasets were only about 2 times larger than our testing datasets. Moreover, previous studies had a more balanced MPN:BPN ratio of ~1-2 (Al-Shabi et al., 2022; Fu et al., 2022; Harsono et al., 2022; Jiang et al., 2021; Xie et al., 2018), whereas the ratio in our study was ~4. Additionally, these previous studies generally used accurate location labels and/or nodule contours for training, while we used only weak annotations. Nevertheless, the AUCs of 0.912 and 0.815 achieved by our approach are comparable to those of other methods. The weak annotations in our approach can tolerate larger localization deviations in practical labeling operations, and thus improve the robustness of the system (Gu et al., 2021). Accordingly, in a potential clinical implementation of our model, radiologists would have a significantly reduced burden in labeling CT images.
Table 4.
Nodule grade classification performance. Each value is represented as “mean±standard deviation”.
| Method | LIDC-IDRI dataset | In-house dataset | ||||||
|---|---|---|---|---|---|---|---|---|
| Acc | TPR | FPR | ROC-AUC | Acc | TPR | FPR | ROC-AUC | |
| AlexNet | 0.746±0.064 | 0.694±0.044 | 0.217±0.029 | 0.712±0.063 | 0.635±0.045 | 0.567±0.061 | 0.334±0.056 | 0.672±0.052 |
| VGG-19 | 0.782±0.050 | 0.763±0.032 | 0.124±0.015 | 0.816±0.054 | 0.667±0.050 | 0.612±0.040 | 0.312±0.046 | 0.722±0.077 |
| ResNet-18 | 0.854±0.056 | 0.818±0.039 | 0.113±0.012 | 0.845±0.073 | 0.737±0.072 | 0.635±0.059 | 0.323±0.038 | 0.707±0.046 |
| ViT | 0.819±0.062 | 0.782±0.048 | 0.157±0.018 | 0.798±0.057 | 0.714±0.047 | 0.542±0.035 | 0.267±0.034 | 0.694±0.055 |
| BM-CNN | 0.927±0.039 | 0.898±0.032 | 0.055±0.022 | 0.912±0.037 | 0.768±0.054 | 0.753±0.036 | 0.275±0.029 | 0.815±0.043 |
Table 5.
Comparison with other models
| Approach | Data Type, File Format, and Dataset Name | Number of Training/Testing Cases and Label Imbalance | Nodule Annotation Method | Nodule Classification Performance |
|---|---|---|---|---|
| Ref (Ozdemir et al., 2019) | Chest CT, DICOM, LUNA16 and 2017 Data Science Bowl (Kaggle) datasets | 2101/506 (Unknown) | i) accurate bounding-box and contour; ii) grade label ("Malignant" vs. "Benign") | grade classification: AUC=0.87 |
| Ref (Ferreira et al., 2019) | Chest CT, DICOM, LIDC-IDRI dataset | 2334/334 (GGO:PSN:SN = 1.37:1:11.04) | i) accurate bounding-box; ii) subtype label ("GGO", "PSN", and "SN") | subtype classification: Acc=0.833 |
| Ref (Xie et al., 2018) | Chest CT, DICOM, LIDC-IDRI dataset | 1750/195 (Malignant:Benign = 1:2.02) | i) accurate bounding-box; ii) grade label ("Malignant" vs. "Benign") | grade classification: AUC=0.96, Acc=0.916 |
| Ref (Fu et al., 2022) | Chest CT, DICOM, LIDC-IDRI dataset | 1484/165 (Malignant:Benign = 1:2.04) | i) accurate bounding-box; ii) grade label ("Malignant" vs. "Benign") | grade classification: AUC=0.836, Acc=0.871 |
| | Chest CT, DICOM, LCID dataset | 1622/180 (Malignant:Benign = 1.68:1) | | grade classification: AUC=0.968, Acc=0.932 |
| Ref (Al-Shabi et al., 2022) | Chest CT, DICOM, LIDC-IDRI dataset | 763/85 (Malignant:Benign = 1:1.04) | i) accurate bounding-box; ii) grade label ("Malignant" vs. "Benign") | grade classification: AUC=0.981, Acc=0.953 |
| Ref (Jiang et al., 2021) | Chest CT, DICOM, LIDC-IDRI dataset | 900/100 (Malignant:Benign = 1.22:1) | i) accurate bounding-box; ii) grade label ("Malignant" vs. "Benign") | grade classification: Acc=0.908 |
| Ref (Harsono et al., 2022) | Chest CT, DICOM, LIDC-IDRI dataset | 605/202 (Unknown) | i) accurate bounding-box; ii) grade label ("Malignant" vs. "Benign") | grade classification: AUC=0.818 |
| Our model | Chest CT, DICOM, LIDC-IDRI dataset | 1577/793 (Malignant:Benign = 0.49:1, GGO:PSN:SN = 1:6.05:1.99) | i) coarse location; ii) grade label ("Malignant" vs. "Benign"); iii) subtype label ("GGO" vs. "PSN" vs. "SN") | i) grade classification: AUC=0.912±0.037, Acc=0.927±0.039; ii) subtype classification: AUC_GGO=0.912±0.038, AUC_PSN=0.884±0.046, AUC_SN=0.879±0.052, Acc=0.868±0.032 |
| | Chest CT without tissue density values in Hounsfield units (HU), JPEG, In-house dataset | 1644/822 (Malignant:Benign = 3.56:1, GGO:PSN:SN = 1:1.09:1) | | i) grade classification: AUC=0.815±0.043, Acc=0.768±0.054; ii) subtype classification: AUC_GGO=0.968±0.024, AUC_PSN=0.932±0.035, AUC_SN=0.953±0.037, Acc=0.885±0.037 |
AUC_GGO, AUC_PSN, and AUC_SN: AUC values for GGO, PSN, and SN classification models
Our model can also predict the morphological subtypes of PN, achieving an overall accuracy of 0.893 and 0.872 on the LIDC-IDRI and In-house datasets, respectively. This was the first attempt to achieve hierarchical classification, i.e., nodule grading and subtype classification, in a single model. The confusion matrices for subtype classification demonstrated that our model achieved a high true-positive rate and a low false-positive rate for all morphological subtypes. The Receiver Operating Characteristic (ROC) curves further revealed the high accuracy of our model in predicting the morphological subtypes of PN (Fig. 9a and 9d). It is noteworthy that distinct differences in the ROC curves were observed between grade classification and subtype classification. This distinction could be attributed to the complexity of the classification tasks: subtype classification may rely on morphology-related factors, such as the edge and shape information of the nodule, whereas nodule grade classification is much more complex, with tissue density, local textures, and other factors influencing the accuracy. Next, a comparative experiment was performed to assess the PN subtype classification performance among different subgroups. We used the model to classify the subtypes within the BPN subgroup (Fig. 9b and 9e) and the MPN subgroup (Fig. 9c and 9f). Our findings revealed that the classification for GGO outperformed that for the other two subtypes, indicating that GGO is more easily identified than the others within each grade. The subtype classification for MPN also outperformed that for BPN.
Fig. 9.

Nodule subtype classification performance of the proposed system. (a) ROC curve for nodule subtype classification on the LIDC-IDRI dataset. (b)-(c) ROC curves for nodule subtype classification on the benign and malignant groups of the LIDC-IDRI dataset. (d) ROC curve for nodule subtype classification on the In-house dataset. (e)-(f) ROC curves for nodule subtype classification on the benign and malignant groups of the In-house dataset.
Additionally, we analyzed PN classification performance across nodules with different diameters. Each annotated nodule was measured for its diameter, and we divided the nodules into three subgroups: small nodules (<10 mm), medium nodules (10-20 mm), and large nodules (20-30 mm). We then used our model to classify the grade (Table 6 and Fig. 10) and morphological subtype (Table 7 and Fig. 11) of nodules in the testing datasets. Our model demonstrated remarkable grade classification accuracies of 0.879 and 0.712 for small nodules on the LIDC-IDRI and In-house datasets, surpassing the average ~30% accuracy of manual nodule classification (Saji et al., 2022). Moreover, our model achieved a high accuracy ranging from 0.845 to 0.908 for small nodules with GGO, PSN, or SN on the In-house dataset. Furthermore, we observed a consistent improvement in nodule classification accuracy as the nodule diameter increased, likely because larger nodules contain richer contour and texture information than small nodules. This observation suggests that enhancing the clarity of the nodule could further improve classification performance.
Table 6.
Nodule grade classification performance in nodules with different diameters. Each value is represented as “mean±standard deviation”.
| Nodule Diameter | Acc (LIDC-IDRI) | TPR (LIDC-IDRI) | FPR (LIDC-IDRI) | AUC (LIDC-IDRI) | Acc (In-house) | TPR (In-house) | FPR (In-house) | AUC (In-house) |
|---|---|---|---|---|---|---|---|---|
| <10 mm | 0.879±0.042 | 0.885±0.038 | 0.292±0.024 | 0.864±0.023 | 0.712±0.045 | 0.695±0.041 | 0.313±0.038 | 0.778±0.039 |
| 10-20 mm | 0.908±0.039 | 0.905±0.032 | 0.242±0.025 | 0.895±0.037 | 0.763±0.054 | 0.753±0.036 | 0.275±0.029 | 0.807±0.043 |
| 20-30 mm | 0.923±0.045 | 0.935±0.036 | 0.257±0.026 | 0.933±0.030 | 0.824±0.042 | 0.825±0.030 | 0.352±0.025 | 0.820±0.034 |
Fig. 10.

Nodule grade classification performance of proposed system in PN with different diameters. (a) ROC curve on LIDC-IDRI dataset. (b) ROC curve on In-house dataset.
Table 7.
Nodule subtype classification performance in nodules with different diameters. Each value is represented as “mean±standard deviation”.
| Nodule Diameter | Subtype | Acc (LIDC-IDRI) | TPR (LIDC-IDRI) | FPR (LIDC-IDRI) | AUC (LIDC-IDRI) | Acc (In-house) | TPR (In-house) | FPR (In-house) | AUC (In-house) |
|---|---|---|---|---|---|---|---|---|---|
| <10 mm | GGO | 0.842±0.036 | 0.889±0.038 | 0.306±0.024 | 0.882±0.035 | 0.845±0.032 | 0.968±0.015 | 0.085±0.013 | 0.978±0.025 |
| <10 mm | PSN | 0.835±0.047 | 0.871±0.054 | 0.326±0.036 | 0.863±0.042 | 0.864±0.037 | 0.862±0.024 | 0.167±0.024 | 0.917±0.034 |
| <10 mm | SN | 0.907±0.025 | 0.939±0.047 | 0.352±0.016 | 0.854±0.061 | 0.908±0.048 | 0.887±0.027 | 0.128±0.029 | 0.956±0.042 |
| 10-20 mm | GGO | 0.874±0.045 | 0.879±0.062 | 0.350±0.014 | 0.879±0.072 | 0.905±0.023 | 0.932±0.038 | 0.096±0.016 | 0.968±0.038 |
| 10-20 mm | PSN | 0.866±0.052 | 0.929±0.027 | 0.286±0.017 | 0.857±0.038 | 0.851±0.042 | 0.882±0.034 | 0.163±0.032 | 0.945±0.034 |
| 10-20 mm | SN | 0.903±0.018 | 0.924±0.023 | 0.305±0.021 | 0.886±0.049 | 0.912±0.052 | 0.945±0.042 | 0.045±0.012 | 0.972±0.028 |
| 20-30 mm | GGO | 0.912±0.014 | 0.916±0.017 | 0.143±0.009 | 0.957±0.031 | 0.972±0.027 | 0.987±0.038 | 0.042±0.010 | 0.985±0.016 |
| 20-30 mm | PSN | 0.921±0.024 | 0.929±0.019 | 0.286±0.015 | 0.928±0.042 | 0.863±0.037 | 0.872±0.024 | 0.182±0.024 | 0.928±0.034 |
| 20-30 mm | SN | 0.935±0.029 | 0.938±0.022 | 0.346±0.032 | 0.903±0.017 | 0.917±0.019 | 0.923±0.055 | 0.041±0.009 | 0.985±0.012 |
Fig. 11.

Nodule subtype classification performance of proposed system in PN with different diameters. (a)-(c) ROC curves for small, medium, and large nodules on LIDC-IDRI dataset. (d)-(f) ROC curves for small, medium, and large nodules on In-house dataset.
4.4. Ablation study
The B-scale, a novel image feature, plays a crucial role in establishing a relationship between the segmented training objects and their corresponding images (Bagci et al., 2010). In our approach, we used this feature to incorporate morphology information into the object information by weighting the intensity value of each voxel according to the radius of the largest ball of homogeneous intensity surrounding it; a naive sketch of this computation is given below. The incorporation of the B-scale feature in our PN classification network served as prior knowledge, allowing us to enhance classification accuracy by considering the divergence in nodule size, shape, morphologic characteristics, and texture among different grades. We examined the distribution of B-scale values across different subgroups and conducted statistical analyses to determine their significance (Fig. 12a). We observed a significant divergence in the distribution of B-scale values among the different grades and morphologic subtypes. This finding suggests that the B-scale value of PN in CT images can serve as a discriminative feature for both PN grade and morphological classification. We further investigated the impact of the B-scale feature and the soft-voting method on PN grade and morphological subtype classification (Fig. 12b and Table 8). We found that both improve the grade and morphological subtype classification performance. Specifically, the soft-voting method incorporates the nodule ROI size, which was found to be a critical factor influencing the accuracy of PN classification. To determine the optimal ROI sizes, we trained multiple PN classification networks using various ROI sizes ranging from 32 to 224 and used the soft-voting method to combine their classification results. The experimental data provided strong evidence that the multi-scale nodule classification method yielded the highest AUC value of 0.815 when the ROI sizes were set to [64, 96, 128, 160, 192]. However, the AUC value may decrease as more nodule classification results of different scales are fused, because larger nodule ROI sizes may contain redundant information that does not contribute significantly to the classification task. Therefore, finding the optimal balance between capturing relevant features and avoiding information redundancy is crucial for achieving the highest classification performance.
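For illustration, the following is a deliberately naive 2D sketch of the B-scale idea under a simplified homogeneity criterion; the actual transform of Bagci et al. (2010) uses a more refined criterion and an efficient implementation.

```python
import numpy as np

def ball_scale_2d(image, tolerance=0.05, max_radius=20):
    """Naive B-scale sketch: for each pixel, find the radius of the largest
    disc around it whose intensities stay within `tolerance` of the center
    intensity, then weight the image by that radius map. The parameter
    values are illustrative."""
    h, w = image.shape
    radii = np.zeros((h, w), dtype=float)
    ys, xs = np.mgrid[:h, :w]
    for y in range(h):
        for x in range(w):
            for r in range(1, max_radius + 1):
                disc = (ys - y) ** 2 + (xs - x) ** 2 <= r ** 2
                # Stop growing once the disc is no longer homogeneous.
                if np.any(np.abs(image[disc] - image[y, x]) > tolerance):
                    break
                radii[y, x] = r
    # Weight each pixel's intensity by its local scale.
    return image * (radii / max_radius)
```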
Fig. 12.

Ablation study on B-scale feature and soft-voting for nodule classification. (a) Boxplots for B-scale value distribution in different sub-groups. (b) ROC curve of ablation experiments for nodule grade classification.
Table 8.
Ablation study for nodule grade and subtype classification method on In-house dataset. Each value is represented as “mean±standard deviation”.
| Experiments | B-scale feature | Soft-voting of multi-scale classifiers | Grade Acc | Grade AUC | GGO Acc | GGO AUC | PSN Acc | PSN AUC | SN Acc | SN AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| E1 | ✕ | ✕ | 0.708±0.056 | 0.712±0.042 | 0.852±0.052 | 0.907±0.047 | 0.794±0.062 | 0.884±0.058 | 0.877±0.031 | 0.913±0.038 |
| E2 | √ | ✕ | 0.739±0.062 | 0.735±0.036 | 0.875±0.038 | 0.924±0.023 | 0.828±0.032 | 0.893±0.046 | 0.896±0.027 | 0.937±0.029 |
| E3 | √ | ROI size = [64,96,128] | 0.747±0.051 | 0.768±0.039 | 0.892±0.049 | 0.951±0.034 | 0.847±0.043 | 0.906±0.037 | 0.912±0.026 | 0.942±0.024 |
| E4 | √ | ROI size = [64,96,128,160,192] | 0.768±0.054 | 0.815±0.043 | 0.904±0.035 | 0.972±0.047 | 0.855±0.052 | 0.929±0.039 | 0.926±0.027 | 0.968±0.019 |
| E5 | √ | ROI size = [32,64,96,128,160,192,224] | 0.746±0.065 | 0.786±0.048 | 0.909±0.042 | 0.978±0.051 | 0.858±0.062 | 0.934±0.058 | 0.921±0.035 | 0.959±0.026 |
4.5. Tradeoff analysis of model performance and annotation time
As shown in Table 9, we compared four annotation methods in terms of annotation time, nodule detection accuracy, and nodule classification accuracy. We used the online annotation tool ImgLab to annotate 10 CT images (containing 45 nodules) and timed the annotation procedure. Note that the annotation time indicates the total operation time using the annotation tool, excluding observation time. In particular, to train the nodule classification models using contour annotations, we transformed the nodule contour detection results into nodule ROIs by computing the width and height of the nodule contours. Similarly, we used the detected nodule points as the nodule centers and drew squares with a side length of 10 mm as the nodule ROIs. We found that the PN detection and classification models using circle annotations achieve accuracy and AUC values close to those of the models using full annotations, while reducing annotation time by around 30%-80%. In addition, the PN detection and classification models using point annotations have the worst performance because the models cannot learn the nodule size information from point annotations. Therefore, the circle annotation method offers the best tradeoff between data annotation time and PN diagnosis performance.
Table 9.
Comparison of the annotation time, PN detection, and classification performance using different annotation methods.
| Annotation method | Contour | Bounding-box | Circle | Point |
|---|---|---|---|---|
| Annotation time/slice | 14.34±4.91 secs | 8.94±3.34 secs | 4.98±1.55 secs | 2.56±1.03 secs |
| Annotation time/nodule | 95.61±44.37 secs | 59.6±10.67 secs | 19.87±6.3 secs | 12.58±4.87 secs |
| Nodule detection performance | U-Net: Precision = 0.893±0.058, Recall = 0.907±0.041, AUC = 0.915±0.033 | MSANet: Precision = 0.956±0.045, Recall = 0.976±0.037, AUC = 0.952±0.036 | NC-UNet: Precision = 0.929±0.058, Recall = 0.932±0.049, AUC = 0.938±0.046 | NC-UNet: Precision = 0.855±0.057, Recall = 0.914±0.054, AUC = 0.884±0.065 |
| Nodule grade classification performance | Acc = 0.942±0.018, AUC = 0.936±0.023 | Acc = 0.936±0.026, AUC = 0.925±0.023 | Acc = 0.927±0.039, AUC = 0.912±0.037 | Acc = 0.862±0.074, AUC = 0.856±0.065 |
| Nodule subtype classification performance | Acc = 0.887±0.032, AUC_GGO = 0.926±0.024, AUC_PSN = 0.892±0.047, AUC_SN = 0.905±0.054 | Acc = 0.894±0.032, AUC_GGO = 0.932±0.041, AUC_PSN = 0.876±0.037, AUC_SN = 0.893±0.043 | Acc = 0.868±0.032, AUC_GGO = 0.912±0.038, AUC_PSN = 0.884±0.046, AUC_SN = 0.879±0.052 | Acc = 0.815±0.032, AUC_GGO = 0.829±0.033, AUC_PSN = 0.784±0.058, AUC_SN = 0.812±0.055 |
4.6. Comparison of model complexity
As shown in Table 10, to compare the complexity of the different DL models, we computed the number of parameters (Params), floating-point operations (FLOPs), and multiply-accumulate operations (MACs) of the networks using an open-source tool. We find no obvious relationship between model performance and model complexity. Compared with the other DL models, our networks rank at the middle level in terms of Params, FLOPs, and MACs, while achieving significant advantages in nodule detection and classification. For example, the Params of VGG-19 is twice that of the proposed model, but its nodule classification AUC is 0.096 lower than that of the proposed model.
Table 10.
Comparison of model parameters, FLOPs, and MACs.
| Application | Model | Input Resolution | Params (M) | FLOPs (G) | MACs (G) | AUC on LIDC-IDRI dataset |
|---|---|---|---|---|---|---|
| Nodule Detection | U-Net | 448×448 | 28.09 | 185.15 | 92.48 | 0.915±0.033 |
| Nodule Detection | Nodulenet | 448×448 | 9.17 | 213.39 | 106.47 | 0.833±0.057 |
| Nodule Detection | SANet | 448×448 | 29.59 | 304.46 | 152.02 | 0.862±0.059 |
| Nodule Detection | MSANet | 448×448 | 40.84 | 105.76 | 52.74 | 0.875±0.048 |
| Nodule Detection | Proposed | 448×448 | 15.42 | 186.25 | 98.43 | 0.938±0.046 |
| Nodule Classification | AlexNet | 128×128 | 61.10 | 0.50 | 0.25 | 0.712±0.063 |
| Nodule Classification | VGG-19 | 128×128 | 143.67 | 12.99 | 6.49 | 0.816±0.054 |
| Nodule Classification | ResNet-18 | 128×128 | 11.69 | 1.12 | 0.58 | 0.845±0.073 |
| Nodule Classification | ViT | 128×128 | 19.06 | 11.35 | 5.67 | 0.798±0.057 |
| Nodule Classification | Proposed | 128×128 | 70.74 | 12.65 | 6.32 | 0.912±0.037 |
4.7. Comparison with traditional machine learning methods
In previous works, traditional machine learning (ML) models with hand-crafted features were widely utilized to classify nodule labels, including support vector machines (SVM) (Farag et al., 2017), Random Forest (Lee et al., 2010), AdaBoost (Muzammil et al., 2021), K-Nearest Neighbors (KNN) (Saikia et al., 2022), and the multilayer perceptron (MLP) (Pereira et al., 2006). To compare the PN grade classification performance of our BM-CNN model with these ML models, we used the proposed network to detect nodule ROIs of size 128×128 on the LIDC-IDRI dataset and extracted multi-block local binary pattern (LBP) texture features from the nodule ROIs for training and testing the ML-based nodule grade classifiers, as shown in Table 11. Our method significantly outperforms the ML-based approaches in terms of AUC, TPR, FPR, and Acc. In addition, we computed p-values comparing the predicted probabilities of our method with those of the ML-based approaches. All p-values are below 0.01, demonstrating that the disparity between the proposed model and the ML-based models is statistically significant.
Table 11.
Comparison of nodule grade classification with traditional ML methods. Each value is represented as “mean±standard deviation”.
| Method | Image feature | Acc | TPR | FPR | AUC | P-value |
|---|---|---|---|---|---|---|
| SVM | Multi-block local binary pattern | 0.702±0.062 | 0.731±0.046 | 0.225±0.034 | 0.659±0.037 | < 0.01 |
| Random Forest | | 0.713±0.075 | 0.714±0.033 | 0.164±0.092 | 0.735±0.031 | < 0.01 |
| AdaBoost | | 0.748±0.081 | 0.814±0.064 | 0.157±0.069 | 0.802±0.035 | < 0.01 |
| KNN | | 0.756±0.045 | 0.763±0.041 | 0.174±0.042 | 0.694±0.079 | < 0.01 |
| MLP | | 0.816±0.057 | 0.807±0.048 | 0.189±0.039 | 0.787±0.036 | < 0.01 |
| Proposed | DL feature | 0.927±0.039 | 0.898±0.032 | 0.055±0.022 | 0.912±0.037 | — |
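For illustration, a minimal sketch of such an ML baseline follows; the block grid, LBP parameters, and SVM settings are assumptions rather than the exact configuration behind Table 11.

```python
# Hedged sketch of the ML baseline: multi-block LBP features + SVM.
# Block layout, LBP parameters, and SVC settings are assumptions.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def multiblock_lbp_features(roi, grid=4, P=8, R=1):
    """Split a 128x128 ROI into grid x grid blocks and concatenate
    uniform-LBP histograms from each block."""
    lbp = local_binary_pattern(roi, P, R, method="uniform")
    n_bins = P + 2                       # uniform LBP yields P+2 distinct codes
    bh, bw = roi.shape[0] // grid, roi.shape[1] // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = lbp[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)

# X_rois: (N, 128, 128) detected nodule ROIs; y: benign/malignant labels
# X = np.stack([multiblock_lbp_features(r) for r in X_rois])
# clf = SVC(kernel="rbf", probability=True).fit(X, y)
```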
5. Discussion
5.1. Data quality of annotation
The majority of present DL approaches rely on extensive, high-quality training datasets (Armato III et al., 2011; Setio et al., 2017), which significantly amplifies the labeling burden. Medical images pose additional annotation challenges due to complex anatomical structures, low signal-to-noise ratio, and low contrast, among other factors. While using poorly annotated data alleviates these issues, it may produce biased or erroneous models. Therefore, striking a balance between model performance and labeling burden is crucial for an effective nodule classification system. Our weakly supervised PN detection network accurately locates PNs from coarse annotations, giving our model higher efficiency and fault tolerance than models that require large, strictly labeled datasets. This approach minimizes the time and cost of annotation and enables the construction of an expansive dataset. The insights derived from this work can also contribute to the development of an efficient system for accurate diagnosis of PN.
5.2. Class-imbalance and image quality in PN classification
We collected and annotated CT images from 2,740 patients with PN and used 822 cases from this dataset to test our model. These dataset sizes are significantly larger than those used in other studies, especially the testing dataset, making our model more robust than other models. Notably, our model performed better in classifying nodule subtypes than nodule grades for two primary reasons. First, the training dataset exhibited an imbalance between MPN and BPN with a ratio of 1:3.56 (Table 5), whereas the ratio of GGO, PSN, and SN was nearly balanced at 1:1.09:1; class imbalance was thus more challenging for grading nodules in this study. To address it, we used data augmentation to increase the number of BPN and adopted a focal-loss strategy that assigns higher weights to the benign label in the loss function. Second, we trained on data in JPEG format, which captures the appearance features of the nodules but compresses tissue attenuation information. The grayscale of the CT images used in this study was only 8-bit, whereas raw CT images are typically 16-bit; this 256-fold reduction in gray levels can affect the classification of nodule grades. The classification of nodule subtypes, however, relies on shape and texture features and is not impaired by the reduced grayscale depth. Despite these limitations, our model still achieved highly competitive AUC scores for PN grade and subtypes.
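For concreteness, a minimal sketch of the focal-loss weighting (Lin et al., 2017) follows; the alpha and gamma values are assumptions, with the larger weight placed on the benign label as described above.

```python
# Minimal focal-loss sketch (Lin et al., 2017), adapted to up-weight the
# benign label as described above; alpha and gamma values are assumptions.
import torch

def binary_focal_loss(logits, targets, alpha=0.78, gamma=2.0):
    """targets: 1.0 for benign (the up-weighted class), 0.0 for malignant."""
    p = torch.sigmoid(logits)
    pt = p * targets + (1 - p) * (1 - targets)          # prob. of the true class
    at = alpha * targets + (1 - alpha) * (1 - targets)  # per-class weight
    # (1 - pt)^gamma down-weights easy examples; at rebalances the classes
    return (-at * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-8))).mean()
```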
5.3. Handcrafted feature and multi-scale strategy
Our results demonstrate that incorporating a handcrafted feature and a multi-scale strategy significantly enhances nodule classification performance. To shed light on this improvement, we used a statistical method to quantitatively assess the variation in B-scale values across PN grades and subtypes, and found that using the B-scale value as prior knowledge guided the model to pay greater attention to shape differences among nodules. In addition, the multi-scale strategy mitigated the model's sensitivity to the choice of ROI size, thereby increasing the system's robustness to variations in nodule size. These findings may inspire further research into handcrafted features that correlate strongly with nodule classification, as well as new strategies for integrating multi-scale features and classification outcomes to further enhance performance.
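To make the B-scale prior concrete, the following is a minimal 2D sketch of the ball-scale idea (Bagci et al., 2010), in which each pixel receives the radius of the largest disk around it whose intensities remain homogeneous; the tolerance and radius cap are illustrative assumptions, and this brute-force loop ignores the efficient hierarchical formulation of the original work.

```python
# Minimal 2D sketch of the ball-scale (B-scale) transform; tolerance `tol`
# and radius cap `max_r` are illustrative assumptions, not the paper's values.
import numpy as np

def bscale_map(img, max_r=15, tol=40.0):
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.int32)
    yy, xx = np.mgrid[-max_r:max_r + 1, -max_r:max_r + 1]
    dist = np.hypot(yy, xx)                    # distances from the ball center
    for y in range(max_r, h - max_r):
        for x in range(max_r, w - max_r):
            patch = img[y - max_r:y + max_r + 1, x - max_r:x + max_r + 1].astype(float)
            r = 1
            while r < max_r:
                ball = dist <= r
                # grow the ball until it is no longer intensity-homogeneous
                if np.abs(patch[ball] - patch[max_r, max_r]).mean() > tol:
                    break
                r += 1
            out[y, x] = r                      # B-scale: largest homogeneous radius
    return out
```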
5.4. Comparison with the state-of-the-arts
This model encompasses PN detection, PN grade classification, and PN morphologic subtype classification, closely mimicking the manual diagnostic process for PN in clinical settings; other models have not demonstrated comparable end-to-end performance thus far. Consequently, our model is better suited for differential diagnosis of PN. Moreover, our model can process CT images in various formats, including DICOM, JPEG, PNG, screen snapshots, and images captured by devices such as cell phone cameras. This versatility allows our model to be deployed in resource-limited settings where diverse image formats may be encountered, and it enhances the speed and safety of medical image transmission compared to models that rely solely on DICOM files. This capability holds tremendous potential for medical data processing and analysis.
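As an illustration of how such heterogeneous inputs could be funneled into a single pipeline, the sketch below windows DICOM pixel data into 8-bit grayscale and passes already-8-bit formats through unchanged; the libraries and the lung-window settings are assumptions, not our documented implementation.

```python
# Hedged sketch: normalizing DICOM or 8-bit image inputs (JPEG/PNG/snapshots)
# to the 8-bit grayscale arrays the classifier consumes. The lung window
# (WL=-600, WW=1500) and libraries are assumptions, not the paper's spec.
import numpy as np
import pydicom
from PIL import Image

def load_as_8bit(path, wl=-600.0, ww=1500.0):
    if path.lower().endswith(".dcm"):
        ds = pydicom.dcmread(path)
        hu = ds.pixel_array * float(ds.RescaleSlope) + float(ds.RescaleIntercept)
        lo = wl - ww / 2                                  # window the HU values
        img = np.clip((hu - lo) / ww, 0.0, 1.0) * 255.0
        return img.astype(np.uint8)
    return np.asarray(Image.open(path).convert("L"))     # already 8-bit formats
```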
6. Conclusion
We developed a fully automatic system for the diagnosis of PN in CT scans by leveraging a suitable DL model and weak annotations. In this system, a center-point-based multi-object detection approach identifies the location of PNs in CT scans, achieving high precision and recall. A multi-scale PN classification network, incorporating B-scale features and soft voting, then classifies the grade and subtype of each PN. Overall, our system achieves strong performance in the diagnosis of PN. In future work, we will develop a PN segmentation technique that uses weak annotations and a weakly supervised DL model to learn the morphological information of PNs, which can further enhance the accuracy and robustness of the PN classification system. The proposed method is also generalizable and can be extended to other localization problems with limited resources.
Highlights.
An automatic and accurate system for nodule grade and subtype classification in CT
Design of network architecture to detect the nodules using weak annotation
The hand-crafted shape feature is helpful for nodule classification
Combining multi-scale classification results to improve the system’s robustness
Acknowledgement
L.X. and Y.X. contributed equally. This work was supported by the National Cancer Institute (R01CA230339 and R37CA255948), the Natural Science Foundation of Jiangsu Province (BK20210068), and the Mega-Project of the Wuxi Commission of Health (Z202216).
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- Al-Ameri A, Malhotra P, Thygesen H, Plant PK, Vaidyanathan S, Karthik S, Scarsbrook A, Callister ME, 2015. Risk of malignancy in pulmonary nodules: a validation study of four prediction models. Lung Cancer 89, 27–30.
- Al-Shabi M, Shak K, Tan M, 2022. ProCAN: Progressive growing channel attentive non-local network for lung nodule classification. Pattern Recognition 122, 108309.
- Anoshina NA, Sorokin DV, 2022. Weak supervision using cell tracking annotation and image registration improves cell segmentation, 2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE, pp. 1–5.
- Arai K, Herdiyeni Y, Okumura H, 2012. Comparison of 2D and 3D local binary pattern in lung cancer diagnosis. International Journal of Advanced Computer Science and Applications 3.
- Armato III SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, 2011. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics 38, 915–931.
- Bagci U, Udupa JK, Chen X, 2010. Ball-scale based hierarchical multi-object recognition in 3D medical images. Medical Imaging 2010: Image Processing. SPIE, pp. 1267–1278.
- Bento N, Rebelo J, Barandas M, Carreiro AV, Campagner A, Cabitza F, Gamboa H, 2022. Comparing handcrafted features and deep neural representations for domain generalization in human activity recognition. Sensors 22, 7324.
- Chen H, Xu Y, Ma Y, Ma B, 2010. Neural network ensemble-based computer-aided diagnosis for differentiation of lung nodules on CT images: clinical evaluation. Academic Radiology 17, 595–602.
- Chen K, Bai J, Reuben A, Zhao H, Kang G, Zhang C, Qi Q, Xu Y, Hubert S, Chang L, 2021. Multiomics analysis reveals distinct immunogenomic features of lung cancer with ground-glass opacity. American Journal of Respiratory and Critical Care Medicine 204, 1180–1192.
- Cruz-Ramos C, García-Avila O, Almaraz-Damian J-A, Ponomaryov V, Reyes-Reyes R, Sadovnychiy S, 2023. Benign and malignant breast tumor classification in ultrasound and mammography images via fusion of deep learning and handcraft features. Entropy 25, 991.
- Del Ciello A, Franchi P, Contegiacomo A, Cicchetti G, Bonomo L, Larici AR, 2017. Missed lung cancer: when, where, and why? Diagnostic and Interventional Radiology 23, 118.
- Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, 2020. An image is worth 16x16 words: Transformers for image recognition at scale, International Conference on Learning Representations.
- El-Baz A, Beache GM, Gimel'farb G, Suzuki K, Okada K, Elnakib A, Soliman A, Abdollahi B, 2013. Computer-aided diagnosis systems for lung cancer: challenges and methodologies. International Journal of Biomedical Imaging 2013.
- El-Regaily SA, Salem MAM, Aziz MHA, Roushdy MI, 2020. Multi-view convolutional neural network for lung nodule false positive reduction. Expert Systems with Applications 162, 113017.
- Farag AA, Ali A, Elshazly S, Farag AA, 2017. Feature fusion for lung nodule classification. International Journal of Computer Assisted Radiology and Surgery 12, 1809–1818.
- Ferreira CA, Cunha A, Mendonça AM, Campilho A, 2018. Convolutional neural network architectures for texture classification of pulmonary nodules, Iberoamerican Congress on Pattern Recognition. Springer, pp. 783–791.
- Ferreira CA, Cunha A, Mendonça AM, Campilho A, 2019. Convolutional Neural Network Architectures for Texture Classification of Pulmonary Nodules. Springer International Publishing, Cham, pp. 783–791.
- Fu Y, Xue P, Xiao T, Zhang Z, Zhang Y, Dong E, 2022. Semi-supervised adversarial learning for improving the diagnosis of pulmonary nodules. IEEE Journal of Biomedical and Health Informatics.
- Gould MK, Donington J, Lynch WR, Mazzone PJ, Midthun DE, Naidich DP, Wiener RS, 2013. Evaluation of individuals with pulmonary nodules: When is it lung cancer?: Diagnosis and management of lung cancer: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 143, e93S–e120S.
- Gould MK, Tang T, Liu I-LA, Lee J, Zheng C, Danforth KN, Kosco AE, Di Fiore JL, Suh DE, 2015. Recent trends in the identification of incidental pulmonary nodules. American Journal of Respiratory and Critical Care Medicine 192, 1208–1214.
- Gu Y, Chi J, Liu J, Yang L, Zhang B, Yu D, Zhao Y, Lu X, 2021. A survey of computer-aided diagnosis of lung nodules from CT scans using deep learning. Computers in Biology and Medicine 137, 104806.
- Guo Y, He Y, Lyu J, Zhou Z, Yang D, Ma L, Tan H-t, Chen C, Zhang W, Hu J, 2022. Deep learning with weak annotation from diagnosis reports for detection of multiple head disorders: a prospective, multicentre study. The Lancet Digital Health 4, e584–e593.
- Guo Z, Zhao L, Yuan J, Yu H, 2021. MSANet: multiscale aggregation network integrating spatial and channel information for lung nodule detection. IEEE Journal of Biomedical and Health Informatics 26, 2547–2558.
- Gurcan MN, Sahiner B, Petrick N, Chan HP, Kazerooni EA, Cascade PN, Hadjiiski L, 2002. Lung nodule detection on thoracic computed tomography images: Preliminary evaluation of a computer-aided diagnosis system. Medical Physics 29, 2552–2558.
- Habib M, Ramzan M, Khan SA, 2022. A deep learning and handcrafted based computationally intelligent technique for effective COVID-19 detection from X-ray/CT-scan imaging. Journal of Grid Computing 20, 23.
- Harsono IW, Liawatimena S, Cenggoro TW, 2022. Lung nodule detection and classification from Thorax CT-scan using RetinaNet with transfer learning. Journal of King Saud University-Computer and Information Sciences 34, 567–577.
- He K, Zhang X, Ren S, Sun J, 2016. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- Hu J, Shen L, Sun G, 2018. Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141.
- Jiang H, Shen F, Gao F, Han W, 2021. Learning efficient, explainable and discriminative representations for pulmonary nodules classification. Pattern Recognition 113, 107825.
- Kabbai L, Abdellaoui M, Douik A, 2019. Image classification by combining local and global features. The Visual Computer 35, 679–693.
- Kandemir M, Hamprecht FA, 2015. Computer-aided diagnosis from weak supervision: A benchmarking study. Computerized Medical Imaging and Graphics 42, 44–50.
- Kang G, Liu K, Hou B, Zhang N, 2017. 3D multi-view convolutional neural networks for lung nodule classification. PLoS ONE 12, e0188290.
- Khademi S, Heidarian S, Afshar P, Naderkhani F, Oikonomou A, Plataniotis KN, Mohammadi A, 2023. Spatio-temporal hybrid fusion of CAE and Swin transformers for lung cancer malignancy prediction, ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 1–5.
- Khan T, Usman Y, Abdo T, Chaudry F, Keddissi JI, Youness HA, 2019. Diagnosis and management of peripheral lung nodule. Annals of Translational Medicine 7.
- Krizhevsky A, Sutskever I, Hinton GE, 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM 60, 84–90.
- Lee SLA, Kouzani AZ, Hu EJ, 2010. Random forest based lung nodule classification aided by clustering. Computerized Medical Imaging and Graphics 34, 535–542.
- Lin T-Y, Goyal P, Girshick R, He K, Dollár P, 2017. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988.
- Liu C, Hu S-C, Wang C, Lafata K, Yin F-F, 2020. Automatic detection of pulmonary nodules on CT images with YOLOv3: development and evaluation using simulated and patient data. Quantitative Imaging in Medicine and Surgery 10, 1917.
- Madero Orozco H, Vergara Villegas OO, Cruz Sánchez VG, Ochoa Domínguez HdJ, Nandayapa Alfaro MdJ, 2015. Automated system for lung nodules classification based on wavelet feature descriptor and support vector machine. BioMedical Engineering OnLine 14, 1–20.
- Malik H, Anees T, Chaudhry MU, Gono R, Jasiński M, Leonowicz Z, Bernat P, 2023. A novel fusion model of hand-crafted features with deep convolutional neural networks for classification of several chest diseases using X-ray images. IEEE Access.
- Mastouri R, Khlifa N, Neji H, Hantous-Zannad S, 2020. Deep learning-based CAD schemes for the detection and classification of lung nodules from CT images: A survey. Journal of X-ray Science and Technology 28, 591–617.
- Mehta K, Jain A, Mangalagiri J, Menon S, Nguyen P, Chapman DR, 2021. Lung nodule classification using biomarkers, volumetric radiomics, and 3D CNNs. Journal of Digital Imaging, 1–20.
- Mei J, Cheng M-M, Xu G, Wan L-R, Zhang H, 2021. SANet: A slice-aware network for pulmonary nodule detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 4374–4387.
- Mendes CFSdF, Krohling RA, 2022. Deep and handcrafted features from clinical images combined with patient information for skin cancer diagnosis. Chaos, Solitons & Fractals 162, 112445.
- Mkindu H, Wu L, Zhao Y, 2023. Lung nodule detection in chest CT images based on vision transformer network with Bayesian optimization. Biomedical Signal Processing and Control 85, 104866.
- Muzammil M, Ali I, Haq IU, Amir M, Abdullah S, 2021. Pulmonary nodule classification using feature and ensemble learning-based fusion techniques. IEEE Access 9, 113415–113427.
- Nibali A, He Z, Wollersheim D, 2017. Pulmonary nodule classification with deep residual networks. International Journal of Computer Assisted Radiology and Surgery 12, 1799–1808.
- Nishio M, Sugiyama O, Yakami M, Ueno S, Kubo T, Kuroda T, Togashi K, 2018. Computer-aided diagnosis of lung nodule classification between benign nodule, primary lung cancer, and metastatic lung cancer at different image size using deep convolutional neural network with transfer learning. PLoS ONE 13, e0200721.
- Onishi Y, Teramoto A, Tsujimoto M, Tsukamoto T, Saito K, Toyama H, Imaizumi K, Fujita H, 2019. Automated pulmonary nodule classification in computed tomography images using a deep convolutional neural network trained by generative adversarial networks. BioMed Research International 2019.
- Ozdemir O, Russell RL, Berlin AA, 2019. A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans. IEEE Transactions on Medical Imaging 39, 1419–1429.
- Pereira CS, Alexandre LA, Mendonça AM, Campilho A, 2006. A multiclassifier approach for lung nodule classification. Image Analysis and Recognition: Third International Conference, ICIAR 2006, Póvoa de Varzim, Portugal, September 18-20, 2006, Proceedings, Part II. Springer, pp. 612–623.
- Riquelme D, Akhloufi MA, 2020. Deep learning for lung cancer nodules detection and classification in CT scans. AI 1, 28–67.
- Ronneberger O, Fischer P, Brox T, 2015. U-Net: Convolutional networks for biomedical image segmentation, International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 234–241.
- Rony J, Belharbi S, Dolz J, Ben Ayed I, McCaffrey L, Granger E, 2023. Deep weakly-supervised learning methods for classification and localization in histology images: A survey. Machine Learning for Biomedical Imaging 2, 96–150.
- Saikia T, Hansdah M, Singh KK, Bajpai MK, 2022. Classification of lung nodules based on transfer learning with K-nearest neighbor (KNN), 2022 IEEE International Conference on Imaging Systems and Techniques (IST). IEEE, pp. 1–6.
- Saji H, Okada M, Tsuboi M, Nakajima R, Suzuki K, Aokage K, Aoki T, Okami J, Yoshino I, Ito H, 2022. Segmentectomy versus lobectomy in small-sized peripheral non-small-cell lung cancer (JCOG0802/WJOG4607L): a multicentre, open-label, phase 3, randomised, controlled, non-inferiority trial. The Lancet 399, 1607–1617.
- Setio AAA, Traverso A, De Bel T, Berens MS, Van Den Bogaard C, Cerello P, Chen H, Dou Q, Fantacci ME, Geurts B, 2017. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Medical Image Analysis 42, 1–13.
- Shaw AT, Yeap BY, Solomon BJ, Riely GJ, Gainor J, Engelman JA, Shapiro GI, Costa DB, Ou S-HI, Butaney M, 2011. Effect of crizotinib on overall survival in patients with advanced non-small-cell lung cancer harbouring ALK gene rearrangement: a retrospective analysis. The Lancet Oncology 12, 1004–1012.
- Shen S, Han SX, Aberle DR, Bui AA, Hsu W, 2019. An interpretable deep hierarchical semantic convolutional neural network for lung nodule malignancy classification. Expert Systems with Applications 128, 84–95.
- Shen W, Peng Z, Wang X, Wang H, Cen J, Jiang D, Xie L, Yang X, Tian Q, 2023. A survey on label-efficient deep image segmentation: Bridging the gap between weak supervision and dense prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Simonyan K, Zisserman A, 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- Simonyan K, Zisserman A, 2015. Very deep convolutional networks for large-scale image recognition, 3rd International Conference on Learning Representations (ICLR 2015). Computational and Biological Learning Society.
- Su Y, Li D, Chen X, 2021. Lung nodule detection based on faster R-CNN framework. Computer Methods and Programs in Biomedicine 200, 105866.
- Suzuki K, Li F, Sone S, Doi K, 2005. Computer-aided diagnostic scheme for distinction between benign and malignant nodules in thoracic low-dose CT by use of massive training artificial neural network. IEEE Transactions on Medical Imaging 24, 1138–1150.
- Swensen SJ, Silverstein MD, Ilstrup DM, Schleck CD, Edell ES, 1997. The probability of malignancy in solitary pulmonary nodules: application to small radiologically indeterminate nodules. Archives of Internal Medicine 157, 849–855.
- Tan BB, Flaherty KR, Kazerooni EA, Iannettoni MD, 2003. The solitary pulmonary nodule. Chest 123, 89S–96S.
- Tang H, Zhang C, Xie X, 2019. NoduleNet: Decoupled false positive reduction for pulmonary nodule detection and segmentation, Medical Image Computing and Computer Assisted Intervention 2019, Proceedings, Part VI. Springer, pp. 266–274.
- Tyagi S, Talbar SN, 2022. CSE-GAN: A 3D conditional generative adversarial network with concurrent squeeze-and-excitation blocks for lung nodule segmentation. Computers in Biology and Medicine 147, 105781.
- Udupa JK, Liu T, Jin C, Zhao L, Odhner D, Tong Y, Agrawal V, Pednekar G, Nag S, Kotia T, 2022. Combining natural and artificial intelligence for robust automatic anatomy segmentation: Application in neck and thorax auto-contouring. Medical Physics 49, 7118–7149.
- Wang R, Zhang Y, Yang J, 2022. TransPND: A transformer based pulmonary nodule diagnosis method on CT image, Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 2022, Proceedings, Part II. Springer, pp. 348–360.
- Way TW, Hadjiiski LM, Sahiner B, Chan HP, Cascade PN, Kazerooni EA, Bogot N, Zhou C, 2006. Computer-aided diagnosis of pulmonary nodules on CT scans: Segmentation and classification using 3D active contours. Medical Physics 33, 2323–2337.
- Xie Y, Xia Y, Zhang J, Song Y, Feng D, Fulham M, Cai W, 2018. Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE Transactions on Medical Imaging 38, 991–1004.
- Xu T, Cheng I, Long R, Mandal M, 2013. Novel coarse-to-fine dual scale technique for tuberculosis cavity detection in chest radiographs. EURASIP Journal on Image and Video Processing 2013, 1–18.
- Zhang D, Han J, Cheng G, Yang M-H, 2021. Weakly supervised object localization and detection: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 5866–5885.
- Zhang F, Song Y, Cai W, Lee M-Z, Zhou Y, Huang H, Shan S, Fulham MJ, Feng DD, 2013. Lung nodule classification with multilevel patch-based context analysis. IEEE Transactions on Biomedical Engineering 61, 1155–1166.
- Zhang H, Burrows L, Meng Y, Sculthorpe D, Mukherjee A, Coupland SE, Chen K, Zheng Y, 2023. Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15630–15640.
