Predicting Unnecessary Nodule Biopsies from a Small, Unbalanced, and Pathologically Proven Dataset by Transfer Learning

Fangfang Han; Linkai Yan; Junxin Chen; Yueyang Teng; Shuo Chen; Shouliang Qi; Wei Qian; Jie Yang; William Moore; Shu Zhang; Zhengrong Liang


2020 Mar 6;33(3):685–696. doi: 10.1007/s10278-019-00306-z


PMCID: PMC7256141 PMID: 32144499

Abstract

This study explores an automatic diagnosis method for predicting unnecessary nodule biopsies from a small, unbalanced, and pathologically proven database. The method is based on a convolutional neural network (CNN) model. Because the samples are few and unbalanced, the method applies transfer learning with the VGG16 architecture and optimizes the related transfer learning parameters. For comparison, a traditional machine learning method is implemented, which extracts texture features and classifies them with a support vector machine (SVM). The database includes 68 biopsied nodules, of which 16 are pathologically proven benign and the remaining 52 are malignant. To make use of the volumetric data in the CNN model, image slices are selected randomly from each nodule volume until all image slices of each nodule have been used. Leave-one-out and 10-fold cross validations are each applied to train and test on the 68 randomly selected image slices (one image slice per nodule) in each experiment. The final results are the averages over all experimental outcomes. The experiments revealed that the features from both the medical and the natural images share the similarity of focusing on simpler and less-abstract objects, leading to the conclusion that transferring more convolutional layers does not necessarily yield better classification results. Transfer learning from other, larger datasets can supply additional information to small and unbalanced datasets to improve the classification performance. The presented method shows the potential to adapt a CNN architecture to improve the prediction of unnecessary nodule biopsies from a small, unbalanced, and pathologically proven volumetric dataset.

Keywords: Lung cancer screening, Small and unbalanced dataset, Decrease unnecessary biopsy, Transfer learning, Convolutional neural networks

Introduction

According to a multitude of survey data and conclusions published worldwide, lung cancer is the most threatening cancer to human health [1–5], and early detection would significantly increase the survival rates of lung cancer patients. Over the last 30 years, advances in radiology have provided a transformative tool for noninvasive detection of the cancer at an early stage. Valuable information can be derived from radiological volumetric images, so that expert radiologists can detect and even predict the cancer at an early stage. While significant progress has been made, the challenge of detecting the cancer as early as possible remains. This challenge has motivated great research effort to develop sophisticated computer algorithms that extract as much valuable information as possible from the volumetric images. A brief review of past efforts toward detecting early lung cancer is given below.

The detection of early lung cancer includes two steps. In the first step, the task is to detect the pulmonary nodules (PNs), which have been recognized as the precursor of lung cancer. So far, the computer algorithms developed for PN detection have achieved very high sensitivity. However, more than 95% of the detected PNs are benign [6]. Distinguishing the malignant from the benign nodules and characterizing the identified malignant nodules is therefore the task of the second step. In the following presentation, we focus on this second step, also called computer-aided diagnosis (CADx).

In the initial stage of developing computer algorithms to assist experts in differentiating malignant from benign nodules, efforts were devoted to calculating from the images the features described by the experts and then designing adequate classifiers on those features for the differentiation task. A host of representative papers were published during this initial stage. For example, Iwano et al. [7] calculated the shape of PNs for CADx; Saito et al. [8] and El-Baz et al. [9, 10] described both two-dimensional (2D) and three-dimensional (3D) characteristics of the surface morphology of PNs for CADx; Yankelevitz and Kostis et al. [11, 12] classified small nodules according to growth-rate estimation. Although these efforts used the experts' experience as a priori knowledge, the extracted features included only a few special characteristics, limited by what humans can recognize.

Later on, researchers found that radiological images might contain much more useful information than the experts could describe. Therefore, more quantitative image features, beyond the prior knowledge, were described mathematically and calculated by computer, such as the texture features described by co-occurrence matrices or filters [13], histograms of oriented gradients (HOG) [14], local binary patterns (LBP) [15], and other methods [16, 17]. The features and the associated labels were then used to train a classification module, such as a support vector machine (SVM) [18], decision trees [19], or a random forest [20]. This later effort can be regarded as the second stage of developing CADx. However, as multiple features were extracted and the number of features increased significantly, determining the contributions of the individual features to classification and combining them into integrated features became extremely arduous problems. Although these computed features may contain useful information that cannot be described by the experts' prior experience, they are still handcrafted features in the computer vision sense, picked in a more or less manual manner. We cannot guarantee that they are the proper features for the CADx task across different images, datasets, and situations.

Recently, with the development and successful application of deep learning techniques, image features can be analyzed and categorized automatically by the computer for an optimal solution. This can be regarded as the third stage of developing CADx. For early lung cancer diagnosis, one widely cited example of deep learning, the convolutional neural network (CNN) [21], has shown encouraging classification results on PNs compared with other classification methods, such as the deep belief network (DBN) [22], the stacked denoising auto-encoder (SDAE) [23], and hand-tuned features [24, 25]. So far, however, all of these good results have been obtained on publicly available datasets, such as LIDC-IDRI and LUNA [26, 27]. These observations may be understood as follows. Although the public datasets contain a great amount of useful information, few of them have pathological biopsy reports as ground truth for the nodule diagnosis; instead, the expert radiologists' visual assessment of the nodules is treated as the ground truth. Thus, even though CNN classification can achieve high accuracy, this only demonstrates that deep learning for nodule malignancy classification can reach the level of the best doctors' visual diagnosis on images, which is also what the conventional classification methods try to achieve. Unfortunately, the expert doctor's visual diagnosis is not the gold standard. Therefore, experiments on nodules with pathological diagnosis are indispensable.

However, it is truly challenging to obtain enough data with pathological diagnosis (or biopsy reports) to satisfy the need of deep learning networks for a large volume of training data. Confronted with this problem, the transfer learning strategy has emerged as an option to relieve the challenge, with encouraging outcomes. For example, Shin et al. [28] demonstrated that transfer learning from a network pre-trained on ImageNet (via fine-tuning) can be useful for the computer-aided detection (CADe) of lung nodules and the CADx of interstitial lung diseases. Hosny et al. [29, 30] tested transfer learning with AlexNet on small datasets of skin lesions and obtained a significant improvement. To the best of our knowledge, few, if any, reports on the malignancy diagnosis of lung nodules with transfer learning have been published so far. Therefore, in this paper, we aim to explore whether and how deep learning with transfer learning techniques can be useful for the malignancy diagnosis of a small number of pulmonary nodules with pathologically proven reports.

As previously mentioned in [21], the CNN is well suited to image recognition. Driven by different ideas and applications, several architectures have been built on the CNN, such as AlexNet, VGG, GoogleNet, and ResNet, all of which are widely cited [31–34]. We considered carefully which one to select for the preliminary study in this paper. Our own dataset with biopsy reports contains only 68 lung nodules (16 benign and 52 malignant, proved by pathological diagnosis on biopsied nodule specimens) whose diameters range from 9.1 mm to 130.8 mm (mean size of 31.5 mm). Given the 1.25 mm spacing of the 2D CT (computed tomography) images, the smallest nodule may occupy only 8 × 8 pixels or less in the CT images. To study the features of these small nodules, a CNN architecture with smaller convolution kernels and fewer convolution layers might be the best choice. The reasons are as follows: first, smaller receptive fields mean fewer parameters; second, more activation functions following smaller convolution kernels can capture more local features and increase the discriminative ability; finally, the nodule images are too small for the performance of the CNN architecture to be improved by adding more convolution layers. Therefore, we selected the VGG16 architecture for our experiments, with very small 3 × 3 receptive fields and convolution with a stride of 1 pixel.

In order to apply the VGG16 architecture to a small dataset, a transfer learning strategy was adopted for our dataset with pathological biopsy ground truth. More concretely, we used the parameters trained on ImageNet, a public database containing a large number of natural images, as the initial features to initialize the VGG16 architecture. We then used part of our pulmonary nodule dataset (labeled with the ground truth) as the training dataset to fine-tune the VGG16 parameters. To evaluate the efficiency of this transfer learning strategy, we compared the classification results of the presented deep transfer learning method with those of a typical conventional method, which extracts gray-level co-occurrence texture features and classifies them by SVM [16]. Both leave-one-out and 10-fold validation experiments were performed to evaluate the robustness of the classification results, which makes the conclusions more stable and credible.

The main contributions of this paper can be summarized in three points. First, we designed a deep transfer learning method for a small and unbalanced CT image dataset of PNs and obtained higher efficiency than the traditional machine learning method for malignancy classification. Second, we validated that transferring more convolutional layers does not guarantee better classification results for small and unbalanced data. Third, all the CT images of PNs used in this paper were scanned with an automatic-exposure low-radiation-dose protocol, all of the subjects were recruited randomly, and all the PNs have a pathological diagnosis as the ground truth label.

The rest of this paper is organized as follows. Section 2 describes the main steps of deep learning training and transfer learning, and the details of our adaptation of deep learning to the task of classifying PNs on a small dataset with pathological ground truth. The experimental design and algorithm implementation are presented in Section 3. The comparison of the classification results from the presented deep transfer learning method and the traditional machine learning method is reported in Section 4. Finally, the contributions of the current findings and the plan for future research are discussed in Section 5 and concluded in Section 6.

Methods

In this study, we adopt the VGG16 architecture to explore its potential for classifying PN malignancy and compare it with a traditional machine learning method. Given the limited amount of clinical data with pathological ground truth that we can gather, we further adopt the transfer learning strategy to fine-tune the parameters trained on the public ImageNet data and investigate the performance of different layers of the VGG16 architecture. The flowchart of the deep transfer learning method is shown in Fig. 1.

Fig. 1 — The flowchart of deep transfer learning method

VGG16 Architectures Improved by Different Transfer Layers

VGG [32] refers to the very deep convolutional networks proposed by the Visual Geometry Group, after which they are named. The main idea is to improve the performance of the convolutional network by increasing the depth of the architecture while using very small (3 × 3) convolution filters in all layers. A typical VGG16 architecture is constructed from five stacks of convolutional layers, each stack followed by a max-pooling layer. After the last max-pooling layer, three fully connected layers (the first two with 4096 channels each and the third with 1000 channels) and one soft-max layer are attached. According to the different combinations of convolutional layers, the architecture was divided into six ConvNet configurations, named A to E in order (including A-LRN). Because many studies have reported that configuration D (publicly abbreviated as VGG16) shows the best performance on most types of medical data, VGG16 was selected as the basic deep learning architecture in this paper.

Since the purpose of this paper is to explore an efficient deep transfer learning method for classifying a small dataset of PNs, the choice of transfer learning architectures imitates the classical configurations proposed in [32]: A (11 transfer weight layers), B (13 transfer weight layers), and D (16 transfer weight layers) of the original ConvNet configurations, as well as the simplest configuration (8 transfer weight layers), named S hereafter. These four transfer learning architectures of VGG16, named S, A, B, and D, are shown in Fig. 2 and are described in the following sections.
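The relationship between the four configurations and their weight-layer counts can be tabulated in a few lines of Python. This is a minimal sketch: the per-stack counts for A, B, and D follow the original VGG configurations (counting convolutional plus fully connected layers), whereas the layout of S, with one convolutional layer per stack, is an assumption inferred from its stated total of 8 weight layers. The names `CONV_PER_STACK`, `STACK_WIDTHS`, and `weight_layers` are illustrative only.

```python
# Number of convolutional layers in each of the five stacks
# (every stack is followed by a max-pooling layer).
CONV_PER_STACK = {
    "S": [1, 1, 1, 1, 1],  # assumed layout: 5 conv + 3 fully connected = 8
    "A": [1, 1, 2, 2, 2],  # 8 conv + 3 fully connected = 11
    "B": [2, 2, 2, 2, 2],  # 10 conv + 3 fully connected = 13
    "D": [2, 2, 3, 3, 3],  # 13 conv + 3 fully connected = 16 (VGG16)
}
FC_LAYERS = 3
# Channel width of each stack, from 64 in the first to 512 in the last.
STACK_WIDTHS = [64, 128, 256, 512, 512]


def weight_layers(name):
    """Total weight layers (convolutional + fully connected) of a configuration."""
    return sum(CONV_PER_STACK[name]) + FC_LAYERS


for name in "SABD":
    print(name, weight_layers(name))
```

All four configurations share the same five-stack skeleton, so they differ only in how many 3 × 3 convolutions are repeated before each max-pooling layer.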

Fig. 2 — The transfer learning architectures of VGG16 named by S, A, B, and D

Transfer Learning

Transfer learning is an ability of a machine learning system analogous to the way people solve new problems faster, or with better solutions, by using previously learned knowledge [35]. The aim of the transfer learning strategy is to transfer knowledge from previous tasks to a target task when the latter has fewer high-quality training data. For the strategy to be applicable, there must exist some explicit or implicit relationship between the features of the previous and target tasks.

Mathematically, suppose χ is a feature space containing the samples {x₁, x₂, …, x_n} and P(X) is the corresponding marginal probability distribution; a domain D = {χ, P(X)} can then be defined. A task T defined on D consists of two components, a label space L and a predictive function f(·), and is denoted by

T = \{L, f(\cdot)\}.

Given a source domain D_S with its learning task T_S and a target domain D_T (D_T ≠ D_S) with its learning task T_T (T_T ≠ T_S), transfer learning aims to improve the learning of the target predictive function f_T(·) in T_T using the knowledge from D_S and T_S. More details on transfer learning are given by Pan and Yang [35]. Because it is extremely challenging to obtain large volumes of annotated medical data for training deep learning models, transfer learning was proposed as an option to relieve this challenge and to learn more efficient features with CNN architectures. How many layers should be transferred depends on the level of abstraction of the features in the different layers. More theoretical studies and experiments will be needed in future work.

Our Training Protocol with Transfer Learning

The hypothesis of this study is that a CNN trained on the well-labeled natural images from ImageNet could be useful for the recognition of medical images. To test the hypothesis, we train a classification model on a small medical image dataset of PNs with the prior knowledge acquired from the labeled natural images of ImageNet. Given the limitation of our computing resources (Intel Core i7-6700 @ 3.40 GHz quad-core, Dell 0Y7WYT, 8 GB Hynix DDR4 213 MHz, and Nvidia GeForce GTX 1050 2 GB), we could not train on a database as large as ImageNet. To circumvent this limitation, the public parameters of the convolutional layers of the VGG16 network were downloaded from an online resource (http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz). A normalized initialization method [36] was then applied, alongside the downloaded parameters, to set the parameters of the fully connected layers of the VGG16 architecture, so that the output and input have probability distributions that are as similar as possible. For example, Gaussian random initialization with mean 0 and variance 0.01 was used for the parameter setting. We set the data of each dimension on the same scale in order to (1) avoid the trained parameters being biased toward some dimensions and (2) accelerate the convergence of the gradient descent algorithm over the multiple updating steps. In other words, our goal is to prevent gradient explosion or vanishing caused by the activation function losing gradient as signals propagate through the deep neural network. If the gradient is too large or too small, the network would take a very long time or need a lot of computing power to converge. Our software runs in a Python 3.5 environment, and the deep learning framework is based on TensorFlow. The Python libraries used include TensorFlow 1.0, NumPy, math, Matplotlib, pandas, and PIL.
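The Gaussian initialization mentioned above can be illustrated with the standard library alone. This is a minimal sketch, not the original TensorFlow code: the function name `gaussian_init`, the fixed seed, and the small 64 × 32 layer shape are assumptions chosen for illustration. The only values taken from the text are the mean of 0 and the variance of 0.01 (a standard deviation of 0.1).

```python
import math
import random


def gaussian_init(n_in, n_out, variance=0.01, seed=0):
    """Return an n_in x n_out weight matrix drawn from N(0, variance)."""
    rng = random.Random(seed)
    std = math.sqrt(variance)  # variance 0.01 -> standard deviation 0.1
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]


# Small illustrative shape; the real fully connected layers are far larger.
weights = gaussian_init(n_in=64, n_out=32)

# Empirical check that the sample statistics are close to the targets.
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
var = sum((w - mean) ** 2 for w in flat) / len(flat)
print(f"mean={mean:.4f}, variance={var:.4f}")
```

Only the fully connected layers would be initialized this way; the convolutional layers start from the downloaded ImageNet parameters.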

To optimize the training process, we used the well-known gradient descent algorithm for the CNN model. In addition, the dropout method proposed in [37] was adopted to avoid overfitting by randomly omitting some feature detectors. Although dropout has been shown to be more effective on large datasets than on small ones, in our comparison experiments it performed better than the batch normalization method [38] on our small dataset of PNs. The comparison tests of the two methods on the 68 cases (45 cases for training and 23 cases for testing) over ten classification runs are shown in Table 1. In all ten runs, dropout achieved a higher accuracy than batch normalization (average 0.848 versus 0.683), so we conclude that dropout is superior for our data. Therefore, we applied dropout in the first and second fully connected layers to avoid overfitting. At the last stage of the VGG16 architectures in Fig. 2, the softmax function is used as the classifier of the CNN model to divide the nodules into two classes, benign and malignant. Following the concept of transfer learning, part of the pathologically labeled nodule images is used to fine-tune the parameters of the CNN model so that the model becomes better adapted to the classification of PNs. The CNN model optimized by this fine-tuning is then used to test the remaining PNs.
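For readers unfamiliar with dropout, the idea of randomly omitting feature detectors can be sketched as follows. This is an illustrative sketch rather than the implementation used in the study: it uses the common "inverted dropout" variant, which rescales the kept activations during training (the original formulation in [37] instead rescales the weights at test time), and the function name `dropout` and the example activations are hypothetical. The omission probability of 0.5 matches the value reported for the experiments.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability drop_prob
    during training and rescale the survivors by 1 / (1 - drop_prob)."""
    if not training or drop_prob == 0.0:
        return list(activations)  # at test time the layer is the identity
    rng = rng or random.Random()
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical activations of a fully connected layer.
features = [0.2, 1.5, 0.7, 3.1, 0.9, 2.4]
dropped = dropout(features, drop_prob=0.5, rng=random.Random(0))
print(dropped)
```

Because a different random subset of units is silenced at every step, no single feature detector can be relied upon, which is what discourages overfitting on small training sets.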

Table 1.

Classification accuracies of the dropout and batch normalization methods in ten comparison tests on 68 nodule cases (45 for training, 23 for testing)

Test	Dropout	Batch normalization
1	0.782609	0.695652
2	0.826087	0.695652
3	0.869565	0.652174
4	0.869565	0.695652
5	0.869565	0.739130
6	0.826087	0.608696
7	0.913043	0.695652
8	0.826087	0.695652
9	0.869565	0.608696
10	0.826087	0.739130
Average	0.847826	0.682609
Max	0.913043	0.739130
Min	0.782609	0.608696
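Each accuracy in Table 1 is consistent with a whole number of correctly classified cases out of the 23 test cases (for example, 18/23 ≈ 0.782609). The sketch below recomputes the summary rows from those counts; note that the counts themselves are inferred from the reported decimals and are not given in the source, and the names `summarize`, `dropout_correct`, and `batchnorm_correct` are illustrative.

```python
TEST_CASES = 23  # test cases per run, as stated in the text

# Correct classifications per run, inferred from the reported accuracies.
dropout_correct = [18, 19, 20, 20, 20, 19, 21, 19, 20, 19]
batchnorm_correct = [16, 16, 15, 16, 17, 14, 16, 16, 14, 17]


def summarize(correct_counts, n=TEST_CASES):
    """Average, maximum, and minimum accuracy over all runs."""
    acc = [c / n for c in correct_counts]
    return {"average": sum(acc) / len(acc), "max": max(acc), "min": min(acc)}


for name, counts in [("dropout", dropout_correct),
                     ("batch normalization", batchnorm_correct)]:
    stats = summarize(counts)
    print(name, {k: round(v, 6) for k, v in stats.items()})
```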


Although [28] showed that parameters trained on ImageNet data can be transferred to lung cancer studies, the question of which layers' features are effective for PN classification remains open; in other words, it is unclear whether transferring features from different layers affects the efficiency of PN classification. Inspired by the multiple architectures of the VGG network and by our experiments, we transferred the parameters of 8, 11, 13, and 16 layers from the initial ones trained on ImageNet and fine-tuned them with our pathologically labeled PN image samples. In general, a transfer learning strategy starts from the features at the bottom, middle, or top of the network [39]. As described in [35], the transferability of features from each layer of a neural network can be quantified, but performance degradation occurs frequently when transferring the higher layers. So far, no study has shown how the performance changes as the number of repeated convolutional operations varies while the same parameters are retained. Because of these complexities in transfer learning, we follow the principle of reducing the complexity as much as possible within each group of convolutional layers before each max-pooling layer, by only transferring the features among different combinations of the convolutional layers. The width of the convolutional layers starts from 64 in the first group and increases up to 512 in the last group.

Experiments Design and Implementation

As mentioned above, the purpose of this study is to explore a more efficient machine learning method for distinguishing malignant from benign PNs using a limited volume of CT images with pathological reports on the PNs. Thus, the VGG16 architecture with a transfer learning strategy is adopted as a novel machine learning technique for the CADx of PNs. To evaluate the efficiency of the proposed technique, the classification experiments are designed and carried out as follows. The flowchart of the main experiments is shown in Fig. 3.

Fig. 3 — The flowchart of the main experiments

Data Preparations

A total of 68 patients, who were scheduled for PN biopsy for medical reasons, were recruited to this study under informed consent after approval by the Institutional Review Board. The average age of the patients is 69.5 years (range 33 to 91 years); 52% are male and 48% are female. Each patient underwent a routine clinical CT scan for needle preparation. The CT images were obtained with a routine clinical protocol with a tube voltage of 120 kVp and a modulated tube current (mAs level) with automatic low-exposure dose control. The thickness of the original CT slices in our dataset varies from 1.00 to 1.25 mm, depending on the coverage of the axial length. The border of each nodule in the routine patient CT images was drawn by an experienced radiologist. The routine CT image, the drawn nodule border, and the biopsy pathological report of each PN were collected for the CNN-based machine learning classification of PNs. The smallest PN contains 3 slices, the largest one contains 49 slices, and the average is approximately 16 slices. The diameters of the 68 nodules range from 9.1 mm to 130.8 mm (mean size of 31.5 mm). These patients have nodule descriptions in their CT reports, where the nodule size is given. Accordingly, the nodules occupy from 10 × 10 to 145 × 145 pixels (average of 35 × 35 pixels), given the mean distance of 0.9 mm between neighboring pixels in the x/y directions. The average of 35 × 35 nodule image pixels was calculated by summing the pixels of all 68 nodules and dividing by 68 (the total number of nodules). To satisfy the uniform size requirement for the input images of the CNN architecture, an image of 48 × 48 pixels containing the entire nodule area as the central part was extracted from the routine CT scan of each patient. The center of the to-be-extracted rectangular area is the center of the candidate nodule region. Thus, cropping the images in this way ensures that the main area of each nodule is centered in the extracted rectangular area.
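The conversion from nodule diameter to pixel extent quoted above is a simple division by the pixel spacing, as the following sketch shows. The function name `diameter_to_pixels` is illustrative, and rounding to the nearest pixel is an assumption; the diameters and the mean 0.9 mm spacing are the values given in the text.

```python
PIXEL_SPACING_MM = 0.9  # mean in-plane pixel spacing reported in the text


def diameter_to_pixels(diameter_mm, spacing_mm=PIXEL_SPACING_MM):
    """Approximate side length, in pixels, of a nodule of the given diameter."""
    return round(diameter_mm / spacing_mm)


for label, diameter in [("smallest", 9.1), ("mean", 31.5), ("largest", 130.8)]:
    side = diameter_to_pixels(diameter)
    print(f"{label}: {diameter} mm -> about {side} x {side} pixels")
```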

In addition, the CNN architecture adopted in this study (i.e., VGG16) is implemented in TensorFlow. In order to use this architecture, which is designed for color images, together with the parameters transferred from ImageNet, the extracted nodule CT images must be transformed from grayscale to color. After normalization of the whole dataset, the color images were generated by picking colors corresponding to the distribution over the entire range of the gray images. The color-transforming operations for three examples are shown in Fig. 4.

Fig. 4 — The color transforming operations of three examples

To keep the overall consistency of the datasets and follow the rule of natural distribution, we preprocessed the color images by scaling the whole image set to a normal distribution with zero mean and unit standard deviation.
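The scaling to zero mean and unit standard deviation can be sketched as a plain z-score transform. This is a minimal illustration under assumptions: the function name `standardize` and the sample intensities are hypothetical, and in practice the mean and standard deviation would be computed over the whole image set rather than over a handful of values.

```python
import math


def standardize(values):
    """Shift and scale values to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical pixel intensities standing in for the whole image set.
pixels = [12.0, 40.0, 55.0, 200.0, 180.0, 90.0]
z = standardize(pixels)
print([round(v, 3) for v in z])
```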

VGGNet Training and Transfer Learning

As stated above, the VGG16 network was implemented in the TensorFlow framework, and the configurations corresponding to 8, 11, 13, and 16 transfer convolutional layers were tested and compared. All the parameters of the CNN model were initialized with those trained on the public large-scale ImageNet dataset of natural images with 1000 classes; after initialization, the model was trained for 200 epochs with a mini-batch size of 16 image instances. The other hyper-parameters of the CNN model were a learning rate of 0.001 without decay, convergence at 1000 steps, and an omission (dropout) probability of 0.5. The parameters of the CNN model were fine-tuned by training with our small dataset from the 68 PNs on NVidia Quadro P2000 GPUs.
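The reported hyper-parameters can be collected in a small configuration, together with the arithmetic linking epochs, batch size, and update steps. This is only a sketch: the dictionary keys and the function `steps_per_epoch` are illustrative names, and the training-set size of 67 is an assumption corresponding to one leave-one-out split of the 68 single-slice samples. Under that assumption the total happens to equal the 1000 steps quoted in the text, although the source does not state how that figure was obtained.

```python
import math

HYPERPARAMS = {
    "learning_rate": 0.001,  # constant, no decay
    "epochs": 200,
    "batch_size": 16,
    "dropout_prob": 0.5,
}


def steps_per_epoch(n_train, batch_size):
    """Number of mini-batches needed to pass once over the training set."""
    return math.ceil(n_train / batch_size)


n_train = 67  # assumption: leave-one-out on 68 samples leaves 67 for training
steps = steps_per_epoch(n_train, HYPERPARAMS["batch_size"])
total_steps = steps * HYPERPARAMS["epochs"]
print(steps, total_steps)
```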

To examine the performance contribution of the different model layers, we used a visualization method [40]. Figure 5 shows the feature maps of each convolutional layer extracted from two examples, one benign and one malignant nodule. To make the observations clearer, we rescaled each feature map to the same size.

Fig. 5 — Visualization of different convolutional layer (cov-layer) feature maps of the example nodules (one malignant and one benign)

Classification and Evaluation Experiments

Because the VGG16 implemented in this study is based on 2D images and each nodule usually contains a different number of slices, it is necessary to consider how to select the image slices of each nodule for training and testing to avoid information bias. In our dataset, the smallest nodule contains 3 slices, the largest one contains 49 slices, and the average is approximately 16 slices per nodule. Therefore, we adopted the principle that only one slice is selected randomly from each nodule for every training and testing experiment. Thus, no matter how large or small a nodule is, every nodule carries the same weight. To obtain robust results, we performed one hundred random selections to form the training and testing sets. After the slice selection and dataset preparation, leave-one-out and 10-fold cross validation were each used to measure the efficiency of the VGG16 architecture with the transfer learning strategy. The total number of training and testing experiments for the leave-one-out method is therefore 68 × 100 = 6800. Considering the possible effects of the large imbalance between the 16 negative (benign) and 52 positive (malignant) samples in our dataset, we also added classification experiments on a dataset formed by the 16 negative samples and 8 randomly selected positive samples for comparison. For the leave-one-out method, we randomly selected one slice from one nodule for verification each time and repeated this 50 times to obtain the best hyper-parameters. For the 10-fold cross validation experiments, the training, verification, and testing sets were divided in the ratio 6:2:2.
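The slice-selection principle and the leave-one-out bookkeeping described above can be sketched as follows. The per-nodule slice counts are randomly generated placeholders within the reported range of 3 to 49 (the real counts are not listed in the text), the function names are illustrative, and the model training step is omitted; the sketch only confirms that 100 random selections of 68 samples yield 6800 train/test experiments.

```python
import random


def select_one_slice_per_nodule(slice_counts, rng):
    """Pick one random slice index for every nodule, so each nodule
    contributes exactly one image regardless of its size."""
    return [rng.randrange(n) for n in slice_counts]


def leave_one_out(n_samples):
    """Yield (training indices, held-out index) for every sample."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


rng = random.Random(0)
# Hypothetical slice counts between the reported minimum (3) and maximum (49).
slice_counts = [rng.randint(3, 49) for _ in range(68)]

n_experiments = 0
for _ in range(100):  # one hundred random slice selections
    chosen = select_one_slice_per_nodule(slice_counts, rng)
    for train_idx, test_idx in leave_one_out(len(chosen)):
        n_experiments += 1  # a model would be fine-tuned and tested here
print(n_experiments)
```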

Traditional Classification on Texture Features as Baseline for Comparison

For comparison, we implemented a traditional machine learning method as a baseline reference. This traditional method uses an SVM classifier to train and test on statistical texture features of the 68 pathologically proven PNs. The 2D and 3D texture features were calculated from the gray-tone spatial-dependence co-occurrence matrices (GTSDMs) of four, five, and thirteen directions for the center image and the volume data [13, 16]. The features were sent to the SVM classifier for training and testing. Each time, the training and testing sets were formed by randomly splitting all the nodules into two halves, and the training and testing process was repeated with random splits 100 times. This traditional classification method was previously tested on the large public LIDC-IDRI database, and the outcome was excellent, as reported in our prior study [16].
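To make the texture baseline more concrete, the sketch below builds a symmetric gray-tone co-occurrence matrix for one direction of a tiny 2D image and derives a single texture measure (contrast) from it. This is a simplified illustration, not the feature set used in the study: the four-level example image, the `OFFSETS_2D` table, and the function names are assumptions, and the 3D thirteen-direction case and the remaining texture features are not shown.

```python
# (row, column) offsets for the four 2D directions, keyed by angle in degrees.
OFFSETS_2D = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image, offset, levels):
    """Symmetric co-occurrence counts of gray levels for one offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                matrix[i][j] += 1
                matrix[j][i] += 1  # count the pair in both orders
    return matrix


def contrast(matrix):
    """Contrast feature: squared gray-level difference weighted by frequency."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Small example image with four gray levels (0 to 3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm_0 = cooccurrence(image, OFFSETS_2D[0], levels=4)
print(glcm_0, contrast(glcm_0))
```

In a full pipeline, one such matrix would be computed per direction and several statistics per matrix would be concatenated into the feature vector passed to the SVM.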

Results

The three receiver operating characteristic (ROC) curves of the traditional SVM classification on the GTSDM texture features calculated from 4, 5, and 13 directions of the 68 nodules are shown in Fig. 6. The mean and standard deviation of the area under the ROC curve (AUC) over the 100 training and testing runs are shown in Table 2. Each time, half of the dataset was selected randomly as the training set, and the other half was used as the testing set. The average accuracies based on the texture features calculated from the GTSDMs of 4, 5, and 13 directions are all 76.47%.

Fig. 6 — The ROC curves of the three types of texture features from 4, 5, and 13 directions

Table 2.

The AUC information about the performances of the three types of texture features

	4 Directions	5 Directions	13 Directions
Mean	0.5492	0.5462	0.5971
Standard deviation	0.0722	0.0551	0.0369


The outcomes of the experiments evaluating the presented deep transfer learning method on the 68 nodules are shown in Fig. 7 and reported in Table 3, where Fig. 7 shows the average ROC curves of 100 random selections for leave-one-out training and testing and Table 3 lists the mean and standard deviation (SD) of the AUC measures. The average accuracies with 8, 11, 13, and 16 transfer learning layers are 85.32%, 85.76%, 71.13%, and 70.69%, respectively.

Table 3.

The AUC information about the performances of the four transfer layers

	8 Layers	11 Layers	13 Layers	16 Layers
Mean	0.7712	0.7435	0.7375	0.6762
Standard deviation	0.0591	0.0834	0.0776	0.1216


The confusion matrices for the 68 nodules (16 benign and 52 malignant) classified by VGG16 with different transfer convolutional layers are shown in Table 4. The reported numbers of predicted nodules are the sums over the ten classification experiments of the 10-fold cross validation. Each classification experiment uses 7 test samples (5 malignant and 2 benign), giving 70 test predictions in total.

Table 4.

Confusion matrices for 68 nodules (52 malignant and 16 benign) by VGG16 with different transfer layers

Transfer layers	Actual class	Predicted benign	Predicted malignant
8 Layers	Benign	16	4
8 Layers	Malignant	14	36
11 Layers	Benign	16	4
11 Layers	Malignant	9	41
13 Layers	Benign	10	10
13 Layers	Malignant	11	39
16 Layers	Benign	12	8
16 Layers	Malignant	12	38


The classification measures of VGG16 with different transfer convolutional layers on the 68 nodules are shown in Table 5.

Table 5.

The classification measures of VGG16 with different transfer convolutional layers on 68 nodules

	Sensitivity	Specificity	Precision	F1-Score	Accuracy
8 Layers	0.5333	0.9000	0.8000	0.6400	0.7428
11 Layers	0.6400	0.9111	0.8000	0.7111	0.8142
13 Layers	0.4762	0.7959	0.5000	0.4878	0.7000
16 Layers	0.5000	0.8261	0.6000	0.5455	0.7142
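The measures in Table 5 can be recomputed from the counts in Table 4. The sketch below assumes a convention that is inferred from the numbers rather than stated in the text: benign is the positive class, "sensitivity" and "specificity" are normalized by the predicted-benign and predicted-malignant totals, and "precision" is normalized by the actual-benign total. Under that assumption the computed values agree with Table 5 to within rounding in the fourth decimal place.

```python
# Table 4 counts per transfer-layer option, in the order:
# (actual benign -> predicted benign, actual benign -> predicted malignant,
#  actual malignant -> predicted benign, actual malignant -> predicted malignant)
TABLE_4 = {
    "8 Layers": (16, 4, 14, 36),
    "11 Layers": (16, 4, 9, 41),
    "13 Layers": (10, 10, 11, 39),
    "16 Layers": (12, 8, 12, 38),
}


def measures(bb, bm, mb, mm):
    """Return (sensitivity, specificity, precision, F1, accuracy).

    The normalizations follow the convention assumed in the lead-in,
    which is inferred from the published numbers.
    """
    sensitivity = bb / (bb + mb)
    specificity = mm / (mm + bm)
    precision = bb / (bb + bm)
    f1 = 2 * sensitivity * precision / (sensitivity + precision)
    accuracy = (bb + mm) / (bb + bm + mb + mm)
    return sensitivity, specificity, precision, f1, accuracy


for name, counts in TABLE_4.items():
    # Each matrix pools 10 experiments x 7 test samples = 70 predictions.
    assert sum(counts) == 70
    print(name, " ".join(f"{value:.4f}" for value in measures(*counts)))
```

Applying the same function to the counts in Table 6 gives the values in Table 7 under the same assumed convention.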


To further investigate how the presented classification method performs under a different distribution of benign and malignant nodules, a subset of 16 benign and 8 malignant nodules was selected randomly from the 68 nodules (16 benign and 52 malignant) and classified by the presented transfer learning VGG16 architectures. We ran the 10-fold cross validation five times, and over-fitting occurred in two of the runs. After excluding those two runs, we averaged the results of the remaining three. The confusion matrices for the subset of 24 nodules classified by VGG16 with different transfer convolutional layers are shown in Table 6. The counts of predicted nodules are summed over the ten classification experiments of the 10-fold cross validation. Each classification experiment has 3 test samples in total (1 malignant and 2 benign).

Table 6.

Confusion matrices for 24 nodules (8 malignant and 16 benign) by VGG16 with different transfer layers

Transfer layers	Actual	Predicted benign	Predicted malignant
8 Layers	Benign	18	2
8 Layers	Malignant	3	7
11 Layers	Benign	19	1
11 Layers	Malignant	2	8
13 Layers	Benign	18	2
13 Layers	Malignant	4	6
16 Layers	Benign	19	1
16 Layers	Malignant	5	5
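The construction of the 24-nodule subset and its test sets can be sketched as follows. Ten disjoint test sets of 3 samples cannot be carved out of 24 nodules (10 × 3 = 30), so this sketch assumes that each classification experiment draws its own class-stratified test set and trains on the remaining nodules; how the authors actually formed the folds is not stated. The nodule IDs are hypothetical.

```python
import random


def draw_subset(benign_ids, malignant_ids, n_malignant, rng):
    """Keep every benign nodule and sample n_malignant malignant ones."""
    return list(benign_ids), rng.sample(list(malignant_ids), n_malignant)


def draw_experiments(benign, malignant, n_experiments, n_benign_test, n_malignant_test, rng):
    """Per experiment, hold out a class-stratified test set and train on the rest."""
    experiments = []
    for _ in range(n_experiments):
        test = rng.sample(benign, n_benign_test) + rng.sample(malignant, n_malignant_test)
        held_out = set(test)
        train = [i for i in benign + malignant if i not in held_out]
        experiments.append((train, test))
    return experiments


rng = random.Random(0)
benign_ids = range(0, 16)       # hypothetical IDs for the 16 benign nodules
malignant_ids = range(16, 68)   # hypothetical IDs for the 52 malignant nodules

benign, malignant = draw_subset(benign_ids, malignant_ids, 8, rng)
experiments = draw_experiments(benign, malignant, 10, 2, 1, rng)

# Pool the test labels over the ten experiments, as the confusion matrices do.
pooled_benign = sum(1 for _, test in experiments for i in test if i < 16)
pooled_malignant = sum(1 for _, test in experiments for i in test if i >= 16)
print(pooled_benign, pooled_malignant)
```

Pooling ten experiments with 2 benign and 1 malignant test sample each gives 20 benign and 10 malignant test cases, which matches the actual-class totals of every matrix in Table 6.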


The classification measures of VGG16 with different transfer convolutional layers on the subset 24 nodules are shown in Table 7.

Table 7.

The classification measures of VGG16 with different transfer convolutional layers on 24 nodules

	Sensitivity	Specificity	Precision	F1-Score	Accuracy
8 Layers	0.8571	0.7778	0.9000	0.8780	0.8333
11 Layers	0.9048	0.8889	0.9500	0.9268	0.9000
13 Layers	0.8182	0.7500	0.9000	0.8571	0.8000
16 Layers	0.7917	0.8333	0.9500	0.8636	0.8000


The classification performance differs between the subset of 24 nodules (benign/malignant ratio 16:8) and the original 68 nodules (benign/malignant ratio 16:52). The sensitivity and specificity shift in opposite directions as the ratio changes. This observation indicates that both the sample size and the class ratio may affect the prediction outcome, because both contribute to the fitting of parameters in the CNN model. Despite the different performances at different ratios, the presented transfer learning benefits the CNN performance.

Discussion

This study investigated the performance of the adapted CNN model with the transfer learning strategy for differentiating pulmonary nodules (PNs) in a small (68 nodules) and unbalanced (16 benign and 52 malignant) dataset with pathological reports as the ground truth. Because of the small sample size of PNs, we adopted as initial parameters those obtained from successful machine learning on a large natural image dataset, ImageNet, and tested the feasibility of transfer learning from ImageNet to our small PN medical image dataset based on a popular CNN model, the VGG16 architecture. We adopted the best architecture with 16 layers (as reported in reference [31]), initialized the parameters of the presented deep learning architecture with those trained on the ImageNet database, and then refined the parameters by transfer learning on our PN dataset. From the experimental outcomes, we found that the VGG16 architecture with the presented transfer learning strategy is effective for classifying a small, unbalanced, and pathologically proven dataset of benign and malignant nodules from routine medical CT images, in terms of the ROC curves, AUC measures, confusion matrices, and other classification measures such as accuracy, sensitivity, specificity, precision, and F1-score. The experimental results showed improvement over traditional machine learning on the GTSDM texture features.

As mentioned in the introduction, there are multiple designs of the VGG16 architecture, which can be distinguished by the widths of the convolutional layers, such as 64, 128, 256, and 512. Across these designs, the number of repeated convolutional layers before each max-pool layer differs. Each design faces the difficulty of determining the appropriate degree of complexity of the transfer learning features at different levels. To avoid this difficulty, we simplified the transfer learning features by reducing the initial parameters of the repeated convolutional layers. We then compared the accuracy, sensitivity, specificity, F1-score, precision, average ROC curves, and AUC values across the different transfer learning options. The experiments showed that more repeated convolutional layers do not necessarily give better classification performance. The number of transfer convolutional layers that achieves the best performance may depend on the dataset, so finding an adequate number of transfer convolutional layers for a particular application would be a learning process.
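To make the notion of repeated convolutional layers before each max-pool layer concrete, the sketch below writes out three published VGG configurations (A, B, and D in Simonyan and Zisserman) and counts their weight layers. The helper `reduce_repeats` is hypothetical: it caps the number of convolutional layers per block, which is one possible reading of "reducing the repeated convolutional layers". The resulting counts of 8, 11, 13, and 16 match the four options compared in this study, but whether the authors built their options this way is an assumption.

```python
# "M" marks a max-pool layer; integers are the widths of convolutional layers.
VGG_CONFIGS = {
    "A (VGG11)": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B (VGG13)": [64, 64, "M", 128, 128, "M", 256, 256, "M",
                  512, 512, "M", 512, 512, "M"],
    "D (VGG16)": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"],
}
N_FULLY_CONNECTED = 3  # every VGG configuration ends with three fully connected layers


def conv_per_block(config):
    """Number of convolutional layers before each max-pool layer."""
    blocks, count = [], 0
    for item in config:
        if item == "M":
            blocks.append(count)
            count = 0
        else:
            count += 1
    return blocks


def weight_layers(config):
    """Convolutional plus fully connected layers (the number in the VGG name)."""
    return sum(conv_per_block(config)) + N_FULLY_CONNECTED


def reduce_repeats(config, max_repeats):
    """Hypothetical helper: keep at most max_repeats conv layers per block."""
    reduced, count = [], 0
    for item in config:
        if item == "M":
            reduced.append(item)
            count = 0
        elif count < max_repeats:
            reduced.append(item)
            count += 1
    return reduced


for name, config in VGG_CONFIGS.items():
    print(name, conv_per_block(config), weight_layers(config))

one_per_block = reduce_repeats(VGG_CONFIGS["D (VGG16)"], 1)
print("one conv per block:", conv_per_block(one_per_block), weight_layers(one_per_block))
```

Capping VGG16 at one convolutional layer per block leaves 5 convolutional and 3 fully connected layers, and capping it at two per block reproduces configuration B.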

To obtain more robust and reliable evaluation results, we performed a large number of experiments for high statistical power. All CT image slices covering the entire volume of each nodule were included in the random selections, and each image slice of a nodule was selected with equal probability for model training and testing. This selection and combination of training and testing sets was repeated 100 times, giving 100 combinations of training and testing images of the 68 nodules in total. For each combination, we used leave-one-out validation to evaluate the presented VGG16 with transfer learning of the parameters of different convolutional layers for nodule classification. As expected, these experiments took a very long time, and the results proved to be very robust. We also ran 10-fold cross validation experiments on the original 68 nodules (52 malignant and 16 benign) and on a subset of 24 nodules (8 malignant and 16 benign) to compare datasets with different malignancy rates. The classification results, shown as confusion matrices and five classification measures, further support the good performance of the transfer learning VGG16 scheme proposed in this paper. The classification performance differed between the subset of 24 nodules (benign/malignant ratio 16:8) and the original 68 nodules (benign/malignant ratio 16:52), which indicates that both the sample size and the class ratio may affect the prediction outcome, because both contribute to the fitting of parameters in the CNN model. Despite the different performances at different ratios, the presented transfer learning benefits the CNN performance.

Conclusion

In summary, we have performed experiments showing that the CNN model can benefit from initialization with features learnt from natural images for the task of distinguishing benign from malignant PNs in medical CT images, even when the dataset is very small and unbalanced. The benefit was quantified using the VGG16 architecture with initial feature learning from the large public natural image database ImageNet, followed by transfer learning on a small and unbalanced medical image dataset. The classification performance for predicting the unnecessary biopsies of 16 benign nodules in the small medical image dataset of 68 pathologically proven nodules (including 52 malignant) is satisfactory, with noticeable improvement over the traditional machine learning method. In addition, the satisfactory performance at the benign/malignant ratio of 16:52 was also seen at a different benign/malignant ratio of 16:8. We therefore conclude that the CNN model with transfer learning is able to classify small and unbalanced medical image datasets, with the potential to outperform the traditional machine learning method. Furthermore, the experiments revealed that the features of the medical and the natural images share the similarity of focusing on simpler and less-abstract objects, which led to the experimental outcome that more repeated convolutional layers do not necessarily give better classification results. Because of limitations on data and computing resources, the questions of how the presented method would work on larger datasets and how the complexity of the convolutional transfer learning would affect the classification performance remain open and will be topics of our future research.

Funding Information

This work was supported by the National Natural Science Foundation of China under Grant Nos. 81671773, 61672146, and 61802055; the Fundamental Research Funds for the Central Universities of China under Grant Nos. N151903001, N171902001, and N171903003; and the Natural Science Foundation of Liaoning Province of China under Grant No. 20180550182. Zhengrong Liang was supported by NIH Grant #CA206171 of the National Cancer Institute, USA.


Contributor Information

Fangfang Han, Email: hanff1@smu.edu.cn.

Linkai Yan, Email: link980417@gmail.com.

Junxin Chen, Email: chenjx@bmie.neu.edu.cn.

Yueyang Teng, Email: tengyy@bmie.neu.edu.cn.

Shuo Chen, Email: chenshuo@bmie.neu.edu.cn.

Shouliang Qi, Email: qisl@bmie.neu.edu.cn.

Wei Qian, Email: wqian@utep.edu.

Jie Yang, Email: jie.yang@stonybrookmedicine.edu.

William Moore, Email: William.Moore@nyumc.org.

Shu Zhang, Email: Shu.Zhang@stonybrookmedicine.edu.

Zhengrong Liang, Email: jerome.liang@sunysb.edu.

References

1. Siegel Rebecca L., Miller Kimberly D., Jemal Ahmedin. Cancer statistics, 2016. CA: A Cancer Journal for Clinicians. 2016;66(1):7–30. doi: 10.3322/caac.21332.
2. Trama Annalisa, Botta Laura, Foschi Roberto, Ferrari Andrea, Stiller Charles, Desandes Emmanuel, Maule Milena Maria, Merletti Franco, Gatta Gemma. Survival of European adolescents and young adults diagnosed with cancer in 2000–07: population-based data from EUROCARE-5. The Lancet Oncology. 2016;17(7):896–906. doi: 10.1016/S1470-2045(16)00162-5.
3. Allemani Claudia, Matsuda Tomohiro, Di Carlo Veronica, Harewood Rhea, Matz Melissa, Nikšić Maja, et al. Global surveillance of trends in cancer survival 2000–14 (CONCORD-3): analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries. The Lancet. 2018;391(10125):1023–1075. doi: 10.1016/S0140-6736(17)33326-3.
4. American Cancer Society: Cancer facts & figures 2019. Atlanta: American Cancer Society, 2019. Available at https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2019.html
5. Siegel Rebecca L., Miller Kimberly D., Jemal Ahmedin. Cancer statistics, 2019. CA: A Cancer Journal for Clinicians. 2019;69(1):7–34. doi: 10.3322/caac.21551.
6. Gary Clayman. Thyroid nodules: Hyperthyroidism and thyroid Cancer. Endocrineweb, November 27th, 2018. Available at https://www.endocrineweb.com/conditions/thyroid/thyroid-nodules
7. Iwano Shingo, Nakamura Tatsuya, Kamioka Yuko, Ikeda Mitsuru, Ishigaki Takeo. Computer-aided differentiation of malignant from benign solitary pulmonary nodules imaged by high-resolution CT. Computerized Medical Imaging and Graphics. 2008;32(5):416–422. doi: 10.1016/j.compmedimag.2008.04.001.
8. Saito Hajime, Minamiya Yoshihiro, Kawai Hideki, Nakagawa Taku, Ito Manabu, Hosono Yukiko, Motoyama Satoru, Hashimoto Manabu, Ishiyama Koichi, Ogawa Jun-ichi. Usefulness of circumference difference for estimating the likelihood of malignancy in small solitary pulmonary nodules on CT. Lung Cancer. 2007;58(3):348–354. doi: 10.1016/j.lungcan.2007.06.018.
9. El-Baz Ayman, Nitzken Matthew, Khalifa Fahmi, Elnakib Ahmed, Gimel’farb Georgy, Falk Robert, El-Ghar Mohammed Abo. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011. 3D Shape Analysis for Early Diagnosis of Malignant Lung Nodules; pp. 772–783.
10. El-Baz A, Nitzken M, Vanbogaert E, Gimel’farb G, Falk R, El-Ghar MA: A novel shape-based diagnostic approach for early diagnosis of lung nodules. 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Chicago, IL, USA, 30 March-2 April, 2011. doi: 10.1109/ISBI.2011.5872373
11. Yankelevitz David F., Reeves Anthony P., Kostis William J., Zhao Binsheng, Henschke Claudia I. Small Pulmonary Nodules: Volumetrically Determined Growth Rates Based on CT Evaluation. Radiology. 2000;217(1):251–256. doi: 10.1148/radiology.217.1.r00oc33251.
12. Kostis W.J., Reeves A.P., Yankelevitz D.F., Henschke C.I. Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images. IEEE Transactions on Medical Imaging. 2003;22(10):1259–1274. doi: 10.1109/TMI.2003.817785.
13. Prasanna P, Tiwari P, Madabhushi A: Co-occurrence of local anisotropic gradient orientations (CoLlAGe): A new radiomics descriptor. Scientific Reports 6:37241, 2016. doi: 10.1038/srep37241
14. Dalal N, Triggs B: Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20-25 June, 2005. doi: 10.1109/CVPR.2005.177
15. Ojala T., Pietikainen M., Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(7):971–987. doi: 10.1109/TPAMI.2002.1017623.
16. Han Fangfang, Wang Huafeng, Zhang Guopeng, Han Hao, Song Bowen, Li Lihong, Moore William, Lu Hongbing, Zhao Hong, Liang Zhengrong. Texture Feature Analysis for Computer-Aided Diagnosis on Pulmonary Nodules. Journal of Digital Imaging. 2014;28(1):99–115. doi: 10.1007/s10278-014-9718-8.
17. Wang Huafeng, Zhao Tingting, Li Lihong Connie, Pan Haixia, Liu Wanquan, Gao Haoqi, Han Fangfang, Wang Yuehai, Qi Yifan, Liang Zhengrong. A hybrid CNN feature model for pulmonary nodule malignancy risk differentiation. Journal of X-Ray Science and Technology. 2018;26(2):171–187. doi: 10.3233/XST-17302.
18. Keshani Mohsen, Azimifar Zohreh, Tajeripour Farshad, Boostani Reza. Lung nodule segmentation and recognition using SVM classifier and active contour modeling: A complete intelligent system. Computers in Biology and Medicine. 2013;43(4):287–300. doi: 10.1016/j.compbiomed.2012.12.004.
19. Tartar A, Kilic N, Akan A: A new method for pulmonary nodule detection using decision trees. 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, 3–7 July 2013. doi: 10.1109/EMBC.2013.6611257
20. Lee S.L.A., Kouzani A.Z., Hu E.J. Random forest based lung nodule classification aided by clustering. Computerized Medical Imaging and Graphics. 2010;34(7):535–542. doi: 10.1016/j.compmedimag.2010.03.006.
21. Yann L: LeNet-5, convolutional neural networks. NY, USA. [Online], 2013. Available at https://yann.lecun.com/exdb/lenet/
22. Zhang T, Zhao J, Luo J, Qiang Y: Deep belief network for lung nodules diagnosed in CT imaging. International Journal of Performability Engineering 13(8):1358–1370, 2017. doi: 10.23940/ijpe.17.08.p17.13581370
23. Cheng J, Ni D, Chou Y, Qin J, Tiu C, Chang Y, Huang C, Shen D, Chen C: Computer-aided diagnosis with deep learning architecture: Applications to breast lesions in US images and pulmonary nodules in CT scans. Scientific Reports 6:24454, 2016. doi: 10.1038/srep24454
24. Song QingZeng, Zhao Lei, Luo XingKe, Dou XueChen. Using Deep Learning for Classification of Lung Nodules on Computed Tomography Images. Journal of Healthcare Engineering. 2017;2017:1–7. doi: 10.1155/2017/8314740.
25. Sun Wenqing, Zheng Bin, Qian Wei. Automatic feature learning using multichannel ROI based on deep structured algorithms for computerized lung cancer diagnosis. Computers in Biology and Medicine. 2017;89:530–539. doi: 10.1016/j.compbiomed.2017.04.006.
26. Armato Samuel G., McLennan Geoffrey, Bidaut Luc, McNitt-Gray Michael F., Meyer Charles R., Reeves Anthony P., Zhao Binsheng, Aberle Denise R., Henschke Claudia I., Hoffman Eric A., Kazerooni Ella A., MacMahon Heber, van Beek Edwin J. R., Yankelevitz David, Biancardi Alberto M., Bland Peyton H., Brown Matthew S., Engelmann Roger M., Laderach Gary E., Max Daniel, Pais Richard C., Qing David P.-Y., Roberts Rachael Y., Smith Amanda R., Starkey Adam, Batra Poonam, Caligiuri Philip, Farooqi Ali, Gladish Gregory W., Jude C. Matilda, Munden Reginald F., Petkovska Iva, Quint Leslie E., Schwartz Lawrence H., Sundaram Baskaran, Dodd Lori E., Fenimore Charles, Gur David, Petrick Nicholas, Freymann John, Kirby Justin, Hughes Brian, Vande Casteele Alessi, Gupte Sangeeta, Sallam Maha, Heath Michael D., Kuhn Michael H., Dharaiya Ekta, Burns Richard, Fryd David S., Salganicoff Marcos, Anand Vikram, Shreter Uri, Vastagh Stephen, Croft Barbara Y., Clarke Laurence P. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Medical Physics. 2011;38(2):915–931. doi: 10.1118/1.3528204.
27. Jacobs C, Setio AAA, Traverso A, Ginneken BV: Lung nodule analysis 2016. [Online], 2016. Available at https://luna16.grand-challenge.org/home/
28. Shin Hoo-Chang, Roth Holger R., Gao Mingchen, Lu Le, Xu Ziyue, Nogues Isabella, Yao Jianhua, Mollura Daniel, Summers Ronald M. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Transactions on Medical Imaging. 2016;35(5):1285–1298. doi: 10.1109/TMI.2016.2528162.
29. Hosny KM, Kassem MA, Foaud MM: Skin cancer classification using deep learning and transfer learning. 9th Cairo International Biomedical Engineering Conference (CIBEC2018), Cairo, Egypt, December 20–22, 2018. doi: 10.1109/CIBEC.2018.8641762
30. Hosny Khalid M., Kassem Mohamed A., Foaud Mohamed M. Classification of skin lesions using transfer learning and augmentation with Alex-net. PLOS ONE. 2019;14(5):e0217293. doi: 10.1371/journal.pone.0217293.
31. Krizhevsky Alex, Sutskever Ilya, Hinton Geoffrey E. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 2017;60(6):84–90. doi: 10.1145/3065386.
32. Simonyan K and Zisserman A: Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations 2015, San Diego, CA, May 7–9, 2015. Available at https://arxiv.org/pdf/1409.1556.pdf
33. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, and Rabinovich A: Going deeper with convolutions. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 7-12, 2015:1–9. doi: 10.1109/cvpr.2015.7298594
34. He K, Zhang X, Ren S, Sun J: Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, June 27-30, 2016. doi: 10.1109/CVPR.2016.90
35. Pan Sinno Jialin, Yang Qiang. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering. 2010;22(10):1345–1359. doi: 10.1109/TKDE.2009.191.
36. Glorot X, Bengio Y: Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249–256, 2010. Available at http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
37. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I and Salakhutdinov RR: Improving neural networks by preventing co-adaptation of feature detectors. [Online]. arXiv:1207.0580 [cs.NE], 2012. Available at https://arxiv.org/pdf/1207.0580.pdf
38. Ioffe S, Szegedy C: Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning, July 2015, 37:448–456. Available at http://arxiv.org/abs/1502.03167.pdf
39. Yosinski J, Clune J, Bengio Y, Lipson H: How transferable are features in deep neural networks? Proceedings of the 27th International Conference on Neural Information Processing Systems, December 2014, 2:3320-3328. Available at https://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf
40. Zeiler MD, Fergus R: Visualizing and understanding convolutional networks. European Conference on Computer Vision, Zurich, Switzerland 2014:818–833. doi: 10.1007/978-3-319-10590-1_53

[CR1] 1.Siegel Rebecca L., Miller Kimberly D., Jemal Ahmedin. Cancer statistics, 2016. CA: A Cancer Journal for Clinicians. 2016;66(1):7–30. doi: 10.3322/caac.21332. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Trama Annalisa, Botta Laura, Foschi Roberto, Ferrari Andrea, Stiller Charles, Desandes Emmanuel, Maule Milena Maria, Merletti Franco, Gatta Gemma. Survival of European adolescents and young adults diagnosed with cancer in 2000–07: population-based data from EUROCARE-5. The Lancet Oncology. 2016;17(7):896–906. doi: 10.1016/S1470-2045(16)00162-5. [DOI] [PubMed] [Google Scholar]

[CR4] 4.American Cancer Society: Cancer facts & figures 2019. Atlanta: American Cancer Society, 2019. Available at https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2019.html

[CR5] 5.Siegel Rebecca L., Miller Kimberly D., Jemal Ahmedin. Cancer statistics, 2019. CA: A Cancer Journal for Clinicians. 2019;69(1):7–34. doi: 10.3322/caac.21551. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Gary Clayman. Thyroid nodules: Hyperthyroidism and thyroid Cancer. Endocrineweb, November 27th, 2018. Available at https://www.endocrineweb.com/conditions/thyroid/thyroid-nodules

[CR7] 7.Iwano Shingo, Nakamura Tatsuya, Kamioka Yuko, Ikeda Mitsuru, Ishigaki Takeo. Computer-aided differentiation of malignant from benign solitary pulmonary nodules imaged by high-resolution CT. Computerized Medical Imaging and Graphics. 2008;32(5):416–422. doi: 10.1016/j.compmedimag.2008.04.001. [DOI] [PubMed] [Google Scholar]

[CR8] 8.Saito Hajime, Minamiya Yoshihiro, Kawai Hideki, Nakagawa Taku, Ito Manabu, Hosono Yukiko, Motoyama Satoru, Hashimoto Manabu, Ishiyama Koichi, Ogawa Jun-ichi. Usefulness of circumference difference for estimating the likelihood of malignancy in small solitary pulmonary nodules on CT. Lung Cancer. 2007;58(3):348–354. doi: 10.1016/j.lungcan.2007.06.018. [DOI] [PubMed] [Google Scholar]

[CR9] 9.El-Baz Ayman, Nitzken Matthew, Khalifa Fahmi, Elnakib Ahmed, Gimel’farb Georgy, Falk Robert, El-Ghar Mohammed Abo. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011. 3D Shape Analysis for Early Diagnosis of Malignant Lung Nodules; pp. 772–783. [Google Scholar]

[CR10] 10.El–Baz A, Nitzken M, Vanbogaert E, Gimel’farb G, Falk R, El-Ghar MA: A novel shape-based diagnostic approach for early diagnosis of lung nodules. 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Chicago, IL, USA, 30 March-2 April, 2011. 10.1109/ISBI.2011.5872373

[CR11] 11.Yankelevitz David F., Reeves Anthony P., Kostis William J., Zhao Binsheng, Henschke Claudia I. Small Pulmonary Nodules: Volumetrically Determined Growth Rates Based on CT Evaluation. Radiology. 2000;217(1):251–256. doi: 10.1148/radiology.217.1.r00oc33251. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Kostis W.J., Reeves A.P., Yankelevitz D.F., Henschke C.I. Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical ct images. IEEE Transactions on Medical Imaging. 2003;22(10):1259–1274. doi: 10.1109/TMI.2003.817785. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Prasanna P, Tiwari P, Madabhushi A: Co-occurrence of local anisotropic gradient orientations (CoLlAGe): A new radiomics descriptor. Scientific Reports 6:37241, 2016. 10.1038/srep37241

[CR14] 14.Dalal N, Triggs B: Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20-25 June, 2005. 10.1109/CVPR.2005.177

[CR15] 15.Ojala T., Pietikainen M., Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(7):971–987. doi: 10.1109/TPAMI.2002.1017623.

[CR16] 16.Han Fangfang, Wang Huafeng, Zhang Guopeng, Han Hao, Song Bowen, Li Lihong, Moore William, Lu Hongbing, Zhao Hong, Liang Zhengrong. Texture Feature Analysis for Computer-Aided Diagnosis on Pulmonary Nodules. Journal of Digital Imaging. 2014;28(1):99–115. doi: 10.1007/s10278-014-9718-8.

[CR17] 17.Wang Huafeng, Zhao Tingting, Li Lihong Connie, Pan Haixia, Liu Wanquan, Gao Haoqi, Han Fangfang, Wang Yuehai, Qi Yifan, Liang Zhengrong. A hybrid CNN feature model for pulmonary nodule malignancy risk differentiation. Journal of X-Ray Science and Technology. 2018;26(2):171–187. doi: 10.3233/XST-17302.

[CR18] 18.Keshani Mohsen, Azimifar Zohreh, Tajeripour Farshad, Boostani Reza. Lung nodule segmentation and recognition using SVM classifier and active contour modeling: A complete intelligent system. Computers in Biology and Medicine. 2013;43(4):287–300. doi: 10.1016/j.compbiomed.2012.12.004.

[CR19] 19.Tartar A, Kilic N, Akan A: A new method for pulmonary nodule detection using decision trees. 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, 3–7 July 2013. 10.1109/EMBC.2013.6611257

[CR20] 20.Lee S.L.A., Kouzani A.Z., Hu E.J. Random forest based lung nodule classification aided by clustering. Computerized Medical Imaging and Graphics. 2010;34(7):535–542. doi: 10.1016/j.compmedimag.2010.03.006.

[CR21] 21.LeCun Y: LeNet-5, convolutional neural networks. NY, USA. [Online], 2013. Available at https://yann.lecun.com/exdb/lenet/

[CR22] 22.Zhang T, Zhao J, Luo J, Qiang Y: Deep belief network for lung nodules diagnosed in CT imaging. International Journal of Performability Engineering 13(8):1358–1370, 2017. 10.23940/ijpe.17.08.p17.13581370

[CR23] 23.Cheng J, Ni D, Chou Y, Qin J, Tiu C, Chang Y, Huang C, Shen D, Chen C: Computer-aided diagnosis with deep learning architecture: Applications to breast lesions in US images and pulmonary nodules in CT scans. Scientific Reports 6:24454, 2016. 10.1038/srep24454

[CR24] 24.Song QingZeng, Zhao Lei, Luo XingKe, Dou XueChen. Using Deep Learning for Classification of Lung Nodules on Computed Tomography Images. Journal of Healthcare Engineering. 2017;2017:1–7. doi: 10.1155/2017/8314740.

[CR25] 25.Sun Wenqing, Zheng Bin, Qian Wei. Automatic feature learning using multichannel ROI based on deep structured algorithms for computerized lung cancer diagnosis. Computers in Biology and Medicine. 2017;89:530–539. doi: 10.1016/j.compbiomed.2017.04.006.

[CR26] 26.Armato Samuel G., McLennan Geoffrey, Bidaut Luc, McNitt-Gray Michael F., Meyer Charles R., Reeves Anthony P., Zhao Binsheng, Aberle Denise R., Henschke Claudia I., Hoffman Eric A., Kazerooni Ella A., MacMahon Heber, van Beek Edwin J. R., Yankelevitz David, Biancardi Alberto M., Bland Peyton H., Brown Matthew S., Engelmann Roger M., Laderach Gary E., Max Daniel, Pais Richard C., Qing David P.-Y., Roberts Rachael Y., Smith Amanda R., Starkey Adam, Batra Poonam, Caligiuri Philip, Farooqi Ali, Gladish Gregory W., Jude C. Matilda, Munden Reginald F., Petkovska Iva, Quint Leslie E., Schwartz Lawrence H., Sundaram Baskaran, Dodd Lori E., Fenimore Charles, Gur David, Petrick Nicholas, Freymann John, Kirby Justin, Hughes Brian, Vande Casteele Alessi, Gupte Sangeeta, Sallam Maha, Heath Michael D., Kuhn Michael H., Dharaiya Ekta, Burns Richard, Fryd David S., Salganicoff Marcos, Anand Vikram, Shreter Uri, Vastagh Stephen, Croft Barbara Y., Clarke Laurence P. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Medical Physics. 2011;38(2):915–931. doi: 10.1118/1.3528204.

[CR27] 27.Jacobs C, Setio AAA, Traverso A, Ginneken BV: Lung nodule analysis 2016. [Online], 2016. Available at https://luna16.grand-challenge.org/home/

[CR28] 28.Shin Hoo-Chang, Roth Holger R., Gao Mingchen, Lu Le, Xu Ziyue, Nogues Isabella, Yao Jianhua, Mollura Daniel, Summers Ronald M. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Transactions on Medical Imaging. 2016;35(5):1285–1298. doi: 10.1109/TMI.2016.2528162.

[CR29] 29.Hosny KM, Kassem MA, Foaud MM: Skin cancer classification using deep learning and transfer learning. 9th Cairo International Biomedical Engineering Conference (CIBEC2018), Cairo, Egypt, December 20–22, 2018. 10.1109/CIBEC.2018.8641762

[CR30] 30.Hosny Khalid M., Kassem Mohamed A., Foaud Mohamed M. Classification of skin lesions using transfer learning and augmentation with Alex-net. PLOS ONE. 2019;14(5):e0217293. doi: 10.1371/journal.pone.0217293.

[CR31] 31.Krizhevsky Alex, Sutskever Ilya, Hinton Geoffrey E. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 2017;60(6):84–90. doi: 10.1145/3065386.

[CR32] 32.Simonyan K and Zisserman A: Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations 2015, San Diego, CA, May 7–9, 2015. Available at https://arxiv.org/pdf/1409.1556.pdf

[CR33] 33.Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, and Rabinovich A: Going deeper with convolutions. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, June 7-12, 2015:1–9. 10.1109/cvpr.2015.7298594

[CR34] 34.He K, Zhang X, Ren S, Sun J: Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, June 27-30, 2016. 10.1109/CVPR.2016.90

[CR35] 35.Pan Sinno Jialin, Yang Qiang. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering. 2010;22(10):1345–1359. doi: 10.1109/TKDE.2009.191.

[CR36] 36.Glorot X, Bengio Y: Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249–256, 2010. Available at http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

[CR37] 37.Hinton GE, Srivastava N, Krizhevsky A, Sutskever I and Salakhutdinov RR: Improving neural networks by preventing co-adaptation of feature detectors. [Online]. arXiv:1207.0580 [cs.NE], 2012. Available at https://arxiv.org/pdf/1207.0580.pdf

[CR38] 38.Ioffe S, Szegedy C: Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning, July 2015, 37:448–456. Available at http://arxiv.org/abs/1502.03167.pdf

[CR39] 39.Yosinski J, Clune J, Bengio Y, Lipson H: How transferable are features in deep neural networks? Proceedings of the 27th International Conference on Neural Information Processing Systems, December 2014, 2:3320-3328. Available at https://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf

[CR40] 40.Zeiler MD, Fergus R: Visualizing and understanding convolutional networks. European Conference on Computer Vision, Zurich, Switzerland 2014:818–833. 10.1007/978-3-319-10590-1_53
