The British Journal of Radiology. 2022 Jul 8; 94(1126): 20210038. doi: 10.1259/bjr.20210038

Automatic segmentation of lung tumors on CT images based on a 2D & 3D hybrid convolutional neural network

Wutian Gan 1,2, Hao Wang 1, Hengle Gu 1, Yanhua Duan 1, Yan Shao 1, Hua Chen 1, Aihui Feng 1, Ying Huang 1, Xiaolong Fu 1, Yanchen Ying 3, Hong Quan 2, Zhiyong Xu 1
PMCID: PMC9328064  PMID: 34347535

Abstract

Objective:

A stable and accurate automatic tumor delineation method was developed to facilitate the intelligent design of the lung cancer radiotherapy process. The purpose of this paper is to introduce a deep learning-based automatic tumor segmentation network for lung cancer on CT images.

Methods:

In this paper, a hybrid convolutional neural network (CNN) combining a 2D CNN and a 3D CNN was implemented for automatic lung tumor delineation on CT images. The 3D CNN used the V-Net model to extract tumor context information from CT sequence images. The 2D CNN used an encoder–decoder structure based on a dense connection scheme, which expands the information flow and promotes feature propagation. The 2D and 3D features were then fused through a hybrid module. The hybrid CNN was compared with the individual 3D CNN and 2D CNN, and three evaluation metrics, Dice, Jaccard and Hausdorff distance (HD), were used for quantitative evaluation. The relationship between the segmentation performance of the hybrid network and the GTV volume size was also explored.

Results:

The newly introduced hybrid CNN was trained and tested on a dataset of 260 cases, achieving a median Dice of 0.73 (mean ± standard deviation: 0.72 ± 0.10), with Jaccard and HD values of 0.58 ± 0.13 and 21.73 ± 13.30 mm, respectively. The hybrid network significantly outperformed the individual 3D CNN and 2D CNN on all three evaluation metrics (p < 0.001). Larger GTVs yielded higher Dice values, but their delineation at the tumor boundary was less stable.

Conclusions:

The implemented hybrid CNN was able to achieve good lung tumor segmentation performance on CT images.

Advances in knowledge:

The hybrid CNN shows valuable prospects for lung tumor segmentation.

1. Introduction

Radiotherapy is one of the main methods for treating lung cancer.1,2 Two of the most critical tasks in lung cancer radiotherapy are accurately locating the lung tumor and delineating the gross tumor volume (GTV). Inaccurate delineation of the GTV can easily cause insufficient dose coverage at the tumor boundary and lead to unnecessary damage to surrounding organs at risk.3 In clinical practice, GTV delineation is usually performed manually by radiotherapists, which is time-consuming, subjective and poorly reproducible. The accuracy of target delineation varies significantly with each physician's experience and knowledge.3,4 Therefore, it is of great significance to develop a stable, accurate and objective automatic tumor delineation method for lung cancer radiotherapy.

In the last decade, deep learning-based algorithms have demonstrated fast and efficient automatic segmentation capabilities in CT image segmentation tasks in radiotherapy. Deep convolutional neural networks (CNNs), a family of deep learning methods, have achieved significant progress in the task of lung tumor segmentation on CT images. Kamal et al5 suggested a 3D encoder–decoder structure based on a dense connection scheme for the initial segmentation of lung tumors, and used selective thresholding and morphological operations to reduce false positive regions. Pang et al6 utilized a self-adaptive fully convolutional network with an automated weight distribution mechanism to coarsely segment lung tumors, and then employed an improved conditional random field (CRF) method to correct the tumor contour. Later, Pang et al7 proposed a generative adversarial network based on U-Net8 and obtained excellent segmentation performance on lung, kidney and liver tumor datasets. These methods have acted as paradigms and have inspired researchers to further explore CT image segmentation of lung tumors. They employ 3D convolutional networks to fully exploit the intrinsic volumetric information of CT images, thus improving segmentation performance. However, blurred boundaries may occur when using a 3D CNN for lung tumor segmentation. There are three potential explanations for this problem. First, training 3D networks is very memory-consuming, which limits the depth and width of the network. Second, the data input to the 3D network are anisotropic, i.e., the pixel spacing along the Z-axis (head-foot direction) is larger than the pixel spacing along the X and Y axes of the CT images, which may cause problems when convolving the images with isotropic kernels.9–11 Third, a 3D CNN may lose some detail information during the downsampling phase.12,13

Compared with a 3D CNN, a 2D CNN has more data samples and fewer parameters to train, and it is better at learning the edge information of each CT slice.14 Combining the ability of the 3D CNN to capture correlation information across CT slices with the rich edge information extracted by the 2D CNN makes it possible to mitigate the boundary blurring problem and thus further improve the accuracy of tumor segmentation.

The application of combined 2D and 3D networks for medical image segmentation has been explored in several previous works. Baldeon Calisto15 and Zheng16 both used ensemble methods to construct hybrid networks for segmenting the prostate, and the myocardium and great vessels, respectively, achieving advanced segmentation performance. Both of their sub-networks use an FCN structure, which lacks sufficient channels in the upsampling process compared with the U-Net or V-Net models, making feature propagation to higher-resolution layers more difficult. Li et al17 deployed a dense connection scheme in both 2D and 3D networks, which on the one hand facilitated the propagation of information and the reuse of features, but on the other hand increased the memory consumption of the 3D network, which is itself very computationally intensive. Hossain et al18 first extracted intra-layer lung tumor information using 2D kernels, then stacked the 2D features together and fused them through 3D convolutions. Although 3D volumetric information was added to the 2D network, the asymmetry of the network design (first 2D, then 3D) hinders the full utilization of 3D context information. Chen et al19 proposed a hybrid network for small cell lung cancer segmentation, which effectively fuses 2D and 3D features and outperforms independent CNNs in segmentation. One potential limitation is that Chen et al integrate the 2D and 3D networks into one hybrid network trained in parallel, which complicates the parameter updates of the overall network during training, in turn increasing the burden on the network to identify the optimal solution. In addition, the use of 2D kernels for convolutional operations on hybrid features is not conducive to the full extraction of contextual information.

In this study, a hybrid CNN combining a 2D CNN and a 3D CNN is introduced and applied to the automatic segmentation of lung tumors on CT images. First, a V-Net model20 is used as the 3D CNN to extract the context information of the tumor. A dense connection scheme is adopted in the 2D CNN to expand the information flow and enhance the transmission and reuse of features. Meanwhile, the features retain more edge information by increasing the weight of low-level information, which also helps to alleviate the boundary blurring problem. Finally, the pre-trained 2D features are fed into the 3D network for feature fusion and training. The introduced hybrid network model aims to achieve good automatic lung tumor segmentation performance on CT images and to act as an example for other researchers when designing segmentation network models.

2. Methods and materials

2.1. Dataset

A radiomics dataset of 260 patients with lung cancer treated at our center from October 2019 to June 2020 was retrospectively included in the present study. CT scans were performed on a SOMATOM Definition AS scanner (Siemens Healthcare GmbH) under free-breathing conditions. The in-plane (xy) voxel spacing ranged from 0.724 to 0.976 mm, much smaller than the 3 mm slice spacing along the Z-axis. At least two experienced radiation oncologists were involved in delineating the GTV for each patient, and all disagreements were resolved through discussion until consensus was reached. Each patient's data were saved in DICOM format, and the RTSTRUCT file of each dataset contained the GTV structure manually delineated by the radiation oncologists.

2.2. Data pre-processing

The DICOM files were read using the Pydicom library, and the image pixel values were converted to HU values using the formula HU = pixel × slope + intercept, where both slope and intercept were obtained from the DICOM data. The HU values of all slices were truncated to [−200, 300] and then normalized to values between 0 and 1. The corresponding binary masks were extracted from the RTSTRUCT files. The CT volumes were resampled to the same resolution (1 × 1 × 3 mm) and each slice was randomly cropped to 256 × 256 pixels. Importing an entire CT sequence directly into the network would overload the GPU memory due to its large size. Instead, groups of 64 consecutive cropped slices, taken with a stride of 1 along each CT sequence, were stacked into 64 × 256 × 256 volumes and used as training samples, which improves training efficiency, expands the sample size and alleviates overfitting.
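For illustration, the following minimal sketch walks through these pre-processing steps; it assumes the Pydicom and NumPy APIs, and the function names are hypothetical rather than taken from the original code.

```python
import numpy as np
import pydicom

def slice_to_hu(ds: pydicom.Dataset) -> np.ndarray:
    """Convert the raw pixel values of one CT slice to HU values."""
    img = ds.pixel_array.astype(np.float32)
    return img * float(ds.RescaleSlope) + float(ds.RescaleIntercept)

def truncate_and_normalize(vol: np.ndarray, lo: float = -200.0, hi: float = 300.0) -> np.ndarray:
    """Truncate HU values to [lo, hi] and rescale them to [0, 1]."""
    return (np.clip(vol, lo, hi) - lo) / (hi - lo)

def training_samples(volume: np.ndarray, depth: int = 64, size: int = 256):
    """Yield 64 x 256 x 256 samples: groups of `depth` consecutive slices
    taken with a stride of 1, randomly cropped in-plane to `size` x `size`."""
    n, h, w = volume.shape
    for z in range(n - depth + 1):                 # stride 1 along the Z-axis
        y0 = np.random.randint(0, h - size + 1)
        x0 = np.random.randint(0, w - size + 1)
        yield volume[z:z + depth, y0:y0 + size, x0:x0 + size]
```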

2.3. Hybrid CNN

2.3.1. 3D CNN

The 3D CNN used in the present study is the V-Net model proposed by Milletari,20 shown in Figure 1. This model is based on an encoder–decoder structure: the encoding path extracts tumor context information and reduces the spatial size of the input signal, while the decoding path extracts features and expands the spatial support of the lower-resolution feature maps in order to gather and assemble the necessary information and finally generate the segmentation maps. Each layer of the network uses convolutional kernels of size 3 × 3 × 3, and the resolution is compressed by convolutional downsampling with 2 × 2 × 2 kernels and stride 2, which halves the resolution of the feature map in all three directions (X, Y, Z) at each downsampling step. All convolution operations use appropriate padding, and the PReLU nonlinear activation function is used throughout the network.
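As a concrete reference, one encoding stage of such a network can be sketched in PyTorch as below; this is a simplified assumption that omits V-Net's residual connections and decoding path, keeping only the 3 × 3 × 3 convolutions, PReLU activations and strided 2 × 2 × 2 convolutional downsampling described above.

```python
import torch
import torch.nn as nn

class DownStage3D(nn.Module):
    """One simplified V-Net-style encoding stage: 3x3x3 convolutions with
    PReLU, then a 2x2x2 convolution with stride 2 that halves the feature
    map resolution in all three directions (instead of pooling)."""
    def __init__(self, in_ch: int, out_ch: int, n_convs: int = 2):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(n_convs):
            layers += [nn.Conv3d(ch, out_ch, kernel_size=3, padding=1),
                       nn.PReLU(out_ch)]
            ch = out_ch
        self.convs = nn.Sequential(*layers)
        self.down = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=2, stride=2),  # convolutional downsampling
            nn.PReLU(out_ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.convs(x)    # same resolution, out_ch channels
        return self.down(feat)  # half resolution along X, Y and Z
```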

Figure 1. 3D CNN (V-Net) model.

2.3.2. 2D CNN

The 2D CNN proposed in this paper is also based on an encoder–decoder structure, as shown in Figure 2. Three dense blocks are set on the encoding path to extract features, with each dense block containing four densely connected convolution layers. The dense connection scheme enables the 2D CNN to fully mine the edge information of CT slices, reduces the number of parameters and alleviates the vanishing gradient problem of convolutional networks. The number of channels is gradually decreased along the encoding path to reduce the high-level information and increase the weight of the low-level information, which helps the network focus on extracting edge information from tumor CT images. Most of the convolution layers in the decoding path on the right side of the network architecture are removed, so the resolution of the feature maps is recovered more directly through deconvolution upsampling. This limits the bloat of the hybrid network and improves its operating speed. Unlike the 3D CNN, 2D convolution is used throughout the framework; convolutional downsampling compresses the resolution using 2 × 2 kernels with stride 2, and all convolutional layers in the dense blocks use 3 × 3 kernels.
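A minimal PyTorch sketch of one such dense block is given below; the growth rate and the normalization and activation choices are assumptions, as the text does not specify them.

```python
import torch
import torch.nn as nn

class DenseBlock2D(nn.Module):
    """A 2D dense block of four densely connected 3x3 convolution layers:
    each layer receives the concatenated feature maps of all previous layers."""
    def __init__(self, in_ch: int, growth: int = 16, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True)))
            ch += growth  # the input width grows through concatenation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```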

Figure 2. 2D CNN model.

2.3.3. Fusion of hybrid features

The features extracted by the 2D CNN and the 3D CNN must be fused before the final segmentation map is obtained, and the two types of feature maps must be consistent in size and dimension. The final output feature map of the 3D CNN has size and channel number 256 × 256 × 64 × 32 (Figure 1), while the feature map extracted by the 2D CNN is 256 × 256 × 32 after upsampling and convolution. To resolve this dimension inconsistency, the 2D CNN outputs of 64 consecutive CT slices are merged to generate a tensor suitable for 3D convolution, with size and channel number 256 × 256 × 64 × 32. The hybrid features are then formed by concatenating the 2D and 3D features (Figure 3), refined through a 3 × 3 × 3 convolution kernel to generate a two-channel feature map, and finally passed through a softmax layer to produce the segmentation.
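In PyTorch, this fusion step might look as follows; the module name and tensor layouts are assumptions made consistent with the sizes quoted above.

```python
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    """Stack the 64 per-slice 2D feature maps into a volume, concatenate it
    with the 3D feature map along the channel axis, refine with a 3x3x3
    convolution to two channels and apply softmax."""
    def __init__(self, ch_2d: int = 32, ch_3d: int = 32, n_classes: int = 2):
        super().__init__()
        self.refine = nn.Conv3d(ch_2d + ch_3d, n_classes, kernel_size=3, padding=1)

    def forward(self, feats_2d: torch.Tensor, feats_3d: torch.Tensor) -> torch.Tensor:
        # feats_2d: (B*64, C2, H, W) -> (B, C2, 64, H, W), matching feats_3d
        b, c3, d, h, w = feats_3d.shape
        feats_2d = feats_2d.view(b, d, -1, h, w).permute(0, 2, 1, 3, 4)
        hybrid = torch.cat([feats_2d, feats_3d], dim=1)    # (B, C2 + C3, 64, H, W)
        return torch.softmax(self.refine(hybrid), dim=1)   # two-channel output
```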

Figure 3. Hybrid features fusion module.

2.4. Loss function

The Dice loss function was chosen to optimize the introduced network in order to address the severe class imbalance in the tumor segmentation task and to prevent the losses from the large number of non-tumor pixels from covering up the loss information from the scarce tumor pixels. Compared with the cross-entropy and mean squared error (MSE) loss functions, the Dice loss can further assist the segmentation network in obtaining more accurate tumor contours.7,21 The Dice loss was defined as follows:

$$L_{\mathrm{Dice}} = 1 - \frac{2\sum_i x_i y_i}{\sum_i x_i^2 + \sum_i y_i^2}$$

where $x_i$ is the prediction probability for each voxel and $y_i$ is the binary ground truth.
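A direct PyTorch implementation of this loss might read as follows; the smoothing term eps is an assumed safeguard against division by zero and is not part of the published formula.

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice loss with squared terms in the denominator, as in the formula above.
    pred holds per-voxel tumor probabilities, target the binary ground truth."""
    pred, target = pred.flatten(), target.flatten()
    intersection = (pred * target).sum()
    return 1.0 - 2.0 * intersection / (pred.pow(2).sum() + target.pow(2).sum() + eps)
```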

To fully train the hybrid network, a 1 × 1 convolutional layer and a softmax classifier were first added to the last layer of the 2D network, and its parameters were optimized with the Dice loss on the dataset. The parameters of the optimized 2D network were then fixed, the 2D features were fused with the 3D features, and finally the parameters of the 3D network and the hybrid layer were optimized with the Dice loss.
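The two-stage procedure can be sketched as below, with simple placeholder modules standing in for the full 2D network, 3D network and hybrid layer (the learning rate and weight decay are those reported in Section 2.5):

```python
import itertools
import torch
import torch.nn as nn

# Placeholders for the real networks (assumptions for illustration only).
model_2d = nn.Conv2d(1, 32, kernel_size=3, padding=1)  # stands for the dense 2D CNN
model_3d = nn.Conv3d(1, 32, kernel_size=3, padding=1)  # stands for the V-Net 3D CNN
fusion = nn.Conv3d(64, 2, kernel_size=3, padding=1)    # stands for the hybrid layer

# Stage 1: optimize the 2D network alone with the Dice loss.
opt_stage1 = torch.optim.Adam(model_2d.parameters(), lr=1e-3, weight_decay=1e-7)

# Stage 2: freeze the pre-trained 2D parameters, then optimize the 3D
# network and the hybrid layer with the Dice loss.
for p in model_2d.parameters():
    p.requires_grad = False
opt_stage2 = torch.optim.Adam(
    itertools.chain(model_3d.parameters(), fusion.parameters()),
    lr=1e-3, weight_decay=1e-7)
```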

2.5. Implementation

The 260 patients were randomly divided into a training set (180 cases) to optimize the network parameters, a validation set (30 cases) to select the best-performing model, and a test set (50 cases) to evaluate the final trained model. Data augmentation techniques such as random translation, scaling and rotation were applied to artificially increase the training sample size. Considering the GPU memory limitation, the batch size was set to 1. The Adam optimiser with a learning rate of 0.001 and a weight decay of 10⁻⁷ was used as the gradient descent optimization algorithm. The proposed hybrid CNN was trained on two NVIDIA GTX 1080 GPUs with 8 GB of memory each for 1000 epochs. The total training time was roughly 24 h (about 11 h for the 2D network and 12–14 h for the 3D network and the hybrid layer). The code was written in Python using the PyTorch library.
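A possible implementation of the in-plane augmentation is sketched below using SciPy; the transformation ranges are assumptions, as the paper does not report them.

```python
import numpy as np
from scipy import ndimage

def random_augment(volume: np.ndarray, mask: np.ndarray):
    """Apply one random in-plane rotation, scaling and translation to a
    64 x 256 x 256 sample and, identically, to its binary mask.
    The ranges (±10°, 0.9–1.1x, ±10 voxels) are illustrative assumptions."""
    angle = np.deg2rad(np.random.uniform(-10.0, 10.0))
    scale = np.random.uniform(0.9, 1.1)
    c, s = np.cos(angle) / scale, np.sin(angle) / scale
    matrix = np.array([[1.0, 0.0, 0.0],   # leave the Z-axis untouched
                       [0.0,   c,  -s],
                       [0.0,   s,   c]])
    center = np.array(volume.shape) / 2.0
    shift = np.array([0.0, *np.random.uniform(-10.0, 10.0, size=2)])
    offset = center - matrix @ center + shift  # rotate/scale about the volume center
    def warp(arr, order):
        return ndimage.affine_transform(arr, matrix, offset=offset, order=order)
    return warp(volume, 1), warp(mask, 0)      # nearest-neighbour for the mask
```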

2.6. Evaluation metrics

Multiple metrics were used to evaluate the segmentation performance of the constructed network. The Dice coefficient, the main evaluation metric for medical image segmentation, quantifies the spatial volume overlap between the network prediction and the ground truth. It is defined as:

$$\mathrm{Dice} = \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where X is the set of segmentation results and Y is the set of ground-truth delineations. Dice values range from 0 to 1, with a higher value usually implying better segmentation performance.

The Jaccard index was also recorded; it is defined as:

$$\mathrm{Jaccard} = \frac{|X \cap Y|}{|X \cup Y|}$$

The Hausdorff distance (HD) was employed to measure the distance between the two segmentation boundaries and is defined as:

$$\mathrm{HD} = \max\left\{ \max_{y \in Y} \min_{x \in X} d(y,x),\; \max_{x \in X} \min_{y \in Y} d(x,y) \right\}$$

where X and Y denote the boundary-surface point sets of the network prediction and the ground truth, and d(x, y) indicates the Euclidean distance between voxels x and y. A smaller HD usually implies a better result.
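The three metrics can be computed as in the following sketch; for simplicity it measures the HD over all foreground voxels rather than over extracted boundary surfaces, and the voxel-spacing handling (Z, Y, X order) is an assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def evaluate(pred: np.ndarray, truth: np.ndarray, spacing=(3.0, 1.0, 1.0)):
    """Return (Dice, Jaccard, HD in mm) for two non-empty binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    jaccard = inter / union
    # Scale voxel indices by the spacing (mm) so that HD is in millimetres.
    x = np.argwhere(pred) * spacing
    y = np.argwhere(truth) * spacing
    hd = max(directed_hausdorff(x, y)[0], directed_hausdorff(y, x)[0])
    return dice, jaccard, hd
```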

2.7. Comparison methods

To demonstrate the superior segmentation performance of the constructed hybrid CNN, it was compared with the individual 3D CNN and 2D CNN. The V-Net model was used as the representative 3D CNN, while the 2D CNN used a framework similar to the V-Net model with the 3D convolutions replaced by 2D convolutions. Data processing was identical for the individual CNN models and the hybrid CNN, and the parameters of the 3D CNN and 2D CNN were tuned according to their actual segmentation performance. The three above-mentioned evaluation metrics were used to evaluate the three segmentation methods.

The data were processed using SPSS 20.0 software (IBM Corp., Armonk, NY). Paired t-tests were used to evaluate the statistical significance of the metric differences between the networks. Independent-samples t-tests were used to evaluate the relationship between GTV volume size and the Dice metric. A p-value threshold of 0.05 was used to infer statistically significant differences.
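These comparisons map directly onto SciPy as in the sketch below; the metric arrays are random placeholders standing in for the per-patient test results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
dice_hybrid = rng.normal(0.72, 0.10, 50)   # placeholder per-patient Dice values
dice_3d = rng.normal(0.65, 0.15, 50)       # (stand-ins for the real test metrics)

t, p = stats.ttest_rel(dice_hybrid, dice_3d)   # paired t-test between two networks
t2, p2 = stats.ttest_ind(dice_hybrid[:25], dice_hybrid[25:])  # two GTV-size groups
print(f"paired: p = {p:.4f}; independent: p = {p2:.4f}; significant: {p < 0.05}")
```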

3. Results

3.1. Comparison of segmentation maps

Figure 4 shows the segmentation results of the three examined networks for the lung GTV of one patient. The segmentation map of the constructed hybrid CNN is clearly the closest to the ground truth, followed by those of the 3D CNN and then the 2D CNN. The segmentation curve generated by the individual 3D CNN does not shrink well at the tumor boundary, while the segmentation map of the 2D CNN shows more false positive regions.

Figure 4. Comparison of the three network segmentation maps for one patient. The red contour is the ground truth; the green contour is the hybrid CNN segmentation map; the blue contour is the 3D CNN segmentation map; the yellow contour is the 2D CNN segmentation map.

3.2. Comparison of segmentation performance metrics

Box plots of the three segmentation performance metrics for the three networks are depicted in Figure 5. The hybrid network outperformed the other two segmentation networks on all three metrics. The 2D CNN presented the poorest overall segmentation performance, mainly due to the influence of a large number of false positive regions, which is particularly reflected in the HD metric.

Figure 5. Box plots of the three evaluation metrics in the three segmentation networks.

Table 1 summarizes the values of Dice, Jaccard and HD for each of the three segmentation networks. As shown in Table 1, the Dice value of the suggested hybrid CNN is better than those of the individual 3D CNN and 2D CNN (0.72 ± 0.10 vs 0.65 ± 0.15, p < 0.001; 0.72 ± 0.10 vs 0.52 ± 0.16, p < 0.001). The Jaccard value of the hybrid CNN is better than those of the other two networks (0.58 ± 0.13 vs 0.53 ± 0.14, p < 0.001; 0.58 ± 0.13 vs 0.40 ± 0.17, p < 0.001). The hybrid CNN was also the best on the HD metric (21.73 ± 13.30 mm vs 26.73 ± 13.30 mm, p < 0.001; 21.73 ± 13.30 mm vs 70.73 ± 23.30 mm, p < 0.001).

Table 1. Segmentation performance metrics comparison in the test dataset

Metric    Statistic     Hybrid (2D + 3D)   3D CNN            2D CNN
Dice      Median        0.73               0.67              0.49
          Mean ± STD    0.72 ± 0.10        0.65 ± 0.15 a     0.52 ± 0.16 a
Jaccard   Median        0.57               0.53              0.37
          Mean ± STD    0.58 ± 0.13        0.53 ± 0.14 a     0.40 ± 0.17 a
HD (mm)   Median        20.02              24.79             74.54
          Mean ± STD    21.73 ± 13.30      26.73 ± 13.30 a   70.73 ± 23.30 a

The best results under each criterion for the whole test set are those of the hybrid CNN. a: p-value < 0.001 (vs the hybrid CNN).

3.3. Segmentation performance and GTV volume size

The relationship between GTV volume size and the performance metrics was also investigated. The median GTV volume was 17.05 cm3 (range 3.58–374.92 cm3), and the patients in the test set were divided into two groups (Group 1 and Group 2) according to this median. As shown in Table 2 and Figure 6, tumors larger than the median tended to reach higher Dice values with smaller variance, while the Dice value was unstable for tumors smaller than the median (0.69 ± 0.12 vs 0.75 ± 0.08, p = 0.033). The Jaccard metric showed results similar to Dice. The HD value of large tumors was significantly higher than that of small tumors (26.05 ± 11.03 mm vs 17.02 ± 9.30 mm, p = 0.008).

Table 2. Comparison of GTV between the two groups

Parameters               Group 1        Group 2         p-value
GTV volume range (cm3)   <17.05         ≥17.05          -
Number of patients       25             25              -
Dice (mean ± STD)        0.69 ± 0.12    0.75 ± 0.08     0.033 a
Jaccard (mean ± STD)     0.53 ± 0.13    0.62 ± 0.09     0.020 a
HD (mean ± STD, mm)      17.02 ± 9.30   26.05 ± 11.03   0.008 a

a: p-value < 0.05.

Figure 6. Relationship between Dice and the GTV volume size.

4. Discussion

A hybrid network framework combining a 2D CNN and a 3D CNN is introduced in this paper for the automatic segmentation of lung tumors in CT images. The results show that the hybrid CNN achieves better segmentation performance than the individual 3D CNN and 2D CNN. The network achieves better and more stable Dice results on larger GTVs, although its segmentation at the boundaries is not optimal. This study provides a new design idea for deep CNNs suitable for the lung tumor segmentation task. The presented network showed high potential for automatically delineating lung tumors on CT images and is also applicable to other types of tumors (e.g., brain tumors, liver tumors).

The suggested hybrid architecture fully combines the advantages of the 3D CNN and the 2D CNN. First, the V-Net model used as the 3D CNN can extract long-range 3D context information from the volumetric data, which is consistent with clinical practice, where radiation oncologists usually delineate the GTV with reference to adjacent slices along the Z-axis. However, the 3D CNN has its own limitations: high memory consumption limits network performance; the convolution between anisotropic input data and isotropic kernels limits the extraction of robust representations; and the network misses some detailed information during pooling or convolutional downsampling. This study takes advantage of the ability of the 2D CNN to learn edge information, and combines the rich intra-layer features extracted by the 2D CNN with the 3D volumetric features extracted by V-Net to structurally reduce the boundary blurring problem, resulting in improved segmentation accuracy. The addition of 2D features enables faster identification of the optimal solution and reduces the computational burden of the 3D network. In this study, the training time for the 3D module of the hybrid network was 12–14 h, shorter than the 20 h required to train the 3D network alone. This advantage is not available in the lung tumor segmentation networks proposed by Hossain18 and Chen,19 and it leaves room for the 3D network framework to be further widened and deepened, giving the hybrid network more potential. A dense connection scheme is introduced into the 2D network framework, which allows each convolution layer to access the feature maps of all previous layers and expands the information flow, allowing the 2D CNN to fully exploit the edge features of the tumors. This connection scheme promotes the transmission of features and enhances their reuse.22,23 Moreover, reducing the number of high-level feature maps and increasing the weight of low-level information in this hybrid framework enable the extracted features to retain the edge information of the tumor.

The results in Figures 4 and 5 and Table 1 demonstrate that the hybrid CNN statistically significantly (p < 0.001) outperforms the individual 3D CNN and 2D CNN on all three metrics. From the output segmentation map of the individual 3D CNN (Figure 4i–l), it can be observed that the network locates and delineates the tumor well, but its contour curve does not wrap the tumor boundary tightly; in other words, the individual 3D CNN is prone to boundary blurring when delineating the lung tumor. In the output segmentation map of the individual 2D CNN (Figure 4m–p), the automatically generated contour shrinks better at the tumor boundary. However, lacking context information along the Z-axis, the 2D CNN does not analyze tumor features in the overall space as well as the 3D CNN, and thus generates many false positive regions. The hybrid CNN combines the tumor context information extracted by the 3D CNN with the edge-information correction provided by the 2D CNN, finally producing good segmentation results (Figure 4e–h) and outperforming the other two CNNs.

Although the segmentation performance of the constructed network is better than that of the reference methods, there are still some failure cases in the test set, making the Dice metric performance not outstanding, with a median of 0.73 and mean ± STD of 0.72 ± 0.10. On the one hand, the suggested hybrid network is sensitive to the blood vessels in the lung, and it easily mistakes blood vessels, especially those close to the pulmonary hilum, for tumors, forming false positive regions (Figure 7). The disappointing results of these cases suggest that more specific features need to be considered for the recognition and segmentation of lung tumors. In addition, since the 2D CNN generates more false positive regions when segmenting tumors (Figure 4n–p), the hybrid network benefits from the 2D CNN's tumor edge correction while also suffering from its interference in generating false positive regions. On the other hand, the architecture of the constructed network still needs to be improved. Future in-depth research is required to explore how to design a 3D CNN that can extract tumor features more comprehensively, and how to fuse 3D and 2D features in a more appropriate way to fully combine the advantages of both networks while eliminating the negative effects of structural complexity.

Figure 7. Representative cases with false positive regions.

Exploring the relationship between segmentation performance and GTV volume size revealed that the evaluation metrics were strongly affected by the GTV volume. Small volumes led to large variance in the Dice value and tended toward lower values, while large volumes showed unstable delineation at the boundaries.

Prior studies have combined other advanced techniques, such as selective thresholding,5 CRF6,7 and graph cut,24 to improve the accuracy of tumor segmentation with 3D CNNs. These techniques were not utilized in this paper, whose aim was to investigate whether a hybrid CNN combining a 2D CNN and a 3D CNN has an advantage in segmenting lung tumors, and to demonstrate that the hybrid CNN has better segmentation performance than a single CNN. On this basis, the constructed hybrid CNN may yield better segmentation results if further post-processing is performed using the techniques suggested in the aforementioned studies.

There are some limitations in the present study. First of all, compared with the independent 3D CNN, the improvement provided by the strategy of combining 2D and 3D comes at the cost of increased structural complexity and computational burden. Second, the CT images of the present study were acquired at a single center, so more data from different institutions are required to improve the generalization performance of the model. Several follow-up studies can be conducted on the basis of this work. For example, by combining complementary information from different imaging modalities, such as CT and PET or CT and MRI, a CNN could localize lung tumors more precisely and produce better segmentation results. Several studies have already attempted multimodal tumor co-segmentation21,25–28; however, this process may involve inconsistencies in spatial resolution and voxel size between images of different modalities, as well as inaccurate registration due to large differences in scanning position for some patients, introducing additional uncertainty. In summary, the present study aimed to construct a deep CNN that can automatically delineate lung tumors on CT images, and it may inspire other researchers in designing tumor segmentation networks.

5. Conclusion

The present study combined a 2D CNN and a 3D CNN to implement a hybrid CNN for the automatic segmentation of lung tumors on CT images. The hybrid CNN combines the advantages of both networks and outperforms the individual 3D CNN and 2D CNN in terms of segmentation performance. The hybrid CNN possesses good development potential and can act as a paradigm for other researchers who attempt to design deep neural networks for tumor segmentation.

Footnotes

Acknowledgements: This study was supported by grants from the Interdisciplinary Program of Shanghai Jiao Tong University (No. YG2019ZDB07).

Funding: Sponsored by the Interdisciplinary Program of Shanghai Jiao Tong University (No. YG2019ZDB07).

Contributors: WG was involved in conceptualization, data curation, data analysis, investigation, methodology, and writing; HW was involved in conceptualization, data curation, data analysis, and methodology; HG was involved in conceptualization, data analysis and methodology; YS, HC, AF, YD, YH, YY and XF were involved in methodology, resources, supervision; HQ was involved in writing – review and editing. ZX was involved in conceptualization, data analysis, methodology, project administration, supervision, and writing – review/editing.

Contributor Information

Wutian Gan, Email: gwt3662020@163.com.

Hao Wang, Email: newton1124@hotmail.com.

Hengle Gu, Email: guhengle@hotmail.com.

Yanhua Duan, Email: ninghuadi@163.com.

Yan Shao, Email: shaoyan1@mail.ustc.edu.cn.

Hua Chen, Email: chenyeluoer@sina.com.

Aihui Feng, Email: fah1604534340@163.com.

Ying Huang, Email: huangytez@163.com.

Xiaolong Fu, Email: xlfu1964@hotmail.com.

Yanchen Ying, Email: yanchen_ying@163.com.

Hong Quan, Email: csp6606@sina.com.

Zhiyong Xu, Email: xzyong12vip@sina.com.

REFERENCES

1. Henley SJ, Ward EM, Scott S, Ma J, Anderson RN, Firth AU, et al. Annual report to the nation on the status of cancer, part I: national cancer statistics. Cancer 2020; 126: 2225–49. doi: 10.1002/cncr.32802
2. Baskar R, Lee KA, Yeo R, Yeoh K-W. Cancer and radiation therapy: current advances and future directions. Int J Med Sci 2012; 9: 193–9. doi: 10.7150/ijms.3635
3. Weiss E, Hess CF. The impact of gross tumor volume (GTV) and clinical target volume (CTV) definition on the total accuracy in radiotherapy: theoretical aspects and practical experiences. Strahlenther Onkol 2003; 179: 21–30. doi: 10.1007/s00066-003-0976-5
4. Schimek-Jasch T, Troost EGC, Rücker G, Prokic V, Avlar M, Duncker-Rohr V, et al. A teaching intervention in a contouring dummy run improved target volume delineation in locally advanced non-small cell lung cancer: reducing the interobserver variability in multicentre clinical studies. Strahlenther Onkol 2015; 191: 525–33. doi: 10.1007/s00066-015-0812-8
5. Kamal U, Rafi AM, Hoque R, Wu J, Hasan MK. Lung cancer tumor region segmentation using recurrent 3D-DenseUNet. MICCAI 2020 Thoracic Image Analysis (TIA) Workshop 2020: 36–47.
6. Pang S, Du A, He X, Díez J, Orgun MA. Neural Information Processing. Communications in Computer and Information Science 2019: 589–97.
7. Pang S, Du A, Orgun MA, Yu Z, Wang Y, Wang Y, et al. CTumorGAN: a unified framework for automatic computed tomography tumor segmentation. Eur J Nucl Med Mol Imaging 2020; 47: 2248–68. doi: 10.1007/s00259-020-04781-3
8. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. Med Image Comput Comput Assist Interv 2015: 234–41.
9. Chen J, Yang L, Zhang Y, Alber M, Chen DZ. Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. Proceedings of the 30th International Conference on Neural Information Processing Systems 2016: 3036–44.
10. Liu S, Xu D, Zhou SK, Pauly O, Grbic S, Mertelmeier T. 3D anisotropic hybrid network: transferring convolutional features from 2D images to 3D anisotropic volumes. Med Image Comput Comput Assist Interv 2018: 851–8.
11. Jia H, Xia Y, Song Y, Zhang D, Huang H, Zhang Y, et al. 3D APA-Net: 3D adversarial pyramid anisotropic convolutional network for prostate segmentation in MR images. IEEE Trans Med Imaging 2020; 39: 447–57. doi: 10.1109/TMI.2019.2928056
12. Zhang R, Zhao L, Lou W, Abrigo JM, Mok VCT, Chu WCW, et al. Automatic segmentation of acute ischemic stroke from DWI using 3-D fully convolutional DenseNets. IEEE Trans Med Imaging 2018; 37: 2149–60. doi: 10.1109/TMI.2018.2821244
13. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015: 3431–40.
14. Nemoto T, Futakami N, Yagi M, Kumabe A, Takeda A, Kunieda E, et al. Efficacy evaluation of 2D, 3D U-Net semantic segmentation and atlas-based segmentation of normal lungs excluding the trachea and main bronchi. J Radiat Res 2020; 61: 257–64. doi: 10.1093/jrr/rrz086
15. Baldeon Calisto M, Lai-Yuen SK. AdaEn-Net: an ensemble of adaptive 2D-3D fully convolutional networks for medical image segmentation. Neural Netw 2020; 126: 76–94. doi: 10.1016/j.neunet.2020.03.007
16. Zheng H, Zhang Y, Yang L, Liang P, Zhao Z, Wang C, et al. A new ensemble learning framework for 3D biomedical image segmentation. Proc Conf AAAI Artif Intell 2019; 33: 5909–16. doi: 10.1609/aaai.v33i01.33015909
17. Li X, Chen H, Qi X, Dou Q, Fu C-W, Heng P-A. H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans Med Imaging 2018; 37: 2663–74. doi: 10.1109/TMI.2018.2845918
18. Hossain S, Najeeb S, Shahriyar A, Abdullah ZR, Haque MA. A pipeline for lung tumor detection and segmentation from CT scans using dilated convolutional neural networks. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing 2019: 1348–52.
19. Chen W, Wei H, Peng S, Sun J, Qiao X, Liu B. HSN: hybrid segmentation network for small cell lung cancer segmentation. IEEE Access 2019; 7: 75591–603. doi: 10.1109/ACCESS.2019.2921434
20. Milletari F, Navab N, Ahmadi S. V-Net: fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision 2016: 565–71.
21. Li L, Zhao X, Lu W, Tan S. Deep learning for variational multimodality tumor segmentation in PET/CT. Neurocomputing 2020; 392: 277–95. doi: 10.1016/j.neucom.2018.10.099
22. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017: 4700–8.
23. Zhang Z, Wu C, Coleman S, Kerr D. DENSE-INception U-net for medical image segmentation. Comput Methods Programs Biomed 2020; 192: 105395. doi: 10.1016/j.cmpb.2020.105395
24. Lu F, Wu F, Hu P, Peng Z, Kong D. Automatic 3D liver location and segmentation via convolutional neural network and graph cut. Int J Comput Assist Radiol Surg 2017; 12: 171–82. doi: 10.1007/s11548-016-1467-3
25. Zhao X, Li L, Lu W, Tan S. Tumor co-segmentation in PET/CT using multi-modality fully convolutional neural network. Phys Med Biol 2018; 64: 015011. doi: 10.1088/1361-6560/aaf44b
26. Kumar A, Fulham M, Feng D, Kim J. Co-learning feature fusion maps from PET-CT images of lung cancer. IEEE Trans Med Imaging 2019; 39: 204–17. doi: 10.1109/TMI.2019.2923601
27. Zhong Z, Kim Y, Plichta K, Allen BG, Zhou L, Buatti J, et al. Simultaneous cosegmentation of tumors in PET-CT images using deep fully convolutional networks. Med Phys 2019; 46: 619–33. doi: 10.1002/mp.13331
28. Jue J, Jason H, Neelam T, Andreas R, Sean BL, Joseph DO, et al. Integrating cross-modality hallucinated MRI with CT to aid mediastinal lung tumor segmentation. Med Image Comput Comput Assist Interv 2019; 11769: 221–9. doi: 10.1007/978-3-030-32226-7_25
