Abstract.
Purpose
Contour interpolation is an important tool for expediting manual segmentation of anatomical structures. The process allows users to manually contour on discontinuous slices and then automatically fill in the gaps, thereby saving time and effort. The most commonly used conventional algorithm, shape-based interpolation (SBI), operates on shape information alone and often performs suboptimally near the superior and inferior borders of organs and for the gastrointestinal structures. In this study, we present a generic deep learning solution to improve the robustness and accuracy of contour interpolation, especially for these historically difficult cases.
Approach
A generic deep contour interpolation model was developed and trained using 16,796 publicly available cases from 5 different data libraries, covering 15 organs. The network inputs were an image patch and the two-dimensional contour masks for the top and bottom slices of the patch. The outputs were the organ masks for the three middle slices. The performance was evaluated using both dice scores and distance-to-agreement (DTA) values.
Results
The deep contour interpolation model achieved a dice score of and a mean DTA value of , averaged over 3167 testing cases of all 15 organs. For comparison, the results of the conventional SBI method were and , respectively. For the difficult cases, the dice score and DTA value were and by the deep interpolator, compared with and by SBI. The t-test results confirmed that the performance improvements were statistically significant () for all cases in dice scores and for small organs and difficult cases in DTA values. Ablation studies were also performed.
Conclusions
A deep learning method was developed to enhance the process of contour interpolation. It could be useful for expediting the task of manual segmentation of organs and structures in medical images.
Keywords: medical imaging segmentation, deep learning, contour interpolation
1. Introduction
Complete manual delineation of organs and other structures in medical images is labor intensive and often very time consuming, especially in time-sensitive situations, e.g., online plan adaptation.1 Computer assistance is very desirable, and autosegmentation is an important potential answer.2 The new deep learning autosegmentation models have shown remarkable performance improvements in recent years;3 however, their robustness and accuracy are still inadequate for difficult cases. In addition, many organs or structures are still not supported [e.g., the gastrointestinal (GI) organs and tumor targets]. Because of the current challenges in robustness, accuracy, and breadth for autosegmentation models, most organ segmentations are still performed manually, and the results of autosegmentation tools are manually evaluated and corrected before being used to support diagnosis and treatment in clinical settings, for example, in radiation therapy.4
Contour interpolation is an important tool for expediting the manual segmentation process by allowing users to manually contour on discontinuous two-dimensional (2D) slices, for example, on one of every three slices, and then automatically fill in the gaps. In the era of auto-segmentation using deep-learning models, contour interpolation still plays an important role in (1) manually generating ground truth labels and (2) aiding manual segmentation of new or difficult anatomical structures that are not yet supported by deep learning models, for example, tumor targets and the small and large intestines. Conventional methods for image slice interpolation can be roughly grouped into three categories: contour-based, intensity-based, and shape-based methods.5 The contour-based interpolation methods are often used in surface reconstruction, which takes as input a set of binary images representing cross-sectional boundaries of an object.6 In intensity-based interpolation, the results are computed directly from the intensity values of the input images or, for contour interpolation, from the organ contour masks; linear interpolation is typically used.7 The shape-based interpolation (SBI) methods are explained in Fig. 1. The SBI methods are simple and versatile and thus suitable and commonly used for medical structure interpolation. The basic SBI method has three main steps:8 (1) compute the 2D distance maps for the manual contours on the top and the bottom slices, separately; (2) compute the distance map for an in-between slice by linearly interpolating the top and bottom distance maps, weighted by the distances from the intermediate slice to the top and bottom slices; and (3) compute the interpolated contour on the intermediate slice by thresholding the interpolated distance map at 0.
Fig. 1.
Illustration of the SBI method. The manual contours are in red on the top and bottom slices. The interpolation results (the interpolated color map and the interpolated contour by thresholding the interpolated distance map at 0) are shown in the middle row. The figures on the right are color-mapped distance maps that take the value of zero (green) on the contours, positive values (green to red) outside the contours, and negative values (blue to green) inside the contours.
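The three SBI steps above can be sketched in a few lines of Python. This is an illustrative re-implementation under our own assumptions, not code from the original reference; SciPy's Euclidean distance transform stands in for the 2D distance maps, using the sign convention of Fig. 1 (negative inside the contour, positive outside):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    # Signed 2D distance map: negative inside the contour, positive outside,
    # zero on the contour (matching the color maps in Fig. 1).
    inside = distance_transform_edt(mask)
    outside = distance_transform_edt(~mask)
    return outside - inside

def sbi_interpolate(top_mask, bottom_mask, n_middle=3):
    # Steps (1)-(3) of the basic SBI method.
    d_top = signed_distance(top_mask)       # (1) distance map of the top contour
    d_bot = signed_distance(bottom_mask)    #     and of the bottom contour
    middles = []
    for i in range(1, n_middle + 1):
        w = i / (n_middle + 1)              # fractional position between the slices
        d = (1 - w) * d_top + w * d_bot     # (2) linear interpolation of distance maps
        middles.append(d < 0)               # (3) threshold the interpolated map at 0
    return middles
```

The weight `w` is simply the fractional slice position between the two manual contours, so an in-between slice closer to the top contour is dominated by the top distance map.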
The amount of manual contouring effort saved depends on the accuracy and robustness of the contour interpolation method. The basic SBI method is simple and universally applicable to any contour interpolation case because it uses nothing but the shapes of the manual contours. Although it works well in most situations, it has difficulties when the two to-be-interpolated manual contours differ dramatically, especially near the superior and inferior borders of organs and for the gastrointestinal structures. An example of such a difficult case is shown in Fig. 2. In the authors' opinion, a major reason for these difficulties is that only the distance to the manually delineated contours is considered in the interpolation process; other potentially useful information, e.g., image intensity similarity, image contrast, and structure shape, is not utilized.
Fig. 2.
An example of unsatisfactory results by the basic SBI method for slices near the bottom of the left lung. The ground truth manual contours are shown in red, and the interpolation results are shown in blue on the three middle slices. The interpolation results were generated using SBI to interpolate the manual contours on the first and last slices.
Multiple methods have been proposed to improve contour interpolation performance. Lutufo et al.9 proposed a method in 1992 to combine shape distance and image gray level for interpolation (CSGI). However, the CSGI method was not robust: it incorporated image intensity information into the interpolation process, but the incorporation was neither accurate nor smooth. The algorithm also requires extensive manual configuration and is therefore not universal by default. Albu et al.10 proposed a morphology-based interpolation (MBI) method in 2008 to generate smooth interpolation between slices containing one or more regions. This method was later extended to -dimensions by Zukic et al.11 in 2016 and implemented in the Insight Toolkit. Liao et al. improved the MBI method in 2011 by adding image intensity-based classification. The new method, called morphology-based interpolation with local intensity information (MBILII),5 was demonstrated to outperform MBI, the modified cubic spline method,12 and CSGI. Combining binary weighted averaging for contour-based interpolation with random forests for intensity-based classification, another automatic method further improved interpolation accuracy over the morphology-based approach.13 Interpolation based on radial basis function (RBF) distance maps has also been employed in tumor segmentation, for three-dimensional (3D) reconstruction in prostate cancer studies.14 To the authors' understanding, these more complicated methods that use image gray-level information, solely or combined with shape distance information, only work relatively well when the gray level of the segmented structure is uniform and clearly separated from that of the background. This is often not the case in medical images, in which adjacent organs have very similar voxel values.
In contrast, the basic SBI method often performs better for abdominal organs, e.g., liver and spleen, that have minimal gray-level difference from the surrounding abdominal tissues and organs, and for GI organs that do not have a consistent gray level.
Deep convolutional neural network (DCNN) methods have been successfully used for image segmentation and have shown greatly improved performance over the conventional image segmentation methods.15 To the authors' knowledge, no DCNN method has been proposed for contour interpolation. In this study, we propose a new DCNN-based contour interpolation method. The contour interpolation scheme (the slice distance between the two manual contours and the general applicability to any organ) was optimally selected. The DCNN model was designed to balance prediction accuracy and robustness, to be universally applicable to any organ, and to be computationally efficient enough to support real-time interactive contouring. Both shape distance information and image intensity information are utilized together in the new method to achieve improved contour interpolation performance, especially for the cases that are usually difficult for the conventional methods.
2. Materials and Methods
2.1. Datasets
A total of 1174 datasets acquired from 5 different data libraries were used in this study. Each dataset contains the computerized tomography (CT) image and manual contours of one or multiple of the 15 organs covered by this study. Each CT image consists of axial slices, with the number of slices ranging from 100 to 400. The slice thickness varied from 0.7 to 2 mm across CTs.
Table 1 provides the details about the contoured organs in the datasets from the different libraries. Fifteen different organs, from the thorax to the pelvis, were covered by the 1174 datasets from 5 different data libraries. Specifically, the non-small cell lung cancer (NSCLC)-radiomics16 data library has contours of the lungs, spinal cord, esophagus, and heart. The beyond the cranial vault (BTCV)17 data library has contours of the spleen, kidneys, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, veins, pancreas, and duodenum. The liver tumor segmentation (LiTS)18 data library has liver contours. The pancreas-CT19 data library has spleen, left kidney, gallbladder, esophagus, liver, stomach, pancreas, and duodenum contours. The medical segmentation decathlon20 data library contains colon contours.
Table 1.
The list of publicly available benchmark data libraries used in this study.
| Sl. No. | Data libraries | Contoured organs | Number of datasets | Year |
|---|---|---|---|---|
| 1 | NSCLC-radiomics16 | Lung, esophagus, spinal cord, and heart | 422 | 2019 |
| 2 | BTCV17 | Spleen, right kidney, left kidney, gallbladder, esophagus, liver, stomach, aorta, inferior vena cava, veins, pancreas, and duodenum | 450 | 2019 |
| 3 | LiTS18 | Liver | 130 | 2017 |
| 4 | Pancreas-CT19 | Spleen, left kidney, gallbladder, esophagus, liver, stomach, pancreas, and duodenum | 82 | 2016 |
| 5 | Medical segmentation decathlon20 | Colon | 190 | 2019 |
Overall, the collection of CT and organ contour datasets from multiple data libraries was diverse and comprehensive. It covered clinical CT images from a wide range of settings and situations, including various image quality levels, image noise levels, in-plane pixel sizes, slice thicknesses, and ranges of human anatomy. The diversity of the training datasets was useful for ensuring the generality of the trained deep interpolator models. In future work, we plan to include additional image modalities, such as magnetic resonance imaging (MRI). Contours of additional organs in other body regions, such as the head and neck, could also be adopted for further improvement.
The ground truths were contoured manually by experts and are available for most of the slices. If a slice happened to be uncontoured, it was removed in the preprocessing procedure.
2.2. Preparation of Patches
Following the model design, the following data preprocessing procedure was applied to convert the full volumes of the CT image and the structure mask into patches ready for model training and evaluation. For each organ in each dataset: (1) a 5-slice image was selected for preprocessing only if the corresponding ground truth was available for all five continuous slices. Both the CT image and structure mask were cropped based on the 3D volume of the organ. The kidneys and lungs were separated into left and right parts. In each slice, a bounding square was used to crop the organ in the original image according to the size of the organ, with an additional 10-pixel axial margin and a 2-pixel superior-inferior margin. (2) The cropping was applied in the axial view. Each axial slice was resampled to . There was no other preprocessing in the sagittal view. (3) The CT image intensity values are in the range of 0 to 4191, with the voxel intensity of water normalized to 1000. The raw voxel values of the masks were 0 or 1. To keep the mask intensity consistent with the water voxel intensity in the CT images, the masks were multiplied by 1000 element-wise. (4) The three middle slices were duplicated so that the five slices (the top and bottom slices with manual contours and the three middle slices) were converted to eight slices before the data were fed to the DCNN model. This step was needed because the model required the patch size to be a power of 2 in each dimension to be computationally efficient. Figure 3 illustrates the preprocessing procedure.
Fig. 3.
An illustration of the preprocessing procedure. A raw patch of five continuous slices was selected from the dataset and preprocessed. In each slice, the mask was cropped with the assistance of the bounding square (green), and the image was cropped at the same position. Next, it was resampled to . After all five slices were processed, they were stacked together as and used as input.
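As a rough sketch, the per-patch preprocessing steps described above might look as follows in Python. This is hypothetical code, not the authors' MATLAB implementation; the in-plane patch size `PATCH_XY` and the single `margin` parameter are illustrative assumptions, since the exact resampled size is not reproduced here:

```python
import numpy as np
from scipy.ndimage import zoom

PATCH_XY = 128  # assumed in-plane patch size (the paper's exact value is not reproduced here)

def preprocess_patch(ct5, mask5, margin=10):
    # ct5, mask5: (5, H, W) arrays; slices 0 and 4 carry the manual contours.
    # (1) bounding square around the organ with an axial margin
    ys, xs = np.nonzero(mask5.any(axis=0))
    side = max(ys.ptp(), xs.ptp()) + 2 * margin
    cy, cx = (ys.min() + ys.max()) // 2, (xs.min() + xs.max()) // 2
    y0, x0 = max(cy - side // 2, 0), max(cx - side // 2, 0)
    ct = ct5[:, y0:y0 + side, x0:x0 + side]
    mk = mask5[:, y0:y0 + side, x0:x0 + side].astype(float)
    # (2) resample each axial slice to PATCH_XY x PATCH_XY
    factors = (1, PATCH_XY / ct.shape[1], PATCH_XY / ct.shape[2])
    ct = zoom(ct, factors, order=1)          # linear interpolation for the image
    mk = zoom(mk, factors, order=0)          # nearest neighbor for the mask
    # (3) scale the binary mask to the water intensity level
    mk *= 1000.0
    # (4) duplicate the three middle slices: 5 slices -> 8 slices
    idx = [0, 1, 1, 2, 2, 3, 3, 4]
    return ct[idx], mk[idx]
```

Nearest-neighbor resampling keeps the mask binary before the intensity scaling, so the mask values remain exactly 0 or 1000.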
After preprocessing, 19,963 patches were generated from the 1174 original full-volume datasets. Of these, 840 patches (5%) were selected randomly for validation during model training, and 3167 randomly selected patches (15%) were set aside for testing. Patches from the same organ of the same patient were never used for both training and validation.
2.3. DCNN Model Design and Considerations
We designed our deep interpolator model by adapting a three-layer 3D U-Net model.21 Figure 4 shows the procedure of our contour interpolation method. The network has two inputs and a single output. The first input is a 3D patch of the CT image with five continuous axial slices. The second input is the corresponding patch of the manual contour mask, in which only the top and bottom slices carry the ground truth organ contours and the middle three slices are empty. These two contours on the top and bottom slices serve as the reference contours for the interpolation process to produce the contours for the middle three slices, as explained in Fig. 1. The output of the model is the predicted organ mask for the middle three slices. Unlike the conventional SBI method, which uses only the shape information, the network model is provided with both the image and the shapes (i.e., the manual contours), opening the possibility of improved interpolation performance.
Fig. 4.
An overview of our contour interpolation procedure.
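To make the two-input, one-output arrangement concrete, a minimal sketch of how a preprocessed 8-slice patch could be split into the image input, the mask input (with empty middle slices), and the training target. The function name and array layout here are our own illustrative assumptions, not the authors' code:

```python
import numpy as np

def make_model_io(ct8, mask8):
    # ct8, mask8: (8, H, W) preprocessed patches; slices 0 and 7 carry the
    # manual contours, slices 1-6 are the duplicated middle slices.
    image_input = ct8.copy()
    mask_input = mask8.copy()
    mask_input[1:7] = 0            # the middle slices enter the network empty
    target = mask8[1:7].copy()     # the ground truth the model learns to predict
    return image_input, mask_input, target
```

The two reference contours therefore reach the network only through the first and last slices of the mask input, while the image input supplies intensity information for every slice.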
The deep CNN model was designed under the following considerations.
1. Three middle slices were assumed between the two slices with manual contours. The slice distance between the two to-be-interpolated manual contours is probably the most important factor affecting the efficiency of using contour interpolation to expedite manual contouring. The slice distance also directly affects the performance of any given contour interpolation algorithm.

According to our preliminary results, three middle slices was an optimal choice. It ensured that the two to-be-interpolated manual contours were not too far apart; otherwise, the interpolation accuracy would be compromised. It also ensured that the manual contours were not too close; otherwise, the effectiveness of saving manual contouring effort through interpolation would be reduced.
2. The axial patch size was chosen to support various organ sizes and to ensure both the pixel resolution and the overall speed of the computation. An organ usually occupies only a portion of the entire CT slice. The computation should be focused on the organ contours and not on voxels far outside them; it would be very inefficient to include the entire CT slice in the computation regardless of the organ size. In addition, the DCNN model was designed to be universal for all organs so that a single trained DCNN model could be easily managed and readily applied to any contour interpolation case. However, the sizes of the different organs (listed in Table 1) vary significantly; for example, the liver and the lung are much larger than the spinal cord on the axial slice. A medium patch size of was therefore chosen to accommodate the dramatic organ size differences. For all organs, the image and contour mask were resampled to fit the patch size; the details were explained above in the data preprocessing section. As explained in the discussion, an alternative option is to train and manage multiple DCNN models supporting different axial patch sizes.
3. The DCNN model, based on the 3D U-Net architecture, was intentionally designed to be shallow (only three stages) and narrow (only eight channels in the first encoder and a convolution filter size of 3) to ensure fast computation. The computation speed is important because the contour interpolation should complete instantaneously to support the interactive manual contouring task in near real time. We chose the U-Net architecture due to its reported efficiency and performance in the medical image segmentation literature. The normalization layers were all removed because the data were already normalized.
4. After the DCNN model is trained to predict the contours on the three middle slices, a different number of middle slices (1, 2, or more than 3) can still be supported when the trained model is applied for contour interpolation. To do so, the middle slices are duplicated or subsampled to fill the six middle positions. For example, a single middle slice is replicated six times, and two middle slices are replicated three times each. If there are more than three middle slices, contour interpolation is carried out in multiple runs, each covering three middle slices. For example, five middle slices can be covered by two runs, with slices 1, 3, and 5 covered by the first run and the remaining slices 2 and 4 covered by the second run.
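The duplication rules for one, two, or three middle slices can be written down as a small index map. This is an illustrative sketch; the index layout assumes the six middle positions of the 8-slice input described in the preprocessing section:

```python
def expand_middle_indices(n_middle):
    # Map n_middle in-between slices onto the six middle positions of the
    # 8-slice model input (the top and bottom contoured slices are added
    # separately by the caller).
    if n_middle == 1:
        return [0, 0, 0, 0, 0, 0]      # a single slice fills all six positions
    if n_middle == 2:
        return [0, 0, 0, 1, 1, 1]      # each of two slices is replicated 3x
    if n_middle == 3:
        return [0, 0, 1, 1, 2, 2]      # the regular case: each slice twice
    raise ValueError("more than three middle slices require multiple runs")
```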
5. The transverse colon was very difficult to interpolate because it spans only a few axial slices instead of extending through many axial slices in the superior-inferior direction, whereas the deep interpolator is designed to interpolate between slices in the superior-inferior direction. Therefore, we used 3-slice patches, instead of the regular 5-slice patches, for the transverse colon. To convert a 3-slice patch into the 8-slice input format, the top and bottom slices were duplicated twice each, and the single middle slice was copied four times. The model output was still in the three-middle-slice format, but only the middle slice of the output was kept as the final single-slice interpolation result.
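The replication pattern for the transverse colon case can likewise be expressed as a fixed layout (an illustrative sketch under the 8-slice input assumption above):

```python
def expand_colon_patch(top, mid, bot):
    # 3-slice transverse colon patch -> 8-slice input layout:
    # top x2, middle x4, bottom x2. After prediction, only the output's
    # middle slice is kept as the final single-slice result.
    return [top, top, mid, mid, mid, mid, bot, bot]
```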
2.4. Model Implementation, Training, and Performance Evaluation
The model was implemented using MATLAB 2020b. The models were trained on a laptop computer equipped with a GTX 1650 GPU. The Adam optimizer was used to train the 3D U-Net for 20 epochs with the following hyperparameters: mini-batch size = 16, initial learning rate = 0.001, and a piecewise learning rate schedule.
We evaluated the model prediction performance on the 3167 preprocessed testing patches. The dice scores and the distance-to-agreement (DTA) values were computed against the known ground truth contours for each case. The dice coefficient is a measure of overlap between the predicted contours and the ground truth manual contours. The DTA measures the per-voxel distance between the predicted contour and the ground truth contour.22 The average values of the maximum, the mean, and the 95th percentile of the per-case DTA values were computed.
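The two evaluation metrics can be sketched as follows (illustrative Python, assuming binary masks and isotropic voxel spacing by default; `dta_stats` returns the maximum, mean, and 95th-percentile distances corresponding to the reported statistics):

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_score(pred, truth):
    # Overlap between predicted and ground truth masks (1.0 = perfect).
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def surface(mask):
    # Boundary voxels: foreground voxels with at least one background neighbor.
    return mask & ~binary_erosion(mask)

def dta_stats(pred, truth, spacing=1.0):
    # Max, mean, and 95th-percentile distance from each predicted contour
    # voxel to the nearest point on the ground truth contour.
    dist = distance_transform_edt(~surface(truth), sampling=spacing)
    d = dist[surface(pred)]
    return d.max(), d.mean(), np.percentile(d, 95)
```

Passing the physical voxel spacing through `sampling` yields DTA values in millimeters rather than voxels.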
For comparison, contour interpolations were also performed using three conventional methods: SBI,8 MBI,10 and MBILII.5 Because implementations of these methods are not publicly available, we implemented them ourselves based on the original papers. Dice scores and DTA values were computed for the results and compared. A Student's t-test was conducted to test the statistical significance of the difference between the results of our deep interpolator and the SBI method. Because our implementations of MBI and MBILII may not be optimal, they were excluded from the comprehensive comparison; a full comparison is left for future work.
3. Results
Figure 5 shows the interpolation results for 12 selected organs. The deep contour interpolator worked well visually for all organs, regardless of the organ type and size, as can be seen from the closeness between the interpolated contours (green) and the ground truth 3D surfaces (blue). The inferior vena cava and veins were not included in Fig. 5 because they have a shape and scan appearance similar to the aorta. The colon was not included because the number of slices was insufficient for rendering 3D surfaces and the result was suboptimal.
Fig. 5.
Examples of contour interpolation results of 12 selected organs: (a) kidney, (b) aorta, (c) lung, (d) pancreas, (e) gallbladder, (f) heart, (g) spleen, (h) liver, (i) stomach, (j) spinal cord, (k) esophagus, and (l) duodenum. In each plot, the blue surface is the ground truth of the whole organ. The red curves are the ground truth contours on the 2D slices. The green curves are the interpolated contours.
The computed dice scores are listed in Table 2. The esophagus and spinal cord were counted together, as were the aorta, inferior vena cava, and veins. Our deep interpolator performed better in each group, with higher dice scores and smaller standard deviations than the basic SBI method.
Table 2.
Comparison of dice scores by the conventional SBI method and the proposed deep contour interpolator.
| Organs | 3D U-Net deep contour interpolator | SBI method | Dice score differences | t-test p-values |
|---|---|---|---|---|
| Lung | ||||
| Esophagus and spinal corda |
| Heart | ||||
| Duodenum | ||||
| Stomach | ||||
| Spleen | ||||
| Aorta, inferior vena cava, and veinsb |
| Liver | ||||
| Kidney | ||||
| Gallbladder | 0.01 | |||
| Pancreas | ||||
| Colon | ||||
| Difficult cases | ||||
| All |
Esophagus and spinal cord.
Aorta, inferior vena cava, and veins.
We examined the SBI results and noticed that the performance was consistently unsatisfactory when the shapes of the contoured organs were irregular, especially in the superior and inferior parts. Based on this observation, we selected 286 difficult cases from the relatively larger organs, including lung, liver, kidney, pancreas, stomach, gallbladder, and duodenum. The results for the difficult cases are listed in a separate row. The t-test for every group confirmed that the difference between the two result groups was statistically significant, with all p-values being . This dice score comparison suggests that our deep interpolator was more accurate and robust. A few examples of such difficult cases, shown in Fig. 6, suggest that our deep interpolator performed visually better than the basic SBI method, as its results are visually closer to the ground truth manual contours.
Fig. 6.
Contour interpolation examples of a few difficult cases: (a) lung, (b) stomach, (c) kidney, and (d) gallbladder. The ground truth manual contours are in red. The SBI interpolation results are in blue. The deep interpolator results are in green.
We computed both 2D DTA (on the axial slice) and 3D DTA values. The 3D DTA values were computed across the three middle slices, and the 2D DTA values were computed on each individual slice. Both are clinically relevant evaluation metrics. The DTA results are shown in Tables 3 and 4, where smaller DTA values indicate better results. We computed the differences between the DTA values of the deep interpolator and the SBI method in each group. Our deep interpolator outperformed the SBI method, as all of the difference values are negative.
Table 3.
Mean DTA values and the differences (mm).
| Name | DIa (3D) | DI (2D) | SBIb (3D) | SBI (2D) | Diffc (3D) | Diff (2D) |
|---|---|---|---|---|---|---|
| Lung | ||||||
| Esophagus and spinal cord |
| Heart | ||||||
| Duodenum | ||||||
| Stomach | ||||||
| Spleen | ||||||
| Aorta, inferior vena cava, and veins |
| Liver | ||||||
| Kidney | ||||||
| Gallbladder | ||||||
| Pancreas | ||||||
| Colon | — | — | — | |||
| Difficult cases | ||||||
| All |
Deep interpolator.
SBI.
Differences.
Table 4.
The 95-percentile DTA values and the differences (mm).
| Name | DI (3D) | DI (2D) | SBI (3D) | SBI (2D) | Diff (3D) | Diff (2D) |
|---|---|---|---|---|---|---|
| Lung | ||||||
| Esophagus and spinal cord |
| Heart | ||||||
| Duodenum | ||||||
| Stomach | ||||||
| Spleen | ||||||
| Aorta, inferior vena cava, and veins |
| Liver | ||||||
| Kidney | ||||||
| Gallbladder | ||||||
| Pancreas | ||||||
| Colon | — | — | — | |||
| Difficult cases | ||||||
| All |
The t-test results, as shown in Table 5, indicate that our deep interpolator performed significantly better overall, for most DTA measurements and for the difficult cases, with p-values . However, the performance differences were not statistically significant for a few easy organs, e.g., heart and liver, with p-values .
Table 5.
The results of the t-test on the DTA values.
| Name | Max (3D) | Mean (3D) | p95% (3D) | Max (2D) | Mean (2D) | p95% (2D) |
|---|---|---|---|---|---|---|
| Lung | ||||||
| Esophagus and spinal cord | 0.004 | 0.06 |
| Heart | 0.75 | 0.34 | 0.69 | 0.07 | 0.09 | |
| Duodenum | 0.08 | 0.01 | 0.01 | 0.04 | 0.05 | 0.02 |
| Stomach | 0.16 | 0.003 | 0.06 | 0.19 | 0.02 | 0.08 |
| Spleen | 0.004 | |||||
| Aorta, inferior vena cava, and veins | 0.08 | 0.01 | 0.07 | 0.58 | 0.03 | 0.51 |
| Liver | 0.09 | 0.01 | 0.43 | 0.01 | 0.08 | |
| Kidney | 0.006 | 0.001 | 0.003 | 0.007 | ||
| Gallbladder | 0.09 | 0.01 | 0.07 | 0.12 | 0.07 | 0.14 |
| Pancreas | 0.001 | 0.02 | 0.01 | 0.002 | 0.03 | 0.02 |
| Colon | — | — | — | 0.08 | 0.03 | 0.07 |
| Difficult cases | ||||||
| All |
The 3D views of the lung and stomach are shown in Fig. 7; they demonstrate that our deep interpolator performed well in the regions that are usually difficult for the basic SBI method, namely, the inferior portions of the organs in these two cases, as indicated by arrows. The deep contour interpolator uses both image intensity and organ contour information. We also compared the contour results with the existing shape-based and hybrid methods, including SBI, MBI, and MBILII. An example from a lung case is shown in Fig. 8. The dice values and 3D DTA values on the difficult cases are listed in Table 6. MBI and MBILII performed well, but their interpolated contours were not as close to the ground truth as the results of our deep contour interpolator. MBI is a shape-based method, and MBILII is MBI combined with image intensity information; the results for MBILII are slightly better than those for MBI.
Fig. 7.
(a)–(d) Comparison of the interpolation results by the deep interpolator and by SBI. The red surfaces are the interpolation results. The black meshes are the ground truth.
Fig. 8.
Comparison of the interpolation results on a lung case by (a) MBI, (b) MBILII, (c) SBI, and (d) the deep interpolator. The ground truth manual contours are in red. The MBI results are in white. The MBILII results are in yellow. The SBI results are in blue. The deep interpolator results are in green.
Table 6.
The dice scores and DTA values (mm) of MBI, MBILII, SBI, and the deep interpolator.
| Method | Dice scores | Mean (DTA) | p95% (DTA) |
|---|---|---|---|
| MBI | 0.83 | ||
| MBILII | 0.83 | ||
| SBI | 0.86 | ||
| Deep interpolator | 0.91 |
3.1. Ablation Studies
We adopted a lightweight network design for the deep interpolator. Because there are no additional blocks or layers, simple ablation studies were conducted to investigate the impact of the most important design parameters. We compared the dice scores and mean DTA values on the difficult cases for different network depths, training epochs, and numbers of output channels in the first encoder stage. Table 7 summarizes the results. For the network depth settings, we tested one- and two-layer designs against the default three layers; the comparison shows that the model prediction accuracy was not sensitive to the network depth. We confirmed that more training epochs contributed to better results. We also tested a simpler 4-channel output for the first encoder stage and found that more output channels in the first encoder stage contributed to better results. Due to the limitations of our GPU hardware, we did not try deeper or wider networks.
Table 7.
Ablation studies on the deep model of varying depth, number of output channels for the first encoder stage, and training epochs.
| Changes | Dice (difficult) | Mean (DTA) |
|---|---|---|
| Depth = 1 | ||
| Depth = 2 | ||
| Epoch = 10 | ||
| NumFirstEncoder = 4 | ||
| Default |
3.2. Generality and Slice Thickness Limitation Tests
To demonstrate the generality of the trained deep interpolator model, we tested it on three new organs (brainstem, parotid gland, and submandibular gland) from PDDCA23 and HNSCC-3DCT-RT.24 These three organs are in the head-neck region, and the deep interpolator did not see any CT images or organs from this region during model training.
The computed dice scores are listed in Table 8. They show that the results from our deep interpolator are as good as or better than those of SBI. Figure 9 shows a comparison between the results of SBI and our deep interpolator. The smaller distance between the deep interpolator results and the ground truth indicates better performance.
Table 8.
Comparison of dice scores on new organs.
| Organs | Deep interpolator | SBI | Differences | t-test -value |
|---|---|---|---|---|
| Brainstem | 0.09 | |||
| Parotid gland | ||||
| Submandibular |
Fig. 9.
Comparison of performance on newly added organs between different methods (submandibular). The ground truth manual contours are in red. The SBI interpolation results are in blue. The deep interpolator results are in green.
We also tested the performance of our deep interpolator on 7-slice patches (i.e., with a larger distance between the two manually contoured slices) for different organs. The two examples in Fig. 10 compare 5-slice and 7-slice cases. The first row compares results on a relatively regular shape from a lung case; the performances were at the same level. The second row shows some suboptimal results on an irregular shape from a stomach case; the distance between the prediction and the ground truth is smaller in the 5-slice case than in the 7-slice case in the area indicated by the blue circle. The performance can thus remain robust when intermediate slices are fed into the model at an increased slice spacing. However, when the differences between the reference slices are dramatic, an increasing distance between them can lead to suboptimal interpolation results. Model robustness over larger slice distances may warrant further investigation.
Fig. 10.
(a)–(d) Comparison of results for different slice separations. The ground truth contours are in red. The deep interpolator results are in green.
4. Discussion
The results confirmed that the proposed deep interpolator was significantly more accurate and robust than the conventional shape-based interpolation method, especially for the difficult cases in which the shape changes between adjacent slices were more dramatic. On the difficult transverse colon cases, our deep interpolator, trained with only a few training cases, performed much better than the conventional SBI method. The deep contour interpolator also outperformed the MBI and MBILII methods in both dice score and DTA measurements. We decided not to compare our results with the CSGI method because that algorithm requires too many user-configurable parameters per case to work properly.
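The conventional SBI baseline discussed above is commonly implemented by interpolating signed distance maps of the two reference contours. The sketch below illustrates this general idea for one intermediate slice; it follows the standard distance-transform formulation from the literature, not necessarily the exact baseline implementation used in this study.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask: np.ndarray) -> np.ndarray:
    """Signed distance map: negative inside the contour, positive outside."""
    mask = mask.astype(bool)
    return distance_transform_edt(~mask) - distance_transform_edt(mask)

def sbi_slice(top: np.ndarray, bottom: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Shape-based interpolation of a mask at fractional position t
    between the top (t=0) and bottom (t=1) reference slices."""
    d = (1.0 - t) * signed_distance(top) + t * signed_distance(bottom)
    return d < 0  # pixels whose blended distance is on the "inside"

# A large square shrinking to a small one: the midpoint is intermediate.
top = np.zeros((9, 9), bool); top[1:8, 1:8] = True
bottom = np.zeros((9, 9), bool); bottom[3:6, 3:6] = True
mid = sbi_slice(top, bottom)
```

Because the interpolation is purely geometric, SBI degrades exactly where this study observed it to: when adjacent contours differ dramatically in shape or topology, as near organ ends and in GI structures, the blended distance map has no anatomically meaningful zero level set.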
There are many directions in which our deep interpolator model could be further improved. More training data would be very useful, especially for the difficult cases. However, datasets for the GI organs, e.g., colon and small intestine, are very difficult to obtain because these organs are not commonly contoured in the clinic and the contour datasets are rarely shared publicly due to low contour quality. Data augmentation methods, for example, rotation and noise addition, could be applied to further improve model robustness. A shallow and narrow 3D U-Net architecture was used in the current study to ensure computational performance and general robustness. More advanced network architectures, e.g., the generative adversarial network25 and the attention network,26 could be explored in future work. It is also possible to use the distance map as a second input to our deep interpolator to provide richer information than the current structure mask. The network could also be trained to output a distance map instead of the organ mask to allow for more postprocessing options. The prediction of our current small U-Net model is very fast. The mean processing time for each patch is on a GTX1650 GPU, compared with SBI (), MBI (0.2), and MBILII (0.6).
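The augmentations mentioned above (rotation and noise addition) can be sketched as follows. The key detail is applying the same geometric transform to the image patch and its mask, with nearest-neighbor interpolation for the mask so it stays binary; the angle, noise level, and patch size here are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(image: np.ndarray, mask: np.ndarray, angle: float,
            noise_sigma: float, rng: np.random.Generator):
    """Rotate image and mask together, then add Gaussian noise to the image only."""
    # Linear interpolation (order=1) for the image; nearest-neighbor
    # (order=0) for the mask; reshape=False keeps the patch size fixed.
    img = rotate(image, angle, reshape=False, order=1, mode="nearest")
    msk = rotate(mask.astype(np.uint8), angle, reshape=False, order=0) > 0
    img = img + rng.normal(0.0, noise_sigma, size=img.shape)
    return img, msk

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
mask = np.zeros((64, 64), bool); mask[20:40, 20:40] = True
aug_img, aug_msk = augment(image, mask, angle=10.0, noise_sigma=0.05, rng=rng)
```

For 3D patches, the same calls apply with a chosen rotation plane (the `axes` argument of `rotate`), typically restricted to the transverse plane so slice spacing is unaffected.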
This deep interpolator model could be further extended in future work to include more image modalities and more organs. Only CT images and 15 organs have been covered so far. Other organs can be supported once their contour datasets become available. In the authors’ opinion, MRI data could also be added and supported by a single universal model, although the image intensity normalization step might require additional adjustments.
5. Conclusion
A deep learning model was developed in this study to perform 2D contour interpolation on CT images. The deep model was trained and evaluated on 15 organs and significantly outperformed the basic shape-based contour interpolation method. It could be useful for expediting the manual segmentation of organs and structures in medical images.
Acknowledgments
This study was supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) (Grant No. R03-EB028427).
Biographies
Chenxi Zhao received his BS degree in computer science and technology from Jilin University in 2018. Currently, he is pursuing a PhD in computer science at the EECS Department of the University of Missouri under Dr. Ye Duan’s supervision. His research interests include medical imaging, computer vision, deep learning, and computer graphics.
Ye Duan received his BA degree in mathematics from Peking University in 1991, his MS degree in mathematics from Utah State University in 1996, and his MS degree and PhD in computer science from the State University of New York at Stony Brook, in 1998 and 2003, respectively. From September 2003 to August 2009, he was an assistant professor of computer science at the University of Missouri, Columbia, Missouri, United States, where he is currently an associate professor of computer science. His research interests include computer graphics, computer vision, machine learning, and biomedical imaging.
Deshan Yang received his PhD in biomedical engineering from the University of Wisconsin–Madison. Currently, he is a professor at the Department of Radiation Oncology, Duke University. His research interests include medical imaging (registration, segmentation, reconstruction, motion management, and image guidance) for radiation oncology applications, machine learning and deep learning, adaptive radiotherapy, image guidance, treatment planning automation, quality assurance, cardiac radiosurgery, health information technologies, and clinical applications for radiation oncology and medical physics.
Disclosures
No conflicts of interest.
Contributor Information
Chenxi Zhao, Email: cz3d6@mail.missouri.edu.
Ye Duan, Email: yeduan@missouri.edu.
Deshan Yang, Email: deshan.yang@duke.edu.
References
- 1. Lamb J., et al., “Online adaptive radiation therapy: implementation of a new process of care,” Cureus 9(8), e1618 (2017). 10.7759/cureus.1618
- 2. Hesamian M. H., et al., “Deep learning techniques for medical image segmentation: achievements and challenges,” J. Digital Imaging 32(4), 582–596 (2019). 10.1007/s10278-019-00227-x
- 3. Lei Y., et al., “Deep learning in multi-organ segmentation,” arXiv:2001.10619 (2020).
- 4. Vaassen F., et al., “Evaluation of measures for assessing time-saving of automatic organ-at-risk segmentation in radiotherapy,” Phys. Imaging Radiat. Oncol. 13, 1–6 (2020). 10.1016/j.phro.2019.12.001
- 5. Liao X., Reutens D., Yang Z., “Morphology-based interslice interpolation using local intensity information for segmentation,” in 4th Int. Conf. Biomed. Eng. and Inf. (BMEI), IEEE (2011). 10.1109/BMEI.2011.6098315
- 6. Jha K., “Construction of branching surface from 2-D contours,” Int. J. CAD/CAM 8(1), 21–28 (2009).
- 7. Baghaie A., Yu Z., “An optimization method for slice interpolation of medical images,” arXiv:1402.0936 (2014).
- 8. Grevera G. J., Udupa J. K., “Shape-based interpolation of multidimensional grey-level images,” IEEE Trans. Med. Imaging 15(6), 881–892 (1996). 10.1109/42.544506
- 9. Lotufo R. A., Herman G. T., Udupa J. K., “Combining shape-based and gray-level interpolations,” Proc. SPIE 1808, 1–10 (1992). 10.1117/12.131085
- 10. Albu A. B., Beugeling T., Laurendeau D., “A morphology-based approach for interslice interpolation of anatomical slices from volumetric images,” IEEE Trans. Biomed. Eng. 55(8), 2022–2038 (2008). 10.1109/TBME.2008.921158
- 11. Zukic D., et al., “ND morphological contour interpolation,” Insight J., 1–8 (2016). 10.54294/achtrg
- 12. Herman G. T., Bucholtz C. A., Zheng J., “Shape-based interpolation using modified cubic splines,” in Proc. Annu. Int. Conf. IEEE Eng. in Med. and Biol. Soc., Vol. 13, IEEE (1991). 10.1109/IEMBS.1991.683941
- 13. Ravikumar S., et al., “Facilitating manual segmentation of 3D datasets using contour and intensity guided interpolation,” in IEEE 16th Int. Symp. Biomed. Imaging (ISBI 2019), IEEE (2019). 10.1109/ISBI.2019.8759500
- 14. Wildeboer R. R., et al., “Three-dimensional histopathological reconstruction as a reliable ground truth for prostate cancer studies,” Biomed. Phys. Eng. Express 3(3), 035014 (2017). 10.1088/2057-1976/aa7073
- 15. Wu Q., et al., “A multi-stage DCNN method for liver tumor segmentation,” in IEEE 3rd Int. Conf. Safe Prod. and Inf. (IICSPI), IEEE (2020). 10.1109/IICSPI51290.2020.9332353
- 16. Aerts H. J. W. L., “Data from NSCLC-radiomics,” Cancer Imaging Arch. (2019). 10.7937/K9/TCIA.2015.PF0M9REI
- 17. Landman B., “MICCAI multi-atlas labeling beyond the cranial vault – workshop and challenge” (2015).
- 18. Bilic P., et al., “The liver tumor segmentation benchmark (LiTS),” Med. Image Anal. 84, 102680 (2023). 10.1016/j.media.2022.102680
- 19. Roth H. R., et al., “Data from pancreas-CT,” Cancer Imaging Arch. (2016).
- 20. Simpson A. L., et al., “A large annotated medical image dataset for the development and evaluation of segmentation algorithms,” arXiv:1902.09063 (2019).
- 21. Ronneberger O., Fischer P., Brox T., “U-net: convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci. 9351, 234–241 (2015). 10.1007/978-3-319-24574-4_28
- 22. Thomson D., Boylan C., Liptrot T., “Evaluation of an automatic segmentation algorithm for definition of head and neck organs at risk,” Radiat. Oncol. 9, 173 (2014). 10.1186/1748-717X-9-173
- 23. Raudaschl P. F., et al., “Evaluation of segmentation methods on head and neck CT: auto-segmentation challenge 2015,” Med. Phys. 44(5), 2020–2036 (2017). 10.1002/mp.12197
- 24. Bejarano T., De Ornelas-Couto M., Mihaylov I. B., “Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment dataset,” Cancer Imaging Arch. (2018). 10.7937/K9/TCIA.2018.13upr2xf
- 25. Goodfellow I., et al., “Generative adversarial networks,” Commun. ACM 63(11), 139–144 (2020). 10.1145/3422622
- 26. Vaswani A., et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst. 30 (2017).