Author manuscript; available in PMC: 2021 Jul 6.
Published in final edited form as: Biomed Phys Eng Express. 2019 Mar 12;5(3):035005. doi: 10.1088/2057-1976/ab0734

Automatic diaphragm segmentation for real-time lung tumor tracking on cone-beam CT projections: a convolutional neural network approach

David Edmunds 1, Greg Sharp 1, Brian Winey 1
PMCID: PMC8260092  NIHMSID: NIHMS1686707  PMID: 34234960

Abstract

Purpose:

To automatically segment the diaphragm on individual lung cone-beam CT projection images, to enable real-time tracking of lung tumors using kilovoltage imaging.

Methods:

The deep neural network Mask R-CNN was trained on 3500 raw cone-beam CT projection images from 10 lung cancer patients, with the diaphragm manually segmented on each image used as a ground truth label. Ground-truth breathing traces were extracted from each patient for both diaphragm hemispheres, and apex positions were compared against the predicted output of the neural network. Ten-fold cross-validation was used to evaluate the segmentation accuracy.

Results:

The mean diaphragm apex prediction error was 4.4 mm. The mean percentage of projection images for which a successful prediction could be made was 87.3%. Prediction accuracy was worse at some lateral gantry angles, due to overlap between the diaphragm hemispheres and the increased amount of fatty tissue.

Conclusions:

The neural network was able to track the diaphragm apex position successfully. This allows accurate assessment of the breathing phase, which can be used to estimate the position of the lung tumor in real time.

Keywords: cone-beam CT, machine learning, diaphragm tracking, neural networks, image processing

1. Introduction

Lung cancer remains one of the most challenging cancers to treat with radiotherapy, due to the constant motion of the tumor caused by respiratory movement. Historically, this has been dealt with by expanding treatment margins to encompass the entire range of movement of the tumor, but this comes at the expense of unnecessary dose to surrounding healthy tissue [1].

Recently, advances in motion tracking have offered the possibility of adapting radiation delivery to follow the tumor motion, making it possible to reduce the size of treatment margins. Implanted fiducial markers inside or near the target, which are easily visualized using cone-beam CT (CBCT) or fluoroscopic imaging, are a commonly employed method of tumor tracking [2]. However, concerns over marker migration [3] and the possibility of accidental seeding of the cancer tissue to other organs have limited the clinical adoption of such markers.

Another common technique is to use surface imaging as an external surrogate for internal tumor motion [4–6]. However, it is known that the correlation between external chest wall movement and the motion of the tumor is unreliable, and is subject to both inter- and intra-fractional changes [7].

Recently, developments in MRI-guided radiotherapy offer the exciting possibility of real-time lung tumor tracking using an MRI scanner during treatment [8–10]. Unfortunately, this technology is not yet widely available and comes at a significant cost premium over traditional linear accelerators (linacs).

Conversely, on-board kilovoltage imaging is widely available, and is commonly installed on linacs in radiotherapy departments around the world. It would be cost-effective to make use of this existing equipment, which is already familiar to medical practitioners, for the purposes of tumor tracking. However, the quality of individual CBCT projection images is often poor, which makes it difficult or even impossible to visualize the lung tumor itself. For this reason, it is more common to track the diaphragm, which is easily seen on individual projections because of the relatively high density of muscle tissue compared with the surrounding air, and to use the diaphragm motion as an internal surrogate for tumor motion. Previous work [11] has shown that the diaphragm is a good surrogate for the large majority of patients and tumor locations.

Several methods to track diaphragm motion using CBCT projection images have been proposed. One of the most popular is the so-called ‘Amsterdam shroud’ technique [12, 13], which involves projecting each x-ray image onto the cranio-caudal axis and ‘stitching’ the resulting 1D columns together, then calculating the number of pixels each column must be shifted to minimize the RMS difference between columns. These pixel shifts form the basis of a respiratory motion signal. Another method [14] makes use of differences in pixel intensities between subsequent projection images, combined with post-processing to extract a breathing signal, although this method does not track the diaphragm apex position directly. In [15, 16] the RANSAC algorithm was used to locate the diaphragm contour, but this approach requires the user to manually select the position of the diaphragm apex on the first projection, which makes it impractical for real-time tracking.

In this work, we present an alternative method for automatic diaphragm segmentation using a state-of-the-art convolutional neural network. The network is called Mask R-CNN (Region Convolutional Neural Network) [17], and was recently developed by Facebook AI Research for general-purpose instance segmentation of objects in natural images. Our method requires little pre- or post-processing of projection images, requires no manual intervention, and is capable of tracking diaphragm motion in near real-time. We plan to integrate this network with our Kilovoltage Projection Streaming Application (KiPSTA) [18] for real-time tumor tracking.

2. Methods

2.1. Mask R-CNN

Briefly, Mask R-CNN operates by first applying a region proposal network (RPN) to an image, which proposes a large set of candidate rectangles in the image which may contain an object [19]. A second network [20] then performs object classification on each rectangle to find if the region contains an object of interest or not. Finally, a fully convolutional network (FCN) [21] is used to perform pixel-level segmentation on each object of interest.
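Purely as a conceptual illustration of how these three stages compose (this is not the actual Mask R-CNN code; the stage callables and the confidence threshold are placeholders):

# Conceptual sketch only: the real Mask R-CNN implements these stages within a
# single network; here they are shown as separate callables for clarity.
BACKGROUND = 0         # class id reserved for "no object"
SCORE_THRESHOLD = 0.7  # illustrative confidence cut-off

def instance_segmentation(image, rpn, classifier, mask_head):
    """Compose the three stages described above: region proposal,
    per-region classification, and pixel-level mask prediction."""
    detections = []
    for box in rpn(image):                        # candidate rectangles
        class_id, score, refined_box = classifier(image, box)
        if class_id != BACKGROUND and score >= SCORE_THRESHOLD:
            mask = mask_head(image, refined_box)  # FCN segmentation of the region
            detections.append((class_id, score, refined_box, mask))
    return detections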

2.2. Training data

We collected one pre-treatment CBCT scan for each patient in a ten-person cohort of lung cancer patients. There was an average of 350 projection images per scan, and a total of 3499 projection images. Projection images were obtained from the Elekta XVI CBCT imaging system on an Elekta Synergy linac, which produces 1024 × 1024-pixel grayscale images with intensity values in the range [0, 65535] in the HIS file format. We manually labelled points on both diaphragm hemispheres in every projection image using the Fiji image processing software [22], and then fitted a parabola to each set of points (see figure 1). The hemisphere apex positions were taken to be the most superior pixel in each fitted parabola.

Figure 1.

Projection image 100 from patient 2. Manually labelled diaphragm hemispheres are shown. Markers indicate manually selected points on the diaphragm apex. A parabola is fitted to the selected points, and a binary mask is created by setting all points above the parabola to a value of 0 (‘above diaphragm’) and all points below the parabola to a value of 1 (‘below diaphragm’). Any mask overlap between hemispheres is removed during a postprocessing step by a Python script.
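As an illustration of the labelling step described above and shown in figure 1, the following sketch fits a parabola to a set of manually selected points and builds the binary hemisphere mask; the helper name, image size and coordinate conventions are our own assumptions, not code from the study.

import numpy as np

def hemisphere_mask(points_xy, image_shape=(1024, 1024)):
    """Fit a parabola to manually selected diaphragm points and build a binary
    'below diaphragm' mask. `points_xy` is an (N, 2) array of manually clicked
    (column, row) pixel positions. Returns (mask, apex), where apex is the
    most superior pixel of the fitted parabola."""
    x = points_xy[:, 0].astype(float)
    y = points_xy[:, 1].astype(float)
    a, b, c = np.polyfit(x, y, 2)              # parabola: row = a*col^2 + b*col + c

    cols = np.arange(image_shape[1])
    contour_rows = a * cols ** 2 + b * cols + c  # diaphragm contour, one row per column

    # Row index increases in the inferior direction, so pixels below the
    # contour (larger row index) are labelled 1 and pixels above it 0.
    row_grid = np.arange(image_shape[0])[:, None]
    mask = (row_grid > contour_rows[None, :]).astype(np.uint8)

    apex_col = int(cols[np.argmin(contour_rows)])  # most superior point of the fit
    apex_row = int(round(contour_rows.min()))
    return mask, (apex_col, apex_row)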

We used the open-source implementation of Mask R-CNN available at https://github.com/matterport/Mask_RCNN for training. A single NVIDIA Quadro K2200 workstation graphics processing unit (GPU) was used for training. Training was performed with 1000 steps per epoch, for a total of 200 epochs, with a learning rate of 0.001. All projection images were resized to 256 × 256 pixels prior to training, due to GPU memory constraints. A 101-layer ResNet [23] architecture was used as the network backbone.
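For reference, a configuration along the following lines reproduces the reported settings in the matterport implementation; this is a sketch rather than the exact training script used in this work, and the dataset objects passed to the training function are assumed to be prepared elsewhere.

from mrcnn.config import Config
from mrcnn import model as modellib

class DiaphragmConfig(Config):
    # Settings mirroring those reported above; other fields keep library defaults.
    NAME = "diaphragm"
    NUM_CLASSES = 1 + 1          # background + diaphragm hemisphere
    BACKBONE = "resnet101"       # 101-layer ResNet backbone
    IMAGE_MIN_DIM = 256          # projections resized to 256 x 256 pixels
    IMAGE_MAX_DIM = 256
    IMAGES_PER_GPU = 1
    STEPS_PER_EPOCH = 1000
    LEARNING_RATE = 0.001

def train_diaphragm_model(dataset_train, dataset_val, log_dir="./logs"):
    """Train Mask R-CNN with the settings reported above. The two datasets are
    mrcnn.utils.Dataset subclasses holding projection images and hemisphere
    masks (their construction is not shown here)."""
    config = DiaphragmConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=log_dir)
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=200,
                layers="all")
    return model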

2.3. Validation

For cross-validation purposes, training was performed on projection images from 9 out of the 10 patients, with one patient’s images held out for validation. This was repeated for all 10 patients.

To validate the performance of the neural network, ground-truth breathing traces for each patient had to be acquired. For a given patient CBCT scan, let the $N$ sequential projection images which comprise the scan be labelled $I_1, \ldots, I_N$. Denote the set of two diaphragm apex pixel positions for projection $i$ as $P_i = \{P_i^a, P_i^b\}$; note that one or both positions might be undefined if the corresponding diaphragm hemisphere was not visible or was not labelled in the projection image. We can extract two breathing traces from this scan, one for each hemisphere, denoted $T^1$ and $T^2$ respectively. We take $T_1^1 = P_1^a$ and $T_1^2 = P_1^b$ to be the first point of each breathing trace. We then find the next point in $T^1$ by taking the apex pixel in the next projection image which has the smallest Euclidean distance from the current point $T_i^1$,

$$T_{i+1}^1 = \operatorname*{arg\,min}_{P \in P_{i+1}} \left| T_i^1 - P \right| \qquad \text{for } i = 1, \ldots, N-1,$$

and we take $T_{i+1}^2$ to be the other apex pixel.

Next, we must match the apex positions predicted by the neural network to the ground-truth apex positions. Let $A_i = \{A_i^a, A_i^b\}$ be the inferred apex positions for projection $i$, and let the predicted breathing traces be $B^1$ and $B^2$ respectively. Then

$$B_i^1 = \begin{cases} A_i^a & \text{if } \left| A_i^a - T_i^1 \right| < \left| A_i^b - T_i^1 \right|, \\ A_i^b & \text{otherwise,} \end{cases}$$

and

$$B_i^2 = A \in A_i \text{ such that } A \neq B_i^1.$$

We define the errors for projection image i as

$$e_i^1 = \left| T_i^1 - B_i^1 \right| \quad \text{and} \quad e_i^2 = \left| T_i^2 - B_i^2 \right|.$$

If there is no predicted apex position when there should be, we exclude the projection image from the error calculation, and instead report it as a percentage of failed predictions. We define the validation error over a patient’s set of N labelled projection images as

$$e^1 = \frac{1}{N} \sum_{i=1}^{N} e_i^1,$$

and similarly for $e^2$.
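For concreteness, the following is a minimal Python sketch of the trace-linking and error calculation defined above, assuming apex positions are available as NumPy arrays; the function and variable names are our own illustrative choices, and missing detections (which we exclude from the error calculation) are not handled here.

import numpy as np

def link_ground_truth_traces(apex_pairs):
    """Build the ground-truth traces T^1 and T^2 by nearest-neighbour linking.
    `apex_pairs` is a list of length N; each entry is a (2, 2) array holding
    the two labelled apex positions for that projection."""
    t1 = [apex_pairs[0][0]]
    t2 = [apex_pairs[0][1]]
    for pair in apex_pairs[1:]:
        distances = np.linalg.norm(pair - t1[-1], axis=1)
        nearest = int(np.argmin(distances))
        t1.append(pair[nearest])        # the closest apex continues trace T^1
        t2.append(pair[1 - nearest])    # the other apex continues trace T^2
    return np.array(t1), np.array(t2)

def apex_prediction_errors(t1, t2, predicted_pairs):
    """Match each predicted apex pair to the ground-truth traces and return the
    per-projection errors e_i^1 and e_i^2 (in pixels). The validation errors
    e^1 and e^2 are the means of the returned arrays."""
    e1, e2 = [], []
    for i, pair in enumerate(predicted_pairs):   # pair has shape (2, 2)
        to_t1 = np.linalg.norm(pair - t1[i], axis=1)
        b1_index = int(np.argmin(to_t1))
        e1.append(np.linalg.norm(t1[i] - pair[b1_index]))
        e2.append(np.linalg.norm(t2[i] - pair[1 - b1_index]))
    return np.array(e1), np.array(e2)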

3. Results

3.1. Training

Training on the entire dataset of 9 patients, with 1 left out for validation, took about 72 h on a single GPU. We repeated training 10 times, with a different patient selected for leave-one-out validation each time. Figure 2 shows an example training loss curve with patient 5 excluded for validation. The training loss decreased to an approximately constant value, indicating that the neural network was learning successfully. The loss on the validation patient also continued to decrease as a function of training time and did not begin to increase again, so there is no evidence of overfitting.

Figure 2.

Left: the loss function of the neural network as a function of number of epochs, as trained with patient 5 excluded for validation. Right: the loss on the validation set.

3.2. Validation

Figure 3 shows an example ground truth breathing trace from patient 2, along with the predicted diaphragm apex positions from the neural network.

Figure 3.

Extracted breathing traces for each diaphragm hemisphere from patient 2. Ground truth breathing traces are shown as solid lines, with corresponding predicted breathing traces from the neural network shown as dashed lines.

For patient 2, it can be seen from figure 3 that there is a large jump in the predicted apex position at projection number 274. This is caused by the neural network mistakenly classifying heart tissue as diaphragm, as shown in figure 4. There is also a range of projection angles, between approximately projection numbers 200 and 275, where one diaphragm hemisphere is not visible due to the limited field of view of the CBCT scan. Hence, there is no ground truth available for these projections (see figure 5). This is not the case for all patients.

Figure 4.

Example diaphragm segmentation for projection number 274 of patient 2. The network has misclassified the heart as a diaphragm hemisphere, due to the similar density of heart and diaphragm muscle tissue. This causes an erroneous abrupt change in predicted apex position.

Figure 5.

Prediction from projection 240 of patient 2. One hemisphere is visible and correctly segmented, but the other hemisphere is outside of the field of view. The heart is correctly ignored by the network in this projection.

Table 1 shows the cross-validation results in the form of diaphragm apex prediction errors for all 10 patients. The overall mean apex prediction error was 11.3 pixels, which corresponds to a position error of 4.4 mm.

Table 1.

Mean apex prediction error (mm), standard deviation (mm), and percentage of projection images where the neural network failed to make a detection, for both diaphragm hemispheres.

Patient number   Hemi 1 mean error (mm)   Hemi 1 stdev (mm)   Hemi 1 failed detections (%)   Hemi 2 mean error (mm)
1                4.3                      5.7                 9.4                            3.3
2                4.8                      5.0                 14.4                           5.9
3                5.0                      9.0                 2.4                            4.6
4                2.9                      8.4                 10.6                           4.1
5                3.0                      8.8                 7.7                            9.6
6                2.7                      1.7                 11.4                           5.7
7                4.5                      5.9                 9.5                            7.3
8                3.0                      3.2                 2.7                            5.6
9                3.1                      5.0                 1.7                            2.2
10               2.9                      1.6                 2.5                            3.4

4. Discussion

The goal of this work was to develop a neural network to perform automatic segmentation of the diaphragm on individual CBCT projection images. Since the lung tumor is often impossible to localize in individual projections, the diaphragm apex position could be used to provide an estimate of tumor position in real time. Our neural network approach is accurate and, unlike some existing methods, requires no manual intervention at any stage of the process.

As seen in figure 6, the neural network performs less well at lateral gantry angles. This is primarily due to the larger amounts of fatty tissue obstructing the view of the diaphragm at lateral angles. Manual annotation of diaphragm positions at these angles can also be quite challenging. Adjusting CBCT scanner protocols may be one way to mitigate such problems.

Figure 6.

Number of projection images, across the entire dataset, for which a ground-truth label existed but the neural network could not correctly identify the diaphragm, as a function of gantry angle. A significant number of missed predictions are clustered around 0° and 180°, corresponding to lateral views of the patient.

Figure 4 shows that it is possible for the network to misclassify heart tissue as diaphragm tissue. This is an unlikely occurrence, and could be mitigated by constraining the range of possible diaphragm positions with a predefined region of interest (ROI).

In this proof-of-concept work, we take no action when the neural network fails to detect the diaphragm in a given projection image. However, in a clinical environment it might be appropriate to pause treatment if no diaphragm is detected for a given number of consecutive frames, as this could indicate a tracking failure. Similarly, if the inferred diaphragm apex position changes by more than a pre-defined threshold amount between frames, this could indicate a gross motion error, providing an opportunity to pause treatment and reposition the patient. In future work, it would be possible to incorporate a regression or correlation-based motion prediction technique to ‘fill in’ predictions for missing projection angles. This would alleviate the need to pause treatment during tracking if the neural network cannot make a prediction. Several groups have already proposed accurate methods to perform this motion prediction [2426].
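As an illustration of the kind of gating logic described above, the following sketch pauses treatment after several consecutive missed detections or a gross inter-frame jump; all names and threshold values are hypothetical and not clinically validated.

def should_pause_treatment(apex_history, max_missed_frames=5,
                           jump_threshold_mm=10.0, pixel_size_mm=0.39):
    """Illustrative gating check: returns True if the diaphragm has not been
    detected for several consecutive frames, or if the apex position jumped by
    more than a threshold between the last two valid detections.
    `apex_history` holds (x, y) pixel tuples, or None for missed detections,
    with the most recent frame last. All thresholds are illustrative only."""
    # Rule 1: too many consecutive missed detections.
    missed = 0
    for apex in reversed(apex_history):
        if apex is not None:
            break
        missed += 1
    if missed >= max_missed_frames:
        return True

    # Rule 2: gross inter-frame jump between the last two valid detections.
    valid = [a for a in apex_history if a is not None]
    if len(valid) >= 2:
        dx = valid[-1][0] - valid[-2][0]
        dy = valid[-1][1] - valid[-2][1]
        jump_mm = (dx ** 2 + dy ** 2) ** 0.5 * pixel_size_mm
        if jump_mm > jump_threshold_mm:
            return True
    return False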

We found that the mean time taken for the network to perform inference was approximately 0.5 s per projection image. This is clearly too slow to perform real-time diaphragm tracking. However, the inference time could easily be improved by using a more modern GPU, or by further downsampling the resolution of the projection images during streaming. Another possibility is to only use the network to search for diaphragm tissue in a sub-region of the complete projection image, by manually defining a restricted search ROI prior to treatment. This would lead to improved performance times and could also improve accuracy.
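For example, restricting inference to a pre-treatment search region might look like the following sketch, assuming a matterport-style Mask R-CNN model in inference mode; the cropping and coordinate bookkeeping shown here are illustrative only.

import numpy as np

def detect_in_roi(model, projection, roi):
    """Run diaphragm detection only inside a pre-defined search region.
    `model` is a Mask R-CNN model in inference mode, and `roi` is
    (row_min, row_max, col_min, col_max) chosen before treatment. Detected
    boxes are shifted back to full-image coordinates; masks are left in
    cropped coordinates in this sketch."""
    r0, r1, c0, c1 = roi
    crop = projection[r0:r1, c0:c1]
    rgb = np.stack([crop, crop, crop], axis=-1)   # the library expects RGB input
    result = model.detect([rgb], verbose=0)[0]
    result["rois"] = result["rois"] + np.array([r0, c0, r0, c0])
    return result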

Limitations of this study include the small size of the training data (10 patients total, with 1 patient held out for 10-fold cross validation). The generalizability of the neural network could be improved by including projection images from more patients in the training data, as well as images that were acquired under a variety of different scanner protocols.

In future work, we plan to extend the neural network to segment other structures visible in the projection image data, such as cardiac tissue and bony anatomy. Knowledge of the location and degree of motion of these structures during treatment could be helpful.

5. Conclusion

We have developed a method to automatically track the diaphragm using a convolutional neural network. We used a state-of-the-art neural network designed for image segmentation in unrelated fields and retrained it on our own dataset of approximately 3500 cone-beam CT projection images from 10 lung cancer patients. The network was able to predict the diaphragm apex positions successfully, with a mean error of 4.4 mm. In the future, we plan to use these predicted apex positions as a surrogate for lung tumor motion, enabling real-time tracking of the lung tumor without the need for internal or external markers, using the in-room cone-beam CT system.

Acknowledgments

This work was supported by a motion management grant from Elekta. Research supported in part by the NCI Federal Share of program income earned by Massachusetts General Hospital on C06 CA059267, Proton Therapy Research and Treatment Center.

References

[1] Ekberg L, Holmberg O, Wittgren L, Bjelkengren G and Landberg T 1998 What margins should be added to the clinical target volume in radiotherapy treatment planning for lung cancer? Radiother. Oncol. 48 71–7
[2] Nuyttens J J et al 2006 Lung tumor tracking during stereotactic radiotherapy treatment with the CyberKnife: marker placement and early results Acta Oncol. 45 961–5
[3] Kitamura K et al 2002 Registration accuracy and possible migration of internal fiducial gold marker implanted in prostate and liver treated with real-time tumor-tracking radiation therapy (RTRT) Radiother. Oncol. 62 275–81
[4] Shah A P, Dvorak T, Curry M S, Buchholz D J and Meeks S L 2013 Clinical evaluation of interfractional variations for whole breast radiotherapy using 3-dimensional surface imaging Pract. Radiat. Oncol. 3 16–25
[5] Hayashi N, Obata Y, Uchiyama Y, Mori Y, Hashizume C and Kobayashi T 2009 Assessment of spatial uncertainties in the radiotherapy process with the Novalis system Int. J. Radiat. Oncol. Biol. Phys. 75 549–57
[6] Stieler F, Wenz F, Shi M and Lohr F 2013 A novel surface imaging system for patient positioning and surveillance during radiotherapy: a phantom study and clinical evaluation Strahlenther. Onkol. 189 938–44
[7] Hoisak J D P, Sixel K E, Tirona R, Cheung P C F and Pignol J-P 2004 Correlation of lung tumor motion with external surrogate indicators of respiration Int. J. Radiat. Oncol. Biol. Phys. 60 1298–306
[8] Lagendijk J J W, van Vulpen M and Raaymakers B W 2016 The development of the MRI linac system for online MRI-guided radiotherapy: a clinical update J. Intern. Med. 280 203–8
[9] Mutic S and Dempsey J F 2014 The ViewRay system: magnetic resonance-guided and controlled radiotherapy Semin. Radiat. Oncol. 24 196–9
[10] Cerviño L I, Du J and Jiang S B 2011 MRI-guided tumor tracking in lung cancer radiotherapy Phys. Med. Biol. 56 3773–85
[11] Cerviño L I, Chao A K Y, Sandhu A and Jiang S B 2009 The diaphragm as an anatomic surrogate for lung tumor motion Phys. Med. Biol. 54 3529–41
[12] Zijp L, Sonke J-J and van Herk M 2004 Extraction of the respiratory signal from sequential thorax cone-beam x-ray images Int. Conf. on the Use of Computers in Radiation Therapy pp 507–9
[13] Rit S, van Herk M, Zijp L and Sonke J-J 2012 Quantification of the variability of diaphragm motion and implications for treatment margin construction Int. J. Radiat. Oncol. Biol. Phys. 82 e399–407
[14] Kavanagh A, Evans P M, Hansen V N and Webb S 2009 Obtaining breathing patterns from any sequential thoracic x-ray image set Phys. Med. Biol. 54 4879–88
[15] Bögel M, Maier A, Hofmann H G, Hornegger J and Fahrig R 2012 Diaphragm Tracking in Cardiac C-Arm Projection Data (Berlin, Heidelberg: Springer) pp 33–8
[16] Bögel M, Hofmann H G, Hornegger J, Fahrig R, Britzen S and Maier A 2013 Respiratory motion compensation using diaphragm tracking for cone-beam C-arm CT: a simulation and a phantom study Int. J. Biomed. Imaging 2013 1–10
[17] He K, Gkioxari G, Dollár P and Girshick R 2017 Mask R-CNN 2017 IEEE Int. Conf. on Computer Vision (ICCV) pp 2980–8
[18] Kim J, Park Y K, Edmunds D M, Oh K, Sharp G and Winey B 2018 Kilo-voltage projection streaming-based tracking application (KiPSTA): first clinical implementation during spine stereotactic radiosurgery Adv. Radiat. Oncol. 3 682–92
[19] Ren S, He K, Girshick R and Sun J 2015 Faster R-CNN: towards real-time object detection with region proposal networks Advances in Neural Information Processing Systems 28
[20] Girshick R 2015 Fast R-CNN arXiv:1504.08083
[21] Shelhamer E, Long J and Darrell T 2015 Fully convolutional networks for semantic segmentation 2015 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (Boston, MA, 7–12 June 2015) (https://doi.org/10.1109/CVPR.2015.7298965)
[22] Schindelin J et al 2012 Fiji: an open-source platform for biological-image analysis Nat. Methods 9 676–82
[23] He K, Zhang X, Ren S and Sun J 2016 Deep residual learning for image recognition 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
[24] Sharp G C, Jiang S B, Shimizu S and Shirato H 2004 Prediction of respiratory tumour motion for real-time image-guided radiotherapy Phys. Med. Biol. 49 425–40
[25] Teo T P et al 2018 Feasibility of predicting tumor motion using online data acquired during treatment and a generalized neural network optimized with offline patient tumor trajectories Med. Phys. 45 830–45
[26] Rottmann J and Berbeco R 2014 Using an external surrogate for predictor model training in real-time motion management of lung tumors Med. Phys. 41 121706
