Abstract
Acute pancreatitis is an inflammatory disorder of the pancreas. Medical imaging, such as computed tomography (CT), has been widely used to detect volume changes in the pancreas for acute pancreatitis diagnosis. Many pancreas segmentation methods have been proposed, but none targets the pancreas of acute pancreatitis patients. Segmenting an inflamed pancreas is more challenging than segmenting a normal pancreas for two reasons. 1) The inflamed pancreas invades surrounding organs and causes blurry boundaries. 2) The inflamed pancreas has higher shape, size, and location variability than the normal pancreas. To overcome these challenges, we propose an automated CT pancreas segmentation approach for acute pancreatitis patients that combines a novel object detection approach with U-Net. Our approach includes a detector and a segmenter. Specifically, we develop an FCN-guided region proposal network (RPN) detector to localize the pancreatitis regions. The detector first uses a fully convolutional network (FCN) to reduce the background interference of medical images and generate a fixed feature map containing the acute pancreatitis regions. The RPN is then applied to the feature map to precisely localize the acute pancreatitis regions. After the pancreatitis region has been localized, the U-Net segmenter is applied to the image cropped according to the bounding box. The proposed approach is validated on a collected clinical dataset of 89 abdominal contrast-enhanced 3D CT scans from acute pancreatitis patients. Compared with other state-of-the-art approaches for normal pancreas segmentation, our method achieves better localization and segmentation performance in acute pancreatitis patients.
Keywords: Region proposal network, object detection, acute pancreatitis, CT segmentation
1. Introduction
As a crucial step in disease diagnosis, treatment, and follow-up, medical image segmentation has long been a research hotspot. Convolutional neural networks (CNNs) have achieved state-of-the-art performance in various visual recognition tasks in the past few years and are widely used owing to their powerful nonlinear feature extraction capabilities. CNN-based approaches also dominate the medical image processing literature and have achieved outstanding performance in a broad span of applications [1]. As a focus of attention in medical image analysis [2], pancreas segmentation in medical images (e.g., CT and MRI) is challenging for the following reasons. 1) The pancreas is very small, accounting for less than 0.5% of the entire CT volume. Such a small target can cause deep segmentation networks such as fully convolutional networks (FCNs) [3] and DeepLab [4] to confuse it with the background. 2) The pancreas has high variability in shape, size, and location, and its boundaries with other tissues are visually blurry, making it difficult to locate the pancreas directly using deep learning-based detectors such as the R-CNN series [5–7]. Such factors often degrade data-driven learning algorithms and produce heavily biased results (e.g., segmentations that are significantly eroded by non-pancreas regions or disrupted) [8]. Researchers therefore proposed a two-stage coarse-to-fine approach [9] for pancreas segmentation: the coarse stage provides a rough target localization, and the fine stage improves segmentation accuracy.
Acute pancreatitis, an inflammatory disorder of the pancreas, is the leading cause of hospital admission for gastrointestinal disorders in the United States and many other countries [10, 11], with an estimated hospitalization cost of 2.6 billion dollars per year [12]. The diagnosis and severity assessment of acute pancreatitis are mainly based on imaging examinations, such as CT and MRI, which are the primary evidence clinicians use to make clinical decisions [13]. Pancreas image segmentation for acute pancreatitis patients is more challenging than segmentation of the normal pancreas. First, the inflamed pancreas invades its surrounding organs and forms very blurry boundaries; second, the size of the inflamed pancreas region has higher variability than that of the normal pancreas. As shown in Fig. 1, the inflamed pancreas region occupies more than 9% of some slices, while the largest normal pancreas region is no more than 0.8% of the whole slice. To our knowledge, no approach has yet been proposed to segment the pancreas of patients with acute pancreatitis.
Fig. 1.
Illustration of the challenges of pancreas segmentation for acute pancreatitis patients. Panels (d) and (e) are normal pancreas images from the National Institutes of Health (NIH) dataset [14]. Panels (a), (b), and (c) are pancreatitis images from our dataset, which show the very blurry boundaries between the pancreatitis region and other organs and the variation in region size. Red outlines mark the boundaries of the pancreas or pancreatitis region.
The commonly used methods [9, 15–22] for normal pancreas segmentation are based on coarse-to-fine pipelines, which first apply a model to detect the region of interest (ROI) and then use a heuristic cropping strategy to crop the detected pancreas region. The cropped images are then fed into a segmentation model. The segmentation performance therefore relies heavily on the detection results of the coarse stage. Using current normal-pancreas segmentation approaches to localize the pancreatitis region raises two potential issues. Issue I: the localized area may be much larger than the actual pancreatitis boundary, introducing too much background. Issue II: the localized area may miss part of the actual pancreatitis region. Fig. 2 shows the coarse localization and segmentation of CT images from two patients, including the ground truth (a, g), the original CT image (b, h), the localization result obtained from the ground truth (c, i), the cropped ground truth (d, j), the localization result obtained by the common coarse segmentation approach (e, k), and the coarse segmentation result (f, l). As shown by the red boxes, panels c and e illustrate Issue I, where the localized area is much larger than the actual pancreatitis boundary. Panels k and i illustrate Issue II, where the localized area is smaller than the actual pancreatitis boundary.
Fig. 2.
Illustration of some unreliable coarse localization. (a)&(g) are ground truths, (b)&(h) are the original CT images, (c)&(i) are the localized areas obtained using a ground truth bounding box, (d)&(j) are the bounding boxes of ground truth. (e)&(k) are the localized areas obtained using the coarse segmentation method, and (f)&(l) are the bounding boxes of the coarse segmentation mask of the pancreatitis regions. The red boxes show relative localization.
In this paper, we propose an automated CT pancreas segmentation approach for acute pancreatitis patients that combines a novel object detection approach (FCN-guided RPN) with U-Net [23]. The FCN-guided RPN first detects the pancreatitis region. The detected region is then cropped and sent to a 2D U-Net for segmentation. Experimental results on clinical acute pancreatitis data demonstrate the effectiveness of the proposed approach. The main contributions of our work are twofold. First, we developed a novel FCN-guided RPN approach for medical object detection and a new two-stage framework for medical image segmentation; our object detection and image segmentation results outperform those of other approaches. Second, to the best of our knowledge, we are the first to apply a fully automated approach to CT segmentation of the pancreas in acute pancreatitis patients.
2. Related work
Normal pancreas segmentation:
Deep learning has dominated many research fields [24–28], and many optimization approaches have also been proposed [29–33]. Several deep learning-based methods have been applied to abdominal CT segmentation [9, 15–22]. Zhou et al. [9] used an FCN to locate the pancreas region and performed the segmentation based on that localization. Similarly, Yu et al. [18] proposed a multi-stage saliency segmentation approach that recurrently uses the segmentation mask to enhance the ROI input of the subsequent stage. Unlike the 2D methods mentioned above, several studies proposed 3D networks for pancreas segmentation [20, 34], which can exploit more inter-slice spatial context than convolution on the transverse plane alone. Qiu et al. [35] designed a residual transformer block to better extract multi-scale features. Dogan et al. [36] also proposed a two-stage approach for pancreas segmentation, which adopted Mask R-CNN for coarse localization and 3D U-Net for fine segmentation. Chen et al. [37] combined a vision transformer and a 3D U-Net to achieve high-precision automatic pancreas segmentation.
Although current approaches achieve promising results, they are all evaluated on the NIH dataset [14]. The NIH dataset contains 82 normal contrast-enhanced abdominal pancreas CT scans, and the resolution of each scan is 512×512×L, where L ∈ [181, 466] is the number of slices along the long axis of the body (see Fig. 3(b)). The distance between neighboring voxels ranges from 0.5 mm to 1.0 mm. Our clinical acute pancreatitis dataset, however, differs from the NIH dataset in two respects. First, the resolution of each scan in our data is 512×512×L1, where L1 ∈ [29, 204] (for most cases, L1 < 64). The coronal and sagittal slices are too small along this axis (see Fig. 3(a)) to be used as the input of networks containing several pooling layers. For this reason, we cannot apply the common three-view (axial, coronal, and sagittal) approaches proposed in [8, 9, 18] to our dataset. Second, the slice gap in our data is larger (5.0–7.0 mm), and only a limited number of slices are available from each patient (14 slices per patient on average). Hence, the existing 3D approaches proposed in [16, 19, 20] cannot be applied to our acute pancreatitis dataset because of the small amount of data.
Fig. 3.
Domain gap between our clinical acute pancreatitis dataset and the public NIH dataset [14]. Representative images from the acute pancreatitis data and the NIH normal pancreas data are shown in (a) and (b), respectively. The acute pancreatitis dataset has fewer slices on the x and y-axis than the normal pancreas dataset. The pancreas and pancreatitis regions are highlighted in red from different viewpoints.
Deep learning-based object detection:
Current deep learning-based object detectors can be divided into two types: anchor-based and anchor-free. Anchor-based detectors include one-stage and two-stage approaches. In the one-stage approach [38–41], the candidate regions are anchors generated by sliding windows. In the two-stage approach [5–7], the candidate regions are the proposals generated by RPNs. Anchor-free detectors [42–44] solve the detection problem differently: they search for the targets directly, without preset anchors. They are mainly divided into key point-based and center-based detection algorithms. The former first detects the target’s upper-left and lower-right corners and then forms a detection box by combining the corner points. The latter directly detects the object’s central region and boundary information, and decouples classification and regression into two sub-grids. Because they have no anchor hyperparameters, anchor-free methods potentially generalize better than anchor-based approaches.
3. Material and methods
Currently, there are two main approaches to image semantic segmentation. One feeds the entire image into a segmentation network, and the other combines a detector with a segmentation network. For the former, the FCN has been used as the backbone architecture in various image segmentation tasks [45–47]. However, medical images contain a great deal of interfering information, so it is difficult to obtain good performance by directly using one-stage segmentation networks. For the latter, a detection network is used to localize the ROI, and the detected area is then sent to a segmentation network. However, unlike natural scene images, medical images often have low resolution and lack discriminative information, making it difficult to localize the target using common object detection approaches. This suggests that detection performance can be significantly improved if the background of the medical images is first suppressed. In our approach, we developed an FCN-guided RPN detector, which uses an FCN to reduce the image background and an RPN to localize the pancreas region in acute pancreatitis patients. We then cropped the identified region and sent it to the U-Net model to output the segmentation mask.
The overview of the proposed framework is shown in Fig. 4. The FCN-guided RPN detector is highlighted in green. This detector differs from Faster R-CNN [7] in two respects. 1) We used a deconvolution feature map rather than an ordinary feature map, so that the feature map keeps the same size as the input image. 2) Our RPN includes two more sets of convolution layers than the original RPN for better feature extraction.
Fig. 4.
The overall architecture of the proposed approach. (FCN: fully convolutional network; RPN: region proposal network; ROI: region of interest.)
3.1. Feature map extraction with FCN
The FCN replaces the fully connected layers of a CNN with convolutional layers, enabling it to produce a 2-dimensional feature map of the same size as the input image and thereby achieve pixel-to-pixel semantic segmentation. The feature map is a latent representation containing high-dimensional image features. We used FCN-8s [3] as the baseline to produce feature maps. In this study, we used the last deconvolution feature map of the FCN-8s model, which is the same size as the input image. The input of the FCN needs to have three channels. As mentioned in Section 2, our dataset has a considerable slice-to-slice distance along the Z-axis; therefore, we duplicated each slice three times to form the three channels rather than feeding three successive slices [9] to the network, for the following two reasons. First, the resolution of each scan in our data is 512×512×L1, where L1 ∈ [29, 204] (for most cases, L1 < 64). The number of CT slices varies significantly from patient to patient, making it difficult to determine how many slices should remain after interpolation. On the one hand, if a small number of slices is retained after interpolation, some vital information may be discarded for patients with more slices, which harms network training. On the other hand, if a large number of slices is retained after interpolation, many artificially generated slices are introduced for patients with fewer slices, which may add considerable noise and hurt network performance. Second, duplicating each slice three times is simple and preserves the original information. Since the target occupies a small region of the input image, the network can easily confuse it with the background. We therefore applied the focal loss [48], which addresses this class imbalance by reshaping the standard cross-entropy loss to down-weight the loss assigned to well-classified examples. Focal loss is defined as:
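$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases} \tag{1}$$
where $y$ specifies the ground-truth class and $p \in [0, 1]$ is the model’s estimated probability for the class with label $y = 1$. $p_t$ represents the probability that the sample belongs to the true class. $(1 - p_t)^{\gamma}$ is a modulating factor and $\gamma$ is the focusing parameter. $\alpha_t$ is a weight coefficient. The values of $\alpha_t$ and $\gamma$ were chosen by a grid search over these hyperparameters as the setting that gave the highest accuracy in our experiment.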
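As a minimal sketch of how the focal loss of [48] behaves for a single pixel, the following pure-Python functions compare it with the plain cross-entropy. The label convention (`y` is 1 for foreground and 0 for background) and the default values `alpha=0.25` and `gamma=2.0` are assumptions taken from the common defaults of the focal loss paper [48]; they are not the values selected by the grid search in this study.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one pixel.

    p: predicted probability of the foreground class (label 1).
    y: ground-truth label, 1 for foreground and 0 for background (assumed convention).
    """
    p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha and gamma are illustrative defaults (assumption), not the
    hyperparameters chosen in the paper.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified foreground pixel (p = 0.9) is down-weighted far more
# strongly than a poorly classified one (p = 0.1).
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
```

With `gamma=0` the modulating factor disappears and the loss reduces to an `alpha`-weighted cross-entropy, which is a convenient sanity check when tuning the two hyperparameters.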
3.2. Pancreatitis regions localization with the FCN-guided RPN
An RPN contains a classifier and a regressor. The classifier uses the softmax function [49] to classify proposals as positive or negative, and the regressor fine-tunes the proposal localization. The RPN architecture used in our paper is shown in Fig. 4. First, we use two feature extractors to extract complementary information from the feature map. Each extractor contains two 3×3 convolutional layers followed by a ReLU [50] layer and a max-pooling layer. The two extractors use 128 and 256 filters, respectively. One more 3×3 convolutional layer followed by a ReLU layer is then used to make the features more robust. These features are then sent to the classifier and regressor. Finally, the ROI is obtained after the ROI pooling operation. (ROI pooling uses a single feature map for all the proposals generated by the RPN in a single pass and removes the fixed-input-size requirement of object detection networks; the entire image is fed to a CNN model, and ROIs are detected on the resulting feature maps.) The overall loss function of the RPN is defined as in [7]:
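$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \tag{2}$$
where $i$ is the anchor index and $p_i$ is the predicted probability of anchor $i$ being an object. The ground-truth label $p_i^*$ is 1 if the anchor is positive and 0 if the anchor is negative. $t_i$ is a vector representing the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is that of the ground-truth box associated with a positive anchor. $L_{cls}$ represents the classification loss, which is the cross-entropy loss. $L_{reg}$ represents the regression loss, which is the smooth $L_1$ loss defined in [5]. As in [7], the two terms are normalized by $N_{cls}$ and $N_{reg}$ and balanced by the weight $\lambda$.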
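The structure of the RPN loss of [7] (a classification term over all anchors plus a regression term that is active only for positive anchors) can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: the defaults for the balancing weight `lam` and for the normalizers `n_cls` and `n_reg` are placeholders chosen for the example, not the settings used in this study.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty of Fast R-CNN [5]: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(p, p_star, t, t_star, lam=1.0, n_cls=None, n_reg=None, eps=1e-7):
    """Sketch of the RPN loss.

    p:       predicted objectness probability of each anchor.
    p_star:  ground-truth anchor labels (1 = positive, 0 = negative).
    t:       predicted box parameters, one 4-tuple per anchor.
    t_star:  ground-truth box parameters, one 4-tuple per anchor.
    lam, n_cls, n_reg: illustrative defaults (assumption); by default the
    normalizers are set to the number of anchors passed in.
    """
    n_cls = n_cls or len(p)
    n_reg = n_reg or len(p)
    cls_term = 0.0
    reg_term = 0.0
    for p_i, ps_i, t_i, ts_i in zip(p, p_star, t, t_star):
        p_i = min(max(p_i, eps), 1.0 - eps)
        # Binary cross-entropy between predicted objectness and anchor label.
        cls_term += -(ps_i * math.log(p_i) + (1 - ps_i) * math.log(1.0 - p_i))
        # The factor ps_i switches the regression term off for negative anchors.
        reg_term += ps_i * sum(smooth_l1(a - b) for a, b in zip(t_i, ts_i))
    return cls_term / n_cls + lam * reg_term / n_reg
```

Because the regression term is multiplied by the anchor label, box errors of negative anchors never contribute to the loss; only the classifier is trained on them.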
3.3. Segmentation with U-Net.
U-Net is one of the most widely used networks for medical image segmentation and performs well even with a small training dataset. It has therefore become the baseline in various biomedical image segmentation tasks, including pancreas segmentation [35–37]. The key advantage of U-Net is its skip connections, which merge low- and high-level image features. Unlike FCN, DeepLab, and similar networks, which supervise and back-propagate the loss directly on high-level semantic features, U-Net upsamples four times and uses a skip connection at each corresponding stage, which helps recover the final full-resolution output. The resulting feature maps incorporate more low-level features and fuse features at different scales, so that multi-scale prediction and deep supervision can be performed. The four upsampling steps also allow the segmentation map to recover finer details, such as edges. Based on the localization of the pancreas obtained with the RPN, we estimated the bounding box, added a frame of fixed width around it, cropped the image region accordingly, and fed it to the U-Net model. As reported previously [51], in the last layer, we used two 1×1 convolutions followed by softmax to reduce the output channels to two, one for the foreground and one for the background. The loss function is the cross-entropy loss (Eq. (3)).
$$L_{CE} = -\sum_i \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \tag{3}$$
where $p_i$ denotes the prediction and $y_i$ denotes the ground truth.
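The cropping step that links the detector to the segmenter can be sketched as below. This is a hypothetical helper, not the authors' code: the box format `(x_min, y_min, x_max, y_max)` with inclusive pixel coordinates and the clipping at the image border are assumptions, while the default frame width of 30 pixels follows the experimental settings in Section 3.5.

```python
def crop_with_frame(image, box, frame=30):
    """Crop a 2D image (list of rows) around a bounding box plus a fixed frame.

    image: 2D list indexed as image[row][col].
    box:   (x_min, y_min, x_max, y_max), inclusive pixel coordinates (assumed format).
    frame: width in pixels of the frame added on every side of the box.
    The enlarged box is clipped so that it never leaves the image.
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    x0 = max(0, x0 - frame)
    y0 = max(0, y0 - frame)
    x1 = min(width - 1, x1 + frame)
    y1 = min(height - 1, y1 + frame)
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


# Toy 100x100 image whose pixel value encodes its position (row * 100 + col).
toy_image = [[r * 100 + c for c in range(100)] for r in range(100)]
toy_crop = crop_with_frame(toy_image, (40, 45, 50, 55))
```

The frame gives the segmenter some surrounding context and tolerates small localization errors, at the cost of admitting slightly more background into the crop.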
3.4. Dataset and Evaluation.
We evaluated our approach on the clinical acute pancreatitis dataset, which consists of 89 intravenous contrast-enhanced CT scans (acquired in the portal-venous phase, ~70 seconds after intravenous contrast injection) from patients diagnosed with acute pancreatitis. The data were manually labeled by three medical students using ITK-SNAP [52] and verified by an experienced radiologist. To increase the amount and diversity of data, we performed simple but effective augmentations on all training data, including rotation (90°, 270°) and flipping (vertical and horizontal). We used 4-fold cross-validation (train:validation = 67:22) to evaluate the proposed approach. In each fold, data augmentation was performed only on the training set. To balance the positive and negative samples, we selected 2D slices in which the pancreatitis region occupied at least 900 pixels as the training data, which prevented the model from being heavily affected by noisy background content. After data augmentation, we obtained around 5300 slices of size 512×512 for training and around 280 slices of size 512×512 for testing. We measured the segmentation accuracy using the Dice-Sørensen coefficient (DSC) and report the average over the 4-fold cross-validation. We measured the localization accuracy using four metrics: intersection over union (IoU), recall, accuracy, and precision. These evaluation metrics are defined in Eqs. (4)–(8).
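The slice selection and augmentation described above can be sketched in pure Python as follows. The function names are hypothetical, and two details are assumptions made for the illustration: the original slice is kept alongside its four transformed copies, and each transform is applied identically to the image and its mask.

```python
def keep_slice(mask, min_pixels=900):
    """Keep a training slice only if its binary mask has at least min_pixels foreground pixels."""
    return sum(sum(row) for row in mask) >= min_pixels


def rotate90(img):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate270(img):
    """Rotate a 2D list by 270 degrees clockwise (90 degrees counter-clockwise)."""
    return [list(row) for row in zip(*img)][::-1]


def flip_vertical(img):
    """Flip top to bottom."""
    return [list(row) for row in img[::-1]]


def flip_horizontal(img):
    """Flip left to right."""
    return [list(row[::-1]) for row in img]


def augment_pair(image, mask):
    """Return (image, mask) pairs: the original plus two rotations and two flips."""
    transforms = [lambda x: [list(r) for r in x], rotate90, rotate270,
                  flip_vertical, flip_horizontal]
    return [(t(image), t(mask)) for t in transforms]


toy = [[1, 2], [3, 4]]
pairs = augment_pair(toy, toy)
```

Under these assumptions each selected slice yields five training samples, which is why augmentation has to be restricted to the training folds to avoid leaking transformed copies into validation.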
$$\mathrm{DSC} = \frac{2 \sum_i p_i y_i}{\sum_i p_i + \sum_i y_i} \tag{4}$$
where $i$ indexes the pixels, and $p_i$ and $y_i$ represent the predicted result and the ground truth of the $i$th pixel, respectively. DSC ranges from 0 to 1, and a larger DSC value indicates better segmentation performance.
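As a small worked example, the DSC can be computed on binary masks as twice the overlap divided by the sum of the predicted and ground-truth foreground sizes. The sketch below is a minimal pure-Python version; returning 1.0 when both masks are empty is a convention assumed here, not something stated in the paper.

```python
def dice_coefficient(pred, truth):
    """Dice-Sorensen coefficient between two binary masks given as 2D lists of 0/1."""
    intersection = 0
    pred_sum = 0
    truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, y in zip(pred_row, truth_row):
            intersection += p * y
            pred_sum += p
            truth_sum += y
    if pred_sum + truth_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * intersection / (pred_sum + truth_sum)


# One overlapping pixel, two foreground pixels in each mask: DSC = 2*1 / (2+2) = 0.5
pred_mask = [[1, 1, 0],
             [0, 0, 0]]
true_mask = [[1, 0, 0],
             [1, 0, 0]]
example_dsc = dice_coefficient(pred_mask, true_mask)
```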
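$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{5}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$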
where TP represents true positive, TN represents the true negative, FN represents false negative and FP represents false positive.
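The four localization metrics follow directly from the TP, TN, FP, and FN counts, using the standard definitions of IoU, recall, accuracy, and precision. The sketch below also shows one possible way to obtain the counts, by comparing a predicted box with a ground-truth box pixel by pixel over the whole image; that counting scheme and the inclusive box format are assumptions made for illustration, since the text does not specify how the counts were collected.

```python
def box_area(box):
    """Pixel area of an inclusive box (x_min, y_min, x_max, y_max); 0 if the box is empty."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0 + 1) * max(0, y1 - y0 + 1)


def confusion_from_boxes(pred_box, gt_box, height, width):
    """Pixel-level TP, TN, FP, FN for two boxes inside a height x width image (assumed scheme)."""
    overlap = (max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1]),
               min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3]))
    tp = box_area(overlap)
    fp = box_area(pred_box) - tp
    fn = box_area(gt_box) - tp
    tn = height * width - tp - fp - fn
    return tp, tn, fp, fn


def _ratio(num, den):
    """Safe division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def localization_metrics(tp, tn, fp, fn):
    """IoU, recall, accuracy, and precision from confusion counts."""
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "recall": _ratio(tp, tp + fn),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
    }


counts = confusion_from_boxes((0, 0, 9, 9), (5, 5, 14, 14), height=20, width=20)
metrics = localization_metrics(*counts)
```

Read against the two issues discussed in Section 1, a box that is too large mainly lowers precision and IoU, while a box that misses part of the target lowers recall.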
3.5. Experimental Settings.
Our proposed approach consists of three steps: feature map extraction with the FCN, pancreatitis region localization with the FCN-guided RPN, and segmentation with U-Net. For convenience, we refer to the ground truth of each stage as GT1, GT2, and GT3. GT1 is the annotated binary mask of the pancreatitis region. GT2 is the bounding box of the ROI, generated as the minimum circumscribed rectangle of GT1. GT3 is GT1 cropped to the size of GT2. Because of the small sample size, we initialized the FCN weights from a model pre-trained on the NIH normal pancreas data. The FCN model was fed with the original CT images and GT1 and trained for 100 epochs. To train the RPN model, the feature maps generated by the FCN and GT2 were fed into the model. Following [7], positive and negative anchors were determined by the IoU between each candidate and the ground truth; the positive and negative thresholds, 0.6 and 0.3 respectively, were chosen by grid search. The RPN model was trained for 50 epochs. To train the U-Net model, we cropped the ROI according to GT3 (adding a fixed 30-pixel-wide frame around it) and trained the model for 100 epochs. In the inference stage, the original CT image was fed into the FCN to obtain the feature map. The feature map was then sent to the RPN to generate the predicted bounding box of the pancreatitis region. After that, we cropped each slice according to the box, again adding a fixed 30-pixel-wide frame around it. Before being fed into the U-Net, the images were resized to [192, 192]. We adopted a dynamic learning rate (learning rate = 0.01, decay = 0.0005, momentum = 0.9) and an early stopping strategy in the training stage to avoid overfitting. Stochastic gradient descent (SGD) was used for optimization [53]. All networks were trained and tested on an NVIDIA Tesla V100 GPU with 32 GB of memory. The loss curves of the training stage are presented in Fig. 5.
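The derivation of GT2 from GT1 and the IoU-based anchor labeling can be sketched as follows. This is an illustrative version under stated assumptions: the circumscribed rectangle is taken to be axis-aligned, boxes use inclusive pixel coordinates, anchors whose IoU falls between the two thresholds are ignored (the convention of [7]), and the use of `>=` for the positive threshold is a choice made for the example. The thresholds 0.6 and 0.3 are the values reported above.

```python
def mask_to_box(mask):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a binary mask, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


def box_iou(a, b):
    """Intersection over union of two inclusive boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def label_anchor(anchor, gt_box, pos_thr=0.6, neg_thr=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored during training)."""
    iou = box_iou(anchor, gt_box)
    if iou >= pos_thr:
        return 1
    if iou < neg_thr:
        return 0
    return -1


# Toy GT1 mask with two foreground pixels; GT2 is its bounding box.
gt1 = [[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0]]
gt2 = mask_to_box(gt1)
```

Ignoring the anchors in the ambiguous IoU band keeps borderline candidates from sending contradictory signals to the classifier.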
Fig.5.
The loss curves for the training stage in different steps.
3.6. Experimental Results.
As mentioned in Section 1, no segmentation approach has yet been proposed for the pancreas of acute pancreatitis patients. We therefore compared our approach with three methods used for normal pancreas segmentation, denoted M1, M2, and M3. To make the comparison fair, these methods use different detectors but the same segmenter (2D U-Net). M1 is the detector currently in common use for normal pancreas segmentation: it uses an FCN to obtain coarse segmentation results and derives the corresponding bounding boxes from the binary image. M2 uses Faster R-CNN [7] on the original CT image to localize the pancreas region. Our proposed method uses the FCN-guided RPN to localize the acute pancreatitis region. M3 uses ground-truth (oracle) bounding boxes for individual testing cases to generate the input volume and therefore represents the upper limit of localization performance. A 30-pixel-wide frame was added around all of these localization results. In addition, we included a recent approach, the UNeXt model [54], in the comparison. UNeXt [54] is a state-of-the-art medical image segmentation method that combines an early convolutional stage with a convolutional multilayer perceptron (MLP). Like U-Net, UNeXt is a one-stage approach for direct segmentation. The segmentation results are shown in Table 1, where our approach outperforms the others overall. Table 2 shows the localization performance of the different two-stage approaches. The accuracy and recall values reflect the degree of Issue I (the localization results are much larger than the actual pancreas boundary) and Issue II (the localization results discard some actual pancreas regions) illustrated in Fig. 2.
Table 1.
Evaluation results (measured by DSC) of pancreas segmentation for acute pancreatitis patients on our clinical data using different methods.
| Method | Mean DSC (%) | Max DSC (%) | Min DSC (%) |
|---|---|---|---|
| M1 | 60.01±20.12 | 75.58 | 41.24 |
| M2 | 55.81±22.02 | 72.38 | 38.86 |
| UNeXt [54] | 65.89±18.34 | 79.88 | 45.36 |
| Ours | 66.82±16.44 | 80.12 | 45.47 |
| M3 (Oracle Bounding Box) | 81.70±9.46 | 90.22 | 74.66 |
M1: Using the FCN model as the detector; M2: Using the Faster R-CNN model as the detector; M3: Using the oracle bounding box generated by the ground truth as the localization.
Table 2.
Evaluation of localization results.
| Metric | M1 | M2 | Ours |
|---|---|---|---|
| IoU | 0.5511 | 0.4546 | 0.5703 |
| Recall | 0.8262 | 0.8010 | 0.8710 |
| Accuracy | 0.6185 | 0.5653 | 0.7172 |
| Precision | 0.7230 | 0.6848 | 0.7552 |
M1: detection by an FCN model; M2: detection by Faster R-CNN model.
The results in Table 1 and Table 2 lead to the following findings. (1) Compared with the FCN and Faster R-CNN models, our FCN-guided RPN detector achieved higher localization accuracy. The reason is that the feature maps generated by the FCN model contain less background, which allows the detector to perform better. Although it uses a more advanced technique, the UNeXt model performs worse, showing that one-stage approaches still lag behind the two-stage method for pancreas segmentation in pancreatitis patients. (2) The difference between the max DSC and min DSC is large, and the spread around the mean DSC is considerable, showing that the pancreas varies significantly among patients. (3) Although our proposed approach outperforms the other approaches, there is still a large gap from the ideal situation (M3). The reason is that the localization is not entirely accurate, given the variable location of the pancreatitis region. Fig. 6 visualizes the localization results of M1, M3, and our proposed approach. Our approach overcomes the shortcomings of M1 (Issues I and II described in Section 1). The inference time is around 20 s for a case of size 512×512×60 on an NVIDIA Tesla V100 GPU with 32 GB of memory, which is acceptable.
Fig. 6.
Selected examples of pancreas localization results for acute pancreatitis patients. The red box, fuchsia box, and green box respectively show the ideal localization results (obtained from the ground truth), the localization results obtained using our approach, and the localization results obtained using M1. In (a)-(d), the coarse bounding boxes (in fuchsia) are much bigger than the ground-truth bounding boxes (in red), so a large amount of background is introduced if the images cropped with the coarse bounding boxes are sent to a segmentation model (reflecting Issue I in Fig. 2). (e)-(h) show that the coarse bounding boxes obtained using M1 (in fuchsia) do not entirely cover the ROI (in red), discarding some actual pancreatitis regions (reflecting Issue II in Fig. 2).
4. Discussion
Generally, the quality of disease-specific datasets obtained from hospitals is often poor owing to cost constraints. For example, the contrast between the target and the background is low, few slices contain the target object, and the training set is small. Establishing approaches that solve real clinical problems is therefore much more challenging than evaluating on public datasets. We developed a novel approach tailored to the unique characteristics of our clinical data to overcome these challenges. As expected, our approach outperforms other methods owing to the FCN-guided RPN detector. Compared with widely used object detection approaches such as Faster R-CNN, the FCN-guided RPN detector detects objects on the feature maps produced by the FCN model, which can significantly reduce the interference from other organs.
Although our proposed approach has shown promising results in segmenting most pancreatic slices from acute pancreatitis patients, there is still room for improvement. We divided the task into two stages (detection and segmentation) and trained the models separately instead of jointly. Our method is therefore not end-to-end and may not minimize the global energy function. Sometimes, the RPN model did not generate the expected results because the upstream FCN model removed too much information. We also found false positives on slices adjacent to slices with minimal object content. For example, in a 3D volume, if the first slice along the body’s long axis in which the acute pancreatitis target appears is the 20th slice, the model tends to predict false positives from the 15th slice to the 19th slice. The main reason may be that we selected only 2D slices with at least 900 pixels of the pancreas as training data for the FCN model. In future work, we will incorporate our proposed model into an end-to-end learning framework that minimizes a global energy function to improve performance. Because the size and location of the inflamed pancreas vary greatly, global information is critical. A transformer model [55] could be used to capture this global information and achieve better segmentation.
5. Conclusion
In this paper, we propose an automated CT segmentation approach for the pancreas from patients with acute pancreatitis by combining a novel object detection approach (FCN-guided RPN detector) and U-Net. Our approach is intuitive and effective. Specifically, we first use an FCN to remove most of the background. Then, we use an RPN detector to localize the ROI. Finally, the U-Net is used to segment the pancreatitis region. Our approach outperforms other methods when evaluated on CT image datasets from acute pancreatitis patients collected from the hospital. To the best of our knowledge, we are the first to apply a fully automated approach for pancreas segmentation in patients with acute pancreatitis. The computerized toolkit we developed will help to enhance the computer-aided quantitative analysis of pancreatic CT images of patients with acute pancreatitis.
Acknowledgment
This work is supported partially by a grant from NIH R01DE027027. The authors would like to thank West China Hospital, Sichuan University for providing the clinical data.
Footnotes
Ethics approval
This study was approved by the Institutional Review Board of the University of Texas Health Science Center at Houston.
Declaration of competing interest
We declare we have no competing interests.
Conflict of interest statement: None Declared.
References
- [1] Litjens G et al., “A survey on deep learning in medical image analysis,” Medical image analysis, vol. 42, pp. 60–88, 2017.
- [2] Chakraborty J et al., “CT radiomics to predict high-risk intraductal papillary mucinous neoplasms of the pancreas,” Medical physics, vol. 45, no. 11, pp. 5019–5029, 2018.
- [3] Long J, Shelhamer E, and Darrell T, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
- [4] Chen L-C, Papandreou G, Kokkinos I, Murphy K, and Yuille AL, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017.
- [5] Girshick R, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448.
- [6] Girshick R, Donahue J, Darrell T, and Malik J, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587.
- [7] Ren S, He K, Girshick R, and Sun J, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.
- [8] Man Y, Huang Y, Feng J, Li X, and Wu F, “Deep Q learning driven ct pancreas segmentation with geometry-aware u-net,” IEEE transactions on medical imaging, vol. 38, no. 8, pp. 1971–1980, 2019.
- [9] Zhou Y, Xie L, Shen W, Wang Y, Fishman EK, and Yuille AL, “A fixed-point model for pancreas segmentation in abdominal CT scans,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017: Springer, pp. 693–701.
- [10] Yadav D and Lowenfels AB, “The Epidemiology of Pancreatitis and Pancreatic Cancer,” Gastroenterology, vol. 144, no. 6, pp. 1252–1261, 2013, doi: 10.1053/j.gastro.2013.01.068.
- [11] Lankisch PG, Apte M, and Banks PA, “Acute pancreatitis,” The Lancet, vol. 386, no. 9988, pp. 85–96, 2015, doi: 10.1016/S0140-6736(14)60649-8.
- [12] Peery AF et al., “Burden of Gastrointestinal Disease in the United States: 2012 Update,” Gastroenterology, vol. 143, no. 5, pp. 1179–1187.e3, 2012, doi: 10.1053/j.gastro.2012.08.002.
- [13] Zhang Y et al., “Radiomics analysis for the differentiation of autoimmune pancreatitis and pancreatic ductal adenocarcinoma in 18F-FDG PET/CT,” Medical physics, vol. 46, no. 10, pp. 4520–4530, 2019.
- [14] Roth HR et al., “Deeporgan: Multi-level deep convolutional networks for automated pancreas segmentation,” in International conference on medical image computing and computer-assisted intervention, 2015: Springer, pp. 556–564.
- [15] Cai J, Lu L, Xing F, and Yang L, “Pancreas segmentation in CT and MRI images via domain specific network designing and recurrent neural contextual learning,” arXiv preprint arXiv:1803.11303, 2018.
- [16] Chen H, Wang X, Huang Y, Wu X, Yu Y, and Wang L, “Harnessing 2d networks and 3d features for automated pancreas segmentation from volumetric ct images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019: Springer, pp. 339–347.
- [17] Roth HR, Lu L, Farag A, Sohn A, and Summers RM, “Spatial aggregation of holistically-nested networks for automated pancreas segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016: Springer, pp. 451–459.
- [18] Yu Q, Xie L, Wang Y, Zhou Y, Fishman EK, and Yuille AL, “Recurrent saliency transformation network: Incorporating multi-stage visual cues for small organ segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8280–8289.
- [19] Zhao N, Tong N, Ruan D, and Sheng K, “Fully Automated Pancreas Segmentation with Two-Stage 3D Convolutional Neural Networks,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019: Springer, pp. 201–209.
- [20] Zhu Z, Xia Y, Shen W, Fishman E, and Yuille A, “A 3d coarse-to-fine framework for volumetric medical image segmentation,” in 2018 International Conference on 3D Vision (3DV), 2018: IEEE, pp. 682–690.
- [21] Zhu Z, Xia Y, Xie L, Fishman EK, and Yuille AL, “Multi-scale coarse-to-fine segmentation for screening pancreatic ductal adenocarcinoma,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019: Springer, pp. 3–12.
- [22] Zheng H, Qian L, Qin Y, Gu Y, and Yang J, “Improving the Slice Interaction of 2.5 D CNN for Automatic Pancreas Segmentation,” Medical Physics, 2020.
- [23] Ronneberger O, Fischer P, and Brox T, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, 2015: Springer, pp. 234–241.
- [24] Hu H, Guan Q, Chen S, Ji Z, and Lin Y, “Detection and recognition for life state of cell cancer using two-stage cascade CNNs,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 17, no. 3, pp. 887–898, 2017.
- [25] Ji Z, Zhao W, Lin H-K, and Zhou X, “Systematically understanding the immunity leading to CRPC progression,” PLoS computational biology, vol. 15, no. 9, p. e1007344, 2019.
- [26] LeCun Y, Bengio Y, and Hinton G, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
- [27] Min S, Lee B, and Yoon S, “Deep learning in bioinformatics,” Briefings in bioinformatics, vol. 18, no. 5, pp. 851–869, 2017.
- [28] Zhou SK et al., “A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises,” Proceedings of the IEEE, vol. 109, no. 5, pp. 820–838, 2021.
- [29] Mahajan S, Abualigah L, and Pandit AK, “Hybrid arithmetic optimization algorithm with hunger games search for global optimization,” Multimedia Tools and Applications, pp. 1–24, 2022.
- [30] Mahajan S, Abualigah L, Pandit AK, and Altalhi M, “Hybrid Aquila optimizer with arithmetic optimization algorithm for global optimization tasks,” Soft Computing, vol. 26, no. 10, pp. 4863–4881, 2022.
- [31] Mahajan S et al., “Fusion of modern meta-heuristic optimization methods using arithmetic optimization algorithm for global optimization tasks,” Soft Computing, pp. 1–15, 2022.
- [32] Mahajan S and Pandit AK, “Hybrid method to supervise feature selection using signal processing and complex algebra techniques,” Multimedia Tools and Applications, pp. 1–22, 2021.
- [33] Mahajan S and Pandit AK, “Image segmentation and optimization techniques: a short overview,” Medicon Eng Themes, vol. 2, no. 2, pp. 47–49, 2022.
- [34] Wu W, Gao L, Duan H, Huang G, Ye X, and Nie S, “Segmentation of pulmonary nodules in CT images based on 3D-UNET combined with three-dimensional conditional random field optimization,” Medical Physics.
- [35] Qiu C et al., “RTUNet: Residual transformer UNet specifically for pancreas segmentation,” Biomedical Signal Processing and Control, vol. 79, p. 104173, 2023.
- [36] Dogan RO, Dogan H, Bayrak C, and Kayikcioglu T, “A two-phase approach using mask R-CNN and 3D U-Net for high-accuracy automatic segmentation of pancreas in CT imaging,” Computer Methods and Programs in Biomedicine, vol. 207, p. 106141, 2021.
- [37] Chen L and Wan L, “CTUNet: automatic pancreas segmentation using a channel-wise transformer and 3D U-Net,” The Visual Computer, pp. 1–15, 2022.
- [38] Redmon J, Divvala S, Girshick R, and Farhadi A, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
- [39] Bochkovskiy A, Wang C-Y, and Liao H-YM, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv preprint arXiv:2004.10934, 2020.
- [40] Redmon J and Farhadi A, “YOLO9000: better, faster, stronger,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263–7271.
- [41] Redmon J and Farhadi A, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
- [42] Duan K, Bai S, Xie L, Qi H, Huang Q, and Tian Q, “Centernet: Keypoint triplets for object detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6569–6578.
- [43] Law H and Deng J, “Cornernet: Detecting objects as paired keypoints,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 734–750.
- [44] Zhou X, Zhuo J, and Krahenbuhl P, “Bottom-up object detection by grouping extreme and center points,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 850–859.
- [45] Badrinarayanan V, Kendall A, and Cipolla R, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
- [46] Noh H, Hong S, and Han B, “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
- [47] Szegedy C, Ioffe S, Vanhoucke V, and Alemi AA, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Thirty-first AAAI conference on artificial intelligence, 2017.
- [48] Lin T-Y, Goyal P, Girshick R, He K, and Dollár P, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
- [49] Goodfellow I, Bengio Y, and Courville A, Deep learning. MIT press, 2016.
- [50] Krizhevsky A, Sutskever I, and Hinton GE, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
- [51] Deng Y et al., “A New Framework to Reduce Doctor’s Workload for Medical Image Annotation,” IEEE Access, vol. 7, pp. 107097–107104, 2019.
- [52] Yushkevich PA et al., “User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability,” Neuroimage, vol. 31, no. 3, pp. 1116–1128, 2006.
- [53] Ruder S, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016.
- [54] Valanarasu JMJ and Patel VM, “UNeXt: MLP-based Rapid Medical Image Segmentation Network,” arXiv preprint arXiv:2203.04967, 2022.
- [55] Vaswani A et al., “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.






