Author manuscript; available in PMC: 2024 Jan 25.
Published in final edited form as: Mach Learn Med Imaging. 2023 Oct 15;14349:134–143. doi: 10.1007/978-3-031-45676-3_14

Radiomics Boosts Deep Learning Model for IPMN Classification

Lanhong Yao 1, Zheyuan Zhang 1, Ugur Demir 1, Elif Keles 1, Camila Vendrami 1, Emil Agarunov 2, Candice Bolan 3, Ivo Schoots 4, Marc Bruno 4, Rajesh Keswani 1, Frank Miller 1, Tamas Gonda 2, Cemal Yazici 5, Temel Tirkes 6, Michael Wallace 7, Concetto Spampinato 8, Ulas Bagci 1
PMCID: PMC10810260  NIHMSID: NIHMS1959265  PMID: 38274402

Abstract

Intraductal Papillary Mucinous Neoplasm (IPMN) cysts are pre-malignant pancreas lesions, and they can progress into pancreatic cancer. Therefore, detecting and stratifying their risk level is of ultimate importance for effective treatment planning and disease control. However, this is a highly challenging task because of the diverse and irregular shape, texture, and size of the IPMN cysts as well as the pancreas. In this study, we propose a novel computer-aided diagnosis pipeline for IPMN risk classification from multi-contrast MRI scans. Our proposed analysis framework includes an efficient volumetric self-adapting segmentation strategy for pancreas delineation, followed by a newly designed deep learning-based classification scheme with a radiomics-based predictive approach. We test our proposed decision-fusion model on multi-center data sets of 246 multi-contrast MRI scans from five centers and obtain performance superior to the state of the art (SOTA) in this field (81.9% vs. 61.3% accuracy). Our ablation studies demonstrate that both the radiomics and deep learning modules are significant for achieving the new SOTA performance compared to international guidelines and published studies. Our findings have important implications for clinical decision-making. The code is available upon publication.

Keywords: Radiomics, IPMN Classification, Pancreatic Cysts, MRI, Pancreas Segmentation

1. Introduction

Pancreatic cancer is a deadly disease with a low 5-year survival rate, primarily because it is often diagnosed at a late stage [1,10]. Early detection is crucial for improving survival rates and gaining a better understanding of tumor pathophysiology. Research on pancreatic cysts is therefore significant, since some types, such as intraductal papillary mucinous neoplasms (IPMN), can develop into pancreatic cancer [12]. Hence, the diagnosis of IPMN cysts and the prediction of their likelihood of transforming into pancreatic cancer are essential for early detection and disease management. Our study aligns with this objective and aims to aid the early detection of pancreatic cancer.

Diagnosis of IPMN involves a combination of imaging studies, laboratory tests, and sometimes biopsy. Imaging studies utilize CT, MRI, and EUS scans to visualize the pancreas and detect any cystic lesions. The size, location, shape, texture, and other characteristics of the lesions are used for radiographical evaluations. Current international guidelines (AGA, ACG, and IAP) [3,11,13] state that IPMNs should be classified as low-risk or high-risk based on their size, morphology, and the presence of high-risk features such as main pancreatic duct involvement, mural nodules, or elevated cyst fluid CEA levels. While high-risk IPMNs should be considered for surgical resection, low-risk IPMNs may be managed with surveillance.

Prior Art.

Radiographical identification of IPMNs is important but falls short of diagnostic accuracy; hence, there is a need to improve the current standards for IPMN risk stratification [9]. Several studies have demonstrated the potential of deep learning in IPMN diagnosis, including the use of convolutional neural networks (CNN) [6], inflated neural networks (INN) [10], and neural transformers [15]. However, these studies analyzed MRIs from a single center with a small number of patients and did not include a pancreas segmentation step, instead using directly cropped pancreas regions or whole images for classification. Still, these models showed promising results compared to the international guidelines. While deep learning techniques have shown the potential to improve the accuracy and efficiency of IPMN diagnosis and risk stratification [2], further research and validation are needed to determine their clinical utility and potential impact on patient outcomes. Our work addresses the limitations of current deep learning models by designing a new computer-aided diagnosis (CAD) system, a fully automated pipeline comprising (1) MRI cleaning with preprocessing, (2) segmentation of the pancreas, (3) classification with decision fusion of deep learning and radiomics, (4) statistical analysis of clinical features and incorporation of pancreas volume into the clinical decision system, and (5) testing and validation of the whole system in multi-center settings. Figure 1 shows an overview of the proposed novel CAD system.

Fig. 1.

An overview of our proposed CAD system. Multi-contrast MRI scans (T1 and T2) are preprocessed with inhomogeneity correction, denoising, and intensity standardization. The cleaned images are then used to segment the pancreas region. ROIs enclosing the segmented pancreas are fed into a deep learning classifier, and clinical features selected through statistical analysis are fed into a radiomics classifier. Decision vectors (probabilities) from both classifiers are combined via a weighted averaging-based decision fusion strategy for the final IPMN cyst stratification.

Summary of Our Contributions.

To the best of our knowledge, this is the first study having a fully automated pipeline for IPMN diagnosis and risk stratification, developed and evaluated on multi-center data. Our major contributions are as follows:

  1. We develop the first fully automated CAD system that utilizes a powerful combination of deep learning, radiomics, and clinical features - all integrated into a single decision support system via a weighted averaging-based decision fusion strategy.

  2. Unlike existing IPMN CAD systems, which do not include MRI segmentation of the pancreas and require cumbersome manual annotations on the pancreas, we incorporate the volumetric self-adapting segmentation network (nnUNet) to effectively segment the pancreas from MRI scans with high accuracy and ease.

  3. We present a simple yet effective approach to fusing radiomics that can improve the DL model’s performance by up to 20%. This allows us to deliver more accurate diagnoses for IPMN.

  4. Through rigorous statistical analysis of 8 clinical features, we identify pancreas volume as a potential predictor of risk levels of IPMN cysts. By leveraging this vital information, we enhance the decision fusion mechanism to achieve exceptional overall model accuracy.

2. Materials and Methods

2.1. Dataset

In compliance with ethical standards, our study is approved by the Institutional Review Board (IRB), and necessary privacy considerations are taken into account: all images are de-identified before usage. We obtain 246 MRI scans (both T1 and T2) from five centers: Mayo Clinic in Florida (MCF), Mayo Clinic in Arizona (MCA), Allegheny Health Network (AHN), Northwestern Memorial Hospital (NMH), and New York University Langone Hospital (NYU). All T1 and T2 images are registered, and segmentation masks are generated using a fast and reliable segmentation network (see the segmentation section below). Segmentations are examined by radiologists case by case to ensure their correctness. The ground truth labels of IPMN risk classifications are determined based on either biopsy exams or surveillance information with radiographical evaluation; overall, three approximately balanced classes are considered for the risk stratification experiments: healthy (70 cases), low-grade risk (85 cases), and high-grade risk (91 cases).

2.2. Preprocessing

MRI presents unique challenges, including intensity inhomogeneities, noise, non-standardization, and other artifacts. These challenges often arise from variations in acquisition parameters and hardware, even when using the same scanner and operators at different times of the day or with different patients. Therefore, preprocessing MRI scans across different acquisitions, scanners, and patient populations is necessary. In our study, we perform the following preprocessing steps before feeding the images into the segmenter and classifiers. Initially, images are reoriented according to the RAS axes convention. Subsequently, bias correction and denoising are applied to mitigate artifacts and enhance image fidelity. Finally, we employ Nyul’s method [14] for intensity standardization, harmonizing the intensity values of each image with a designated reference distribution. Figure 1 illustrates the image histograms pre- and post-preprocessing, underscoring the efficacy of our standardization procedure. These preprocessing steps help improve the robustness and reliability of deep learning models.
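
A minimal sketch of this preprocessing chain is given below, using SimpleITK. The specific filters and parameters are illustrative assumptions, and Nyul standardization is approximated here by histogram matching to a reference scan rather than the landmark method of [14].

```python
import SimpleITK as sitk

def preprocess(path: str, reference: sitk.Image) -> sitk.Image:
    img = sitk.ReadImage(path, sitk.sitkFloat32)
    img = sitk.DICOMOrient(img, "RAS")            # reorient to the RAS convention
    fg = sitk.OtsuThreshold(img, 0, 1)            # rough foreground mask for N4
    img = sitk.N4BiasFieldCorrection(img, fg)     # bias (inhomogeneity) correction
    img = sitk.CurvatureFlow(img, timeStep=0.125, numberOfIterations=5)  # denoising
    # stand-in for Nyul standardization: match the histogram to a reference image
    return sitk.HistogramMatching(img, reference)
```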

2.3. Pancreas Segmentation

Pancreas volumetry is a prerequisite for the diagnosis and prognosis of several pancreatic diseases, requiring radiology scans to be segmented automatically, as manual annotation is highly costly and inefficient. In this module of the CAD system, our aim is to develop a clinically stable and accurate deep learning-based segmentation algorithm for the pancreas from MRI scans in multi-center settings, demonstrating its generalization efficacy. Among the 246 scans, we randomly select 131 MRI images (T2) from multi-center settings: 61 cases from NMH, 15 cases from NYU, and 55 cases from MCF, with annotations obtained from all three centers' data. The segmentation masks are used for pancreas region of interest (ROI) boundary extraction in the radiomics and deep learning-based classification rather than exact pixel analysis. We present a robust and accurate deep learning algorithm based on the 3D nnUNet architecture with SGD optimization [7].
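
Since the masks serve to extract the pancreas ROI rather than to support exact pixel analysis, the downstream crop can be as simple as the bounding box of the predicted mask. The sketch below is an assumed implementation of this step; the margin size is a placeholder.

```python
import numpy as np

def pancreas_roi(volume: np.ndarray, mask: np.ndarray, margin: int = 8) -> np.ndarray:
    """Crop `volume` to the bounding box of the nonzero `mask`, padded by `margin` voxels."""
    coords = np.argwhere(mask > 0)                   # voxel coordinates of the pancreas
    lo = np.maximum(coords.min(axis=0) - margin, 0)  # clamp to the volume bounds
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
```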

2.4. Model Building for Risk Stratification

Radiomics Classifier

Radiomics involves extracting a large number of quantitative features from medical images, offering valuable insights into disease characterization, notably in IPMN, where shape and texture are vital to classification. For this study, 107 distinct features capturing texture, shape, and intensity are extracted from both T1 and T2 images within the ROI enclosing the pancreas segmentation mask. To mitigate disparities across scales in the radiomics data, we employ the ln(x + 1) transformation and unit variance scaling. Further, we analyze eight clinical features with an OLS regression model to evaluate their predictive efficacy for IPMN risk: diabetes mellitus, pancreas volume, pancreas diagonal, volume-over-diagonal ratio, age, gender, BMI, and chronic pancreatitis. Through stepwise regression, we refine the model’s focus to key features. T-tests reveal significant differences in pancreas volume across IPMN risk groups, consistent with prior medical knowledge. Notably, pancreas volume shows predictive efficacy for IPMN risk, leading to its inclusion as a vital clinical feature in our risk prediction model.
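
The feature side of this classifier can be sketched as follows: the PyRadiomics default extractor yields the 107 features per contrast, followed by the ln(x + 1) transform and unit-variance scaling described above. The file paths and the signed-log guard for negative-valued features are assumptions for illustration.

```python
import numpy as np
from radiomics import featureextractor
from sklearn.preprocessing import StandardScaler

extractor = featureextractor.RadiomicsFeatureExtractor()   # 107 default features

def radiomics_vector(image_path: str, mask_path: str) -> np.ndarray:
    """Extract the default PyRadiomics features for one image/mask pair."""
    result = extractor.execute(image_path, mask_path)
    return np.array([v for k, v in result.items() if k.startswith("original_")],
                    dtype=float)

# `cases` is a placeholder list of (T1 path, T2 path, mask path) tuples
cases = [("case01_T1.nii.gz", "case01_T2.nii.gz", "case01_mask.nii.gz")]
X = np.stack([np.concatenate([radiomics_vector(t1, m), radiomics_vector(t2, m)])
              for t1, t2, m in cases])
X = np.sign(X) * np.log1p(np.abs(X))   # ln(x+1); the signed form guards negatives
X = StandardScaler().fit_transform(X)  # unit variance scaling
```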

Deep Learning Classifier

We utilize one Transformer-based and four CNN-based architectures to compare and evaluate IPMN risk assessment. The neural transformer [15] is notably the first application of vision transformers (ViT) to pancreas risk stratification and obtained promising results, albeit on a limited amount of data from a single center. DenseNet [5], ResNet18 [4], AlexNet [8], and MobileNet [16] are well-known CNN-based architectures that have been shown to perform well on various computer vision tasks. Herein, we benchmark these models to create baselines and compare and contrast their benefits and limitations.
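
As an illustration (not necessarily the authors' exact configuration), the CNN baselines can be instantiated from torchvision with a three-class head for healthy / low-grade / high-grade:

```python
import torch.nn as nn
from torchvision import models

def build_baseline(name: str, num_classes: int = 3) -> nn.Module:
    """Swap each architecture's final layer for a 3-class IPMN head."""
    if name == "resnet18":
        net = models.resnet18(weights=None)
        net.fc = nn.Linear(net.fc.in_features, num_classes)
    elif name == "densenet":
        net = models.densenet121(weights=None)
        net.classifier = nn.Linear(net.classifier.in_features, num_classes)
    elif name == "alexnet":
        net = models.alexnet(weights=None)
        net.classifier[6] = nn.Linear(net.classifier[6].in_features, num_classes)
    elif name == "mobilenet":
        net = models.mobilenet_v2(weights=None)
        net.classifier[1] = nn.Linear(net.classifier[1].in_features, num_classes)
    else:
        raise ValueError(name)
    return net
```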

Weighted Averaging-Based Decision Fusion

The sparse feature vectors learned by DL models pose a challenge for feature-level fusion with the radiomics model. To address this, we implement weighted averaging-based decision fusion, combining the weighted probabilities of the DL and radiomics classifiers as shown in Eq. 1:

$$P_c = \begin{cases} P_r, & \text{if } \max(P_r) \geq t \\ k\,P_d + (1-k)\,P_r, & \text{otherwise} \end{cases} \tag{1}$$

where k and t are parameters that adjust the decision fusion of the two models, selected via grid search during cross-validation. Pd and Pr are the probability vectors predicted by the DL classifier and the radiomics classifier, respectively, and Pc is the combined probability vector from which the final fused prediction is obtained. For each case, the entries of Pc over the three classes sum to one. The blended form of Pc is used when the maximum probability of the radiomics classifier falls below the threshold t, indicating that the radiomics classifier is not confident in its decision and can benefit from extra information.
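
Eq. 1 translates directly into a few lines of code; the k and t values below are placeholders standing in for the grid-searched optima:

```python
import numpy as np

def fuse(p_d: np.ndarray, p_r: np.ndarray, k: float, t: float) -> np.ndarray:
    """Eq. 1: p_d / p_r are per-class probability vectors from the DL / radiomics models."""
    if p_r.max() >= t:
        return p_r                      # radiomics is confident: keep its decision
    return k * p_d + (1 - k) * p_r      # convex blend still sums to one

# illustrative values for k and t (the real ones come from grid search)
p_c = fuse(np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.35, 0.25]), k=0.6, t=0.8)
prediction = p_c.argmax()               # final fused class
```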

2.5. Training Details

The dataset used in this study consists of 246 patients from five medical centers. Out of these, 49 cases are randomly selected for blind testing (i.e., independent test), and are unseen by any of the models. The remaining 197 cases are split into training and validation sets. Every set incorporates data across all participating medical centers. This consistent distribution ensures an unbiased evaluation environment. We employ the same evaluation procedure for all the models.
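
A sketch of this protocol follows, assuming stratification on the center-class combination so that every split covers all five centers; the exact randomization the authors used is not specified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_splits(centers, labels, seed=0):
    """Hold out 49 cases for blind testing, then split the remaining 197
    into train/validation; strata combine center and risk class so every
    split contains cases from all participating centers."""
    idx = np.arange(len(labels))
    strata = np.array([f"{c}_{y}" for c, y in zip(centers, labels)])
    trainval, test = train_test_split(idx, test_size=49,
                                      stratify=strata, random_state=seed)
    train, val = train_test_split(trainval, test_size=0.2,
                                  stratify=strata[trainval], random_state=seed)
    return train, val, test
```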

The deep learning (DL) models are developed with the PyTorch framework and run on an NVIDIA RTX A6000 GPU. Training is conducted with a batch size of 16 for up to 1500 epochs. The radiomics classifier employs XGBoost with a grid search to identify the best parameters; the optimum is a number of estimators of 140 and a maximum depth of 4.
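
The radiomics classifier training reduces to a standard grid search, as in the sketch below; the grid bounds and scoring choice are assumptions, while the reported optimum (n_estimators=140, max_depth=4) comes from the text.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

def fit_radiomics_classifier(X_train, y_train):
    """Grid-search an XGBoost classifier; the reported optimum is
    n_estimators=140 and max_depth=4."""
    grid = GridSearchCV(
        XGBClassifier(objective="multi:softprob"),
        param_grid={"n_estimators": list(range(80, 201, 20)),
                    "max_depth": [2, 3, 4, 5, 6]},
        cv=5, scoring="accuracy")
    grid.fit(X_train, y_train)
    return grid.best_estimator_, grid.best_params_
```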

3. Results

3.1. Segmentation

We employ standard 5-fold cross-validation for training and use the Dice score (higher is better), Hausdorff distance at the 95th percentile (HD95, lower is better), Precision, and Recall for quantitative evaluation. Dice scores for CT-based pancreas segmentation in the literature plateau around 85%, whereas MRI segmentations hardly reach 50-60%, with only a limited number of research papers [17]. Herein, our segmentation results reach 70% on multi-center data, a significant improvement over current standards. Fig. 2 shows a qualitative visualization of the predicted segmentation results compared with the reference standards provided by the radiologists, demonstrating highly accurate segmentations.
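
For reference, the overlap metrics can be computed in a few lines of numpy, as in the minimal sketch below; HD95 additionally requires surface distances and is usually taken from a library such as MedPy.

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Dice, precision, and recall for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()          # true-positive voxels
    dice = 2.0 * tp / (pred.sum() + gt.sum() + eps)
    precision = tp / (pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return dice, precision, recall
```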

Fig. 2.

Qualitative pancreas segmentation visualization on multi-center data. Predicted segmentation maps (yellow line) are highly similar to the ground truth annotations (red line) in anatomical structure regardless of image variance.

We conduct a comprehensive quantitative evaluation. Table 1 shows the results across centers; in particular, we reach an average Dice of 70.11%, which has not previously been achieved in the literature in multi-center settings.

Table 1.

Multi-center pancreas MRI segmentation performance. The model reaches a sufficiently accurate segmentation with an average Dice of 70.11%.

| Data Center | Dice (%) | HD95 (mm) | Precision (%) | Recall (%) | Cases |
|---|---|---|---|---|---|
| MCF | 69.51 ± 2.96 | 28.36 ± 23.59 | 77.18 ± 3.50 | 66.11 ± 4.36 | 61 |
| NMH | 66.02 ± 1.84 | 14.91 ± 9.95 | 63.39 ± 4.30 | 71.19 ± 6.23 | 15 |
| NYU | 71.90 ± 3.20 | 26.59 ± 12.00 | 75.90 ± 1.59 | 71.73 ± 4.56 | 55 |
| Average | 70.11 ± 2.96 | 26.08 ± 18.19 | 75.06 ± 2.98 | 69.05 ± 4.70 | 131 (total) |

3.2. Classification

Metrics for evaluating the models’ classification performance include Accuracy (ACC), the Area Under the receiver operating characteristic Curve (AUC), Precision (PR), and Recall (RC); higher values indicate better performance. Combining the radiomics and deep learning classifiers boosts the classification results and yields new SOTA results (Table 2). Despite the success of the proposed ensemble, we also identify challenging cases where our classifiers fail to identify the existence or type of the IPMNs, shown in Figure 3 (second row: failure cases; first row: correctly predicted cases).
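
A minimal sketch of computing these metrics with scikit-learn follows; macro averaging and the one-vs-rest AUC variant are assumptions about the exact setup.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

def classification_metrics(y_true, y_prob):
    """y_prob: (n_cases, 3) class-probability matrix for the three risk classes."""
    y_pred = y_prob.argmax(axis=1)
    return {"ACC": accuracy_score(y_true, y_pred),
            "AUC": roc_auc_score(y_true, y_prob, multi_class="ovr"),
            "PR": precision_score(y_true, y_pred, average="macro"),
            "RC": recall_score(y_true, y_pred, average="macro")}
```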

Table 2.

Quantitative comparison (%) of the influence of combining radiomics with deep learning. Regardless of network architecture, adding radiomics markedly improves IPMN classification performance; likewise, adding deep learning improves the performance of the radiomics-only classifier.

| Network | ACC (w/o Rad.) | AUC (w/o Rad.) | PR (w/o Rad.) | RC (w/o Rad.) | ACC (w/ Rad.) | AUC (w/ Rad.) | PR (w/ Rad.) | RC (w/ Rad.) |
|---|---|---|---|---|---|---|---|---|
| w/o DL (radiomics only) | - | - | - | - | 71.6 | 89.7 | 74.7 | 75.2 |
| DenseNet | 57.4 | 68.4 | 55.9 | 56.2 | 75.8 | 87.4 | 74.7 | 76.0 |
| ResNet18 | 48.9 | 66.6 | 46.3 | 48.0 | 73.7 | 89.3 | 76.1 | 76.7 |
| AlexNet | 57.0 | 64.8 | 60.3 | 54.7 | 77.7 | 89.8 | 76.5 | 75.2 |
| MobileNet | 57.0 | 64.4 | 57.7 | 54.6 | 71.6 | 89.3 | 74.7 | 75.2 |
| ViT | 61.3 | 71.9 | 56.2 | 56.6 | 81.9 | 89.3 | 82.4 | 82.7 |

Fig. 3.

First and second rows show MRI cases where IPMN diagnosis and stratification are done correctly and incorrectly, respectively. Zoomed versions of the pancreas regions demonstrate the shape and appearance diversity.

We assess the performance of the deep learning (DL) model and the radiomics model, individually and in combination, for IPMN classification on multi-center, multi-contrast MRI data. In a single-center scenario, the DL model performs comparably to previous literature [15]. When tested on multi-center data, however, its performance decreases, likely due to the heterogeneity of multi-center data compared to a single institution. We also observe that fusing the radiomics predictions with the DL predictions improves performance on multi-center data. This improvement can be attributed to the domain knowledge encoded in the handcrafted radiomics features, in contrast to the high-dimensional, sparse deep features: radiomics features are designed to capture important characteristics of IPMNs in dense vectors with more controlled variation, suggesting better generalization across multi-center data. The fusion method can also be viewed as a trainable linear layer on the probability vectors. Lastly, our findings suggest that the information captured by the DL layers and the radiomics features is complementary to a certain degree, and combining them yields better performance than either approach alone.

To understand this further, we run experiments with the radiomics classifier using different feature combinations: T1 radiomics, T2 radiomics, combined T1+T2 radiomics, and T1+T2 plus clinical features. We achieve accuracies of 0.573, 0.650, 0.666, and 0.674, respectively, indicating that the T1 and T2 features are complementary and that the clinical feature (pancreas volume) further increases prediction performance.

4. Conclusion

IPMN cysts are a ticking time bomb that can progress into pancreatic cancer. Early detection and risk stratification of these precancerous lesions are crucial for effective treatment planning and disease control. However, this is no easy feat given the irregular shape, texture, and size of the cysts and the pancreas. To tackle this challenge, we propose a novel CAD pipeline for IPMN risk classification from multi-contrast MRI scans. The proposed CAD system includes a self-adapting volumetric segmentation strategy for pancreas delineation and a newly designed deep learning-based classification scheme combined with a radiomics-based predictive approach at the decision level. In a series of rigorous experiments on multi-center data sets (246 MRI scans from five centers), we achieve unprecedented performance (81.9% accuracy with radiomics) that surpasses the state of the art in the field (ViT, 61.3% without radiomics). Our ablation studies further underscore the pivotal role of both the radiomics and deep learning modules in attaining superior performance compared to international guidelines and published studies, and highlight the importance of pancreas volume as a clinical feature.

Acknowledgments

This project is supported by the NIH funding: NIH/NCI R01-CA246704 and NIH/NIDDK U01-DK127384-02S1.

References

  • 1. Chen PT, Wu T, Wang P, Chang D, Liu KL, Wu MS, Roth HR, Lee PC, Liao WC, Wang W: Pancreatic cancer detection on CT scans with deep learning: a nationwide population-based study. Radiology 306(1), 172–182 (2023)
  • 2. Corral JE, Hussein S, Kandel P, Bolan CW, Bagci U, Wallace MB: Deep learning to classify intraductal papillary mucinous neoplasms using magnetic resonance imaging. Pancreas 48(6), 805–810 (2019)
  • 3. Elta GH, Enestvedt BK, Sauer BG, Lennon AM: ACG clinical guideline: diagnosis and management of pancreatic cysts. American Journal of Gastroenterology 113(4), 464–479 (2018)
  • 4. He K, Zhang X, Ren S, Sun J: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
  • 5. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
  • 6. Hussein S, Kandel P, Bolan CW, Wallace MB, Bagci U: Lung and pancreatic tumor characterization in the deep learning era: novel supervised and unsupervised learning approaches. IEEE Transactions on Medical Imaging 38(8), 1777–1787 (2019)
  • 7. Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
  • 8. Krizhevsky A, Sutskever I, Hinton GE: ImageNet classification with deep convolutional neural networks. Communications of the ACM 60(6), 84–90 (2017)
  • 9. Kuwahara T, Hara K, Mizuno N, Okuno N, Matsumoto S, Obata M, Kurita Y, Koda H, Toriyama K, Onishi S, et al.: Usefulness of deep learning analysis for the diagnosis of malignancy in intraductal papillary mucinous neoplasms of the pancreas. Clinical and Translational Gastroenterology 10(5) (2019)
  • 10. LaLonde R, Tanner I, Nikiforaki K, Papadakis GZ, Kandel P, Bolan CW, Wallace MB, Bagci U: INN: inflated neural networks for IPMN diagnosis. In: Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), Part V, pp. 101–109. Springer (2019)
  • 11. Lennon AM, Ahuja N, Wolfgang CL: AGA guidelines for the management of pancreatic cysts. Gastroenterology 149(3), 825 (2015)
  • 12. Luo G, Fan Z, Gong Y, Jin K, Yang C, Cheng H, Huang D, Ni Q, Liu C, Yu X: Characteristics and outcomes of pancreatic cancer by histological subtypes. Pancreas 48(6), 817–822 (2019)
  • 13. Marchegiani G, Andrianello S, Borin A, Dal Borgo C, Perri G, Pollini T, Romanò G, D’Onofrio M, Gabbrielli A, Scarpa A, et al.: Systematic review, meta-analysis, and a high-volume center experience supporting the new role of mural nodules proposed by the updated 2017 international guidelines on IPMN of the pancreas. Surgery 163(6), 1272–1279 (2018)
  • 14. Nyúl LG, Udupa JK, Zhang X: New variants of a method of MRI scale standardization. IEEE Transactions on Medical Imaging 19(2), 143–150 (2000)
  • 15. Salanitri FP, Bellitto G, Palazzo S, Irmakci I, Wallace M, Bolan C, Engels M, Hoogenboom S, Aldinucci M, Bagci U, et al.: Neural transformers for intraductal papillary mucosal neoplasms (IPMN) classification in MRI images. In: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 475–479. IEEE (2022)
  • 16. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
  • 17. Zhang Z, Bagci U: Dynamic linear transformer for 3D biomedical image segmentation. In: Machine Learning in Medical Imaging (MLMI 2022), pp. 171–180. Springer (2022)
