Cancer Imaging. 2026 Mar 11;26:53. doi: 10.1186/s40644-026-01014-y

Prediction of MYC/BCL-2 co-expression in diffuse large B-cell lymphoma using a multimodal fusion model: a retrospective study based on PET/CT habitat radiomics and deep learning

Yu He 1,2, Shirong Chen 2, Xinyang Li 2, Jingkai Yi 1,2, Dan Wang 2, Kailin Qi 1,2, Xiao Jiang 2, Ping Wu 3, Meng Zhao 2, Hao Lu 2, Ying Kou 2, Yutang Yao 2, Zhuzhong Cheng 1,2
PMCID: PMC13101366  PMID: 41814400

Abstract

Background

The co-expression of MYC and BCL-2 proteins in diffuse large B-cell lymphoma (DLBCL) is linked to poor prognosis and resistance to standard therapies. Thus, a non-invasive and accurate method to detect this co-expression before treatment is essential for pre-treatment risk stratification and assisting in personalized patient management.

Methods

This retrospective study included DLBCL patients who underwent baseline 18F-FDG PET/CT between December 2018 and August 2024. Clinical data were collected. Habitat radiomics features were extracted by segmenting tumors into distinct subregions, and 3D deep learning features were obtained using convolutional neural networks, both derived from PET/CT images. Two individual models were built: a habitat radiomics model and a 3D deep learning model. A multimodal fusion model was also constructed by integrating dimensionally reduced features from habitat radiomics, 3D deep learning, clinical data, and PET-derived metabolic parameters. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). DeLong’s test was used to compare AUCs, and net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were calculated to assess incremental predictive value.

Results

A total of 242 patients were enrolled (95 DEL-positive [39.3%] and 147 DEL-negative [60.7%]) and were split by stratified random sampling on DEL status into a training set (n = 193) and a test set (n = 49) in an 8:2 ratio. This was a retrospective single-center study with an internal hold-out test cohort. All feature selection and model development were performed in the training cohort only, and the test cohort was used solely for final evaluation. The habitat radiomics model showed better performance than the deep learning model, with AUCs of 0.869 (95% CI: 0.820–0.918) and 0.812 (95% CI: 0.661–0.964) vs. 0.844 (95% CI: 0.787–0.902) and 0.715 (95% CI: 0.562–0.869) in the training and test sets, respectively. The fusion model outperformed both, achieving AUCs of 0.946 (95% CI: 0.917–0.974) in the training set and 0.890 (95% CI: 0.793–0.987) in the test set. Calibration curves demonstrated strong agreement between predicted and observed outcomes. DCA confirmed higher clinical benefit for the fusion model. DeLong’s test showed that the fusion model significantly outperformed both individual models in the training set and the deep learning model in the test set (P < 0.05). NRI and IDI further supported improved discrimination, suggesting potential incremental value.

Conclusions

The multimodal fusion model based on 18F-FDG PET/CT and clinical data is a non-invasive and reliable tool for predicting MYC/BCL-2 co-expression in DLBCL, and it provides complementary prognostic information to assist personalized treatment planning.

Trial Registration

This study was retrospectively registered.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40644-026-01014-y.

Keywords: Habitat radiomics, Deep learning, 18F-FDG PET/CT, MYC/BCL-2 co-expression, Diffuse large B-cell lymphoma

Background

Diffuse large B-cell lymphoma (DLBCL), the predominant subtype of non-Hodgkin lymphoma (NHL), is an aggressive B-cell lymphoma representing around 30% of all NHL [1]. DLBCL demonstrates considerable biological heterogeneity, with prognosis affected by multiple factors [2]. Patients with unfavorable prognoses frequently belong to distinct subgroups or carry adverse prognostic markers. Among them, some are classified as double-expressor diffuse large B-cell lymphoma (DEL), defined on formalin-fixed, paraffin-embedded (FFPE) diagnostic biopsy specimens by immunohistochemistry as MYC expression in ≥ 40% of tumor cells and BCL-2 expression in ≥ 50% of tumor cells. In contrast, genetic double-hit lymphoma (DHL) is defined by MYC rearrangement together with BCL2 and/or BCL6 rearrangements, typically detected by cytogenetic/fluorescence in situ hybridization (FISH) testing [3]. Studies indicate that DEL responds poorly to first-line R-CHOP therapy [4, 5], and the WHO recognizes it as an adverse prognostic biomarker [3]. Recent research suggests that combining targeted therapy with chemotherapy, as well as intensified chemotherapy regimens, can improve therapeutic efficacy in patients with DEL to varying degrees [6]. While immunohistochemical MYC/BCL-2 co-expression remains the reference standard for DEL, it relies on invasive tissue biopsy. Consequently, complementary non-invasive pre-treatment approaches, such as baseline ¹⁸F-FDG PET/CT-based quantitative imaging, are clinically relevant for supporting early risk stratification and triaging patients for timely confirmatory testing.

PET/CT is a critical imaging modality for DLBCL, enabling early diagnosis, staging, evaluation of treatment response, therapeutic guidance, and prognostic assessment [7]. PET/CT delineates anatomical structures while also enabling metabolic characterization of lesions. Semi-quantitative PET-derived parameters have been identified as independent prognostic biomarkers in DLBCL, facilitating the noninvasive evaluation of intratumor heterogeneity [8–11].

Artificial intelligence (AI)-driven radiomics extracts high-throughput imaging features to quantify tumor heterogeneity and supports clinical prediction models for diagnosis, risk stratification, therapeutic response, and prognosis, thus advancing precision oncology [12–14]. Contemporary radiomic research generally extracts features from entire tumors. However, tumor heterogeneity varies spatially, which calls for the analysis of specific subregions [15, 16]. Habitat analysis techniques extract subregional characteristics by clustering similar voxels within tumor regions [17]. Deep learning, a subset of artificial intelligence and machine learning, performs well in detection, classification, and prediction tasks. Deep learning architectures derive information from medical imaging data, facilitating the creation of prognostically significant biomarkers [18]. This study aimed to develop a multimodal fusion model that integrates habitat radiomics, 3D deep learning, clinical information, and PET metabolic parameters to predict MYC/BCL-2 co-expression in DLBCL, thereby supporting pre-treatment risk stratification and facilitating timely confirmatory testing when appropriate.

Methods

Study design

This study developed three distinct models: the habitat radiomics model (Intra-Hab), the 3D deep learning model (DL-3D), and the multimodal fusion model (MFM). The workflow of this study is shown in Fig. 1.

Fig. 1.

The workflow of this study. CT: computed tomography, DCA: decision curve analysis, Habitat: habitat image (subregion), LASSO: least absolute shrinkage and selection operator, MSE: mean squared error, PET: positron emission tomography, ROI: region of interest, ROC: receiver operating characteristic curve

Patient data

This study retrospectively enrolled patients with histopathologically confirmed DLBCL who underwent baseline 18F-FDG PET/CT scans at our institution from December 2018 to August 2024. Inclusion and exclusion criteria are summarized in Fig. 2. Patients were divided into training and test sets at an 8:2 ratio using stratified random sampling by DEL status. Clinical information was collected, including age, sex, Ann Arbor stage, cell-of-origin (COO) classification, and Ki-67 expression level.
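The stratified 8:2 split can be illustrated with a minimal standard-library sketch. The function name `stratified_split`, the per-class rounding rule, and the seed are assumptions made for illustration; the study does not state which software or rounding convention produced its 193/49 partition, so this sketch is not expected to reproduce those exact counts.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test set within each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Assumed rule: round the per-class test size to the nearest integer.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Cohort composition reported in the study: 95 DEL-positive, 147 DEL-negative.
labels = [1] * 95 + [0] * 147
train_idx, test_idx = stratified_split(labels)
train_pos = sum(labels[i] for i in train_idx)
test_pos = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), train_pos, test_pos)
```

Under this rounding rule the sketch yields 194 training and 48 test patients, with the DEL-positive share close to 39% in both subsets. The small difference from the reported 193/49 reflects only the assumed rounding convention.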

Fig. 2.

Flowchart showing the patient enrollment process in the study

Imaging protocols

Patients were instructed to fast and avoid the intake of sugar-containing fluids for at least 6 h prior to the examination. The administered dose of 18F-FDG ranged from 3.70 to 5.55 MBq/kg. Prior to the administration of the radiopharmaceutical, measurements of patients’ height, weight, and blood glucose levels were conducted, ensuring that blood glucose concentrations remained below 11.1 mmol/L. After injection, patients were instructed to drink water and rest in a quiet environment for approximately 60 min. The bladder was emptied immediately prior to scanning. A whole-body low-dose CT scan was initially conducted, utilizing parameters of 120 kV, 80 mA, a pitch of 0.55, and a slice thickness of 5 mm. A PET scan was subsequently conducted, featuring a scanning duration of 1.5 min per bed position, a matrix size of 200 × 200, and an image reconstruction using True X and time-of-flight (TOF). The specific models and manufacturers of the PET/CT scanner and cyclotron are provided in Supplementary Material 1.

18F-FDG PET/CT image preprocessing

PET images were imported into LIFEx (version 7.6.0; https://lifexsoft.org) in DICOM format. All identifiable hypermetabolic nodal and extranodal lesions in each patient were semi-automatically delineated using a 40% SUVmax threshold. The lesion masks were then combined to form a single composite tumor VOI per patient for subsequent feature extraction, followed by manual exclusion of physiological uptake and adjacent non-tumor tissues. The initial VOI delineation and preliminary editing were performed by an intermediate nuclear medicine physician with 8 years of experience. All VOIs were subsequently reviewed and refined by a senior nuclear medicine expert (chief physician) with > 15 years of experience, with special attention to physiological ¹⁸F-FDG activity in the urinary system (e.g., kidneys/collecting system and urinary bladder). Both readers were blinded to DEL status during VOI delineation and verification. The corresponding CT images were imported into 3D Slicer 5.6.1 (https://www.slicer.org) in DICOM format. Subsequently, PET-derived VOIs were spatially aligned with the CT dataset via rigid registration, resulting in CT-based VOIs for subsequent radiomic feature extraction. Semi-quantitative PET parameters were extracted using LIFEx software, including SUVmax, total metabolic tumor volume (MTV), standardized total metabolic tumor volume (SMTV), total lesion glycolysis (TLG), standardized total lesion glycolysis (STLG), and maximal inter-lesion distance (Dmax). Both PET and CT datasets were resampled to an isotropic spatial resolution of 1 × 1 × 1 mm³ using nearest-neighbor interpolation to standardize voxel spacing.
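The fixed relative threshold and the derived volumetric parameters can be sketched as follows. This is a simplified illustration on a flat list of hypothetical SUV values, not the LIFEx procedure itself: the function names and the toy data are assumptions, and the manual editing and multi-lesion merging described above are omitted. The definitions used are the standard ones: MTV is the segmented volume and TLG is SUVmean × MTV.

```python
def segment_lesion(suv_values, fraction=0.40):
    """Indices of voxels with SUV >= fraction * SUVmax (fixed relative threshold)."""
    cutoff = fraction * max(suv_values)
    return [i for i, v in enumerate(suv_values) if v >= cutoff]

def metabolic_parameters(suv_values, voxel_volume_ml, fraction=0.40):
    """SUVmax, MTV (mL), SUVmean and TLG for the thresholded region."""
    idx = segment_lesion(suv_values, fraction)
    inside = [suv_values[i] for i in idx]
    mtv = len(inside) * voxel_volume_ml      # metabolic tumor volume
    suv_mean = sum(inside) / len(inside)
    tlg = suv_mean * mtv                     # total lesion glycolysis
    return {"SUVmax": max(inside), "MTV": mtv, "SUVmean": suv_mean, "TLG": tlg}

# Hypothetical SUVs for seven voxels; a 1 x 1 x 1 mm voxel is 0.001 mL.
suv = [1.0, 2.0, 3.9, 4.0, 6.0, 8.0, 10.0]
params = metabolic_parameters(suv, voxel_volume_ml=0.001)
print(segment_lesion(suv), params)
```

With SUVmax = 10, the cutoff is 4.0, so the four voxels at or above 4.0 are retained and the voxel at 3.9 is excluded.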

Habitat clustering and radiomics feature extraction

Subregional habitats refer to imaging-defined intratumoral subregions generated by unsupervised voxel-wise clustering of standardized multi-parametric local feature vectors. Voxels assigned to the same cluster share similar local metabolic and textural patterns and are mapped back to the tumor VOI to form a spatially coherent habitat subregion. Importantly, habitat subregions do not directly correspond to specific molecular or pathological subtypes but should be regarded as imaging surrogates of intratumoral heterogeneity. To construct these feature vectors, a moving window technique was first applied to compute the local entropy of PET and CT images within a 5 × 5 × 5 voxel neighborhood. Subsequently, to ensure complete feature extraction at the boundaries, each VOI was expanded outward by two voxels. Finally, a 19-dimensional local feature vector was constructed for each voxel within the VOI by integrating multiple quantitative parameters. A detailed summary of these 19 features is provided in Supplementary Material 2. The standardized voxel-level feature vectors from all patients were clustered using an unsupervised K-means algorithm, with Euclidean distance serving as the similarity metric. The number of clusters was assessed in the range of 3 to 10, and the Calinski–Harabasz index was employed to determine the optimal configuration. Radiomic features were extracted from intra-tumoral and habitat subregions using the PyRadiomics toolkit (http://pyradiomics.readthedocs.io), in accordance with the guidelines of the Imaging Biomarker Standardization Initiative (IBSI). However, due to the high heterogeneity of tumors, the internal composition varied considerably among patients, and certain habitat subregions were biologically absent in some cases. Additionally, in a few patients with small tumor volumes or atypical imaging patterns, some predefined habitats failed to form valid spatial clusters following K-means clustering, resulting in missing features. 
To retain these cases and ensure data completeness, missing values were imputed using the K-nearest neighbors (KNN) algorithm, with Euclidean distance used to identify the five most similar samples (K = 5) for substitution.
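The cluster-number search described above (K-means with Euclidean distance, k from 3 to 10, selection by the Calinski–Harabasz index) can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the data are synthetic 2-D points rather than the 19-dimensional voxel vectors, the initialization and iteration count are arbitrary, and all function names are hypothetical. In practice a library implementation such as scikit-learn's `KMeans` and `calinski_harabasz_score` would typically be used; the source does not state which implementation the authors relied on.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return labels

def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)], with B and W the between- and
    within-cluster sums of squared distances."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = centroid(points)
    between = sum(len(m) * sq_dist(centroid(m), overall) for m in clusters.values())
    within = sum(sq_dist(p, centroid(m)) for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points, k_range=range(3, 11)):
    """Score each candidate k and return the one with the highest CH index."""
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in k_range}
    return max(scores, key=scores.get), scores

# Synthetic stand-in for standardized voxel features: three 2-D blobs.
rng = random.Random(1)
blobs = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in blobs for _ in range(30)]
best_k, scores = choose_k(points)
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

The selected k depends on the data and on the K-means initialization, so the sketch shows the selection procedure only and makes no claim about the value it returns.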

Deep learning feature extraction

The images were cropped into three-dimensional regions to define the minimal bounding cuboid encompassing the VOI. The processed PET and CT datasets were independently used to train a series of 3D deep learning models, including ResNet50-3D, ResNet101-3D, DenseNet169-3D, DenseNet121-3D, ShuffleNet-3D, ViT-3D, and SimpleViT-3D. All input volumes were resized to 128 × 128 × 96 voxels prior to model training. Model training employed a cosine-decay learning rate schedule, initiated with a learning rate of 0.01 and continued for 1,000 epochs. Comparative performance analysis among the 3D deep learning architectures was conducted to identify the optimal model for final feature extraction.
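A cosine-decay schedule with the stated initial learning rate (0.01) and duration (1,000 epochs) can be written as a short function. The floor value `min_lr = 0` and the absence of warm-up are assumptions, since the text does not specify them. Deep learning frameworks offer equivalent schedulers (for example, `torch.optim.lr_scheduler.CosineAnnealingLR` in PyTorch), but the source does not state which framework was used.

```python
import math

def cosine_decay_lr(epoch, total_epochs=1000, base_lr=0.01, min_lr=0.0):
    """Learning rate at a given epoch under a cosine-decay schedule.

    min_lr = 0 and the lack of warm-up are assumptions for illustration.
    """
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for epoch in (0, 250, 500, 750, 1000):
    print(epoch, round(cosine_decay_lr(epoch), 6))
```

The rate starts at 0.01, passes through 0.005 at the midpoint, and reaches the floor at the final epoch, with the slowest change at the start and end of training.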

Feature selection

Following feature extraction, the inter-observer reproducibility of radiomic features was first assessed to ensure robustness against segmentation variability. A subset of 30 patients was randomly selected, and tumor VOIs were independently re-segmented by the two nuclear medicine physicians using the standardized protocol. The intraclass correlation coefficient (ICC) was calculated for all extracted features, and those with poor reproducibility (ICC < 0.80) were excluded. Upon identifying this robust feature set, a rigorous feature screening process was applied to both habitat radiomic features and features derived from 3D deep learning. All feature selection procedures were strictly limited to the training set to prevent information leakage and to ensure unbiased model evaluation. Z-score normalization was employed to standardize the mean and standard deviation among features. A t-test was conducted to determine p-values, retaining only features that exhibited statistical significance (p < 0.05). To reduce redundancy, Pearson correlation coefficients were calculated to identify highly correlated features; for each pair with a correlation coefficient exceeding 0.9, only one feature was retained. Subsequently, least absolute shrinkage and selection operator (LASSO) regression was applied to eliminate non-informative features by shrinking their coefficients to zero, thereby enhancing model interpretability. The optimal regularization parameter (λ) was determined using 10-fold cross-validation within the training set. The test set was reserved exclusively for independent validation and was not involved in either feature selection or hyperparameter tuning.
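Two of the screening steps above, training-only Z-score normalization and the Pearson redundancy filter (|r| > 0.9), can be sketched with the standard library. The feature names and values are hypothetical. The greedy rule that keeps the first feature of each correlated pair is an assumption, because the text only states that one feature per pair was retained. The t-test and LASSO steps are omitted here.

```python
import math

def fit_zscore(train_columns):
    """Per-feature mean and SD estimated from the training cohort only."""
    stats = {}
    for name, values in train_columns.items():
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
        stats[name] = (mean, sd)
    return stats

def apply_zscore(columns, stats):
    """Standardize any cohort with the training-cohort statistics."""
    return {name: [(v - stats[name][0]) / stats[name][1] for v in values]
            for name, values in columns.items()}

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def drop_correlated(columns, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature unless |r| exceeds the
    threshold with a feature that has already been kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature values for five training patients and one test patient.
train = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly collinear with f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
test = {"f1": [3.0], "f2": [6.0], "f3": [2.0]}

stats = fit_zscore(train)
z_train = apply_zscore(train, stats)
z_test = apply_zscore(test, stats)  # test data never contribute to the statistics
kept = drop_correlated(z_train)
print(kept)
```

Applying the training-cohort mean and SD to the test cohort, rather than re-estimating them, is what keeps the normalization step free of information leakage.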

Model construction

Following feature selection, the selected features were used to train multiple machine learning algorithms, including support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), Light Gradient Boosting Machine (LightGBM), and Adaptive Boosting (AdaBoost). Three predictive models were developed: [1] a habitat radiomic model focusing on subregional habitat and intratumoral features (Intra-Hab); [2] a 3D deep learning model based on features extracted from 3D neural networks (DL-3D); [3] a multimodal fusion model (MFM) that integrated 3D deep learning features, habitat radiomic features, clinical variables, and PET-derived metabolic parameters.

Statistical analysis

Clinical characteristics between cohorts were compared using the Student’s t-test for continuous variables and the χ² test for categorical variables. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate the discriminative performance of each model, thereby facilitating optimal model selection. Model calibration was assessed using calibration curves, and decision curve analysis (DCA) was conducted to determine the net clinical benefit and applicability of the predictive models across a range of threshold probabilities. Comparative model performance was further examined using DeLong’s test to identify statistically significant differences in AUCs. Additionally, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) indices were employed to quantify the improvement in predictive accuracy and overall clinical utility.
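Three of the evaluation quantities named above have compact definitions that a short sketch can make concrete: the AUC as a concordance probability, the decision-curve net benefit at a threshold probability pt (TP/n − FP/n × pt/(1 − pt)), and the IDI as the difference in mean predicted-probability gains between events and non-events. The function names and the four-patient toy data are hypothetical. DeLong's test and the NRI are not implemented here.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def idi(y_true, old_scores, new_scores):
    """IDI: mean probability gain in events minus mean gain in non-events."""
    def mean_by_class(scores, label):
        vals = [s for y, s in zip(y_true, scores) if y == label]
        return sum(vals) / len(vals)
    gain_events = mean_by_class(new_scores, 1) - mean_by_class(old_scores, 1)
    gain_nonevents = mean_by_class(new_scores, 0) - mean_by_class(old_scores, 0)
    return gain_events - gain_nonevents

# Hypothetical predicted probabilities from a reference and an updated model.
y = [1, 1, 0, 0]
old = [0.9, 0.4, 0.5, 0.1]
new = [0.9, 0.6, 0.3, 0.1]
print(auc(y, old), auc(y, new), net_benefit(y, old, 0.5), idi(y, old, new))
```

In this toy example the updated model raises the AUC from 0.75 to 1.0 and gives an IDI of 0.2, because predicted probabilities rise by 0.1 on average in events and fall by 0.1 in non-events.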

Results

Patient characteristics

This retrospective study included 242 DLBCL patients (95 DEL-positive [39.3%] and 147 DEL-negative [60.7%]) who underwent baseline ¹⁸F-FDG PET/CT. Patients were randomly split into training (n = 193) and internal testing (n = 49) cohorts (8:2 ratio) using stratified sampling to maintain consistent DEL-positive/negative proportions. Baseline characteristics, including clinical data and PET-derived metabolic parameters, are summarized in Supplemental Table 1. A two-step logistic regression approach was adopted: univariable logistic regression was first performed to screen for significant clinical and PET metabolic parameters, and only those with statistical significance were subsequently included in a stepwise multivariable logistic regression analysis. The univariable analysis revealed significant associations between MYC/BCL-2 co-expression and sex, age, Ann Arbor stage, SUVmax, and Ki-67 expression levels. However, none of these variables remained statistically significant in the multivariable analysis. Detailed results of the univariable and multivariable analyses are presented in Table 1.

Table 1.

Univariate and multivariate analysis of clinical characteristics

Feature-Name OR_UNI 95% CI (UNI) P-value-UNI OR_MULTI 95% CI (MULTI) P-value-MULTI
gender 0.540 0.380–0.766 0.004 0.576 0.354–0.939 0.063
Ann-Arbor 0.898 0.829–0.971 0.024 1.043 0.826–1.318 0.766
SUVmax 0.986 0.977–0.995 0.010 0.979 0.953–1.006 0.202
age 0.994 0.990–0.998 0.010 0.989 0.973–1.005 0.250
Ki-67 0.996 0.993–0.999 0.025 1.013 0.999–1.028 0.133
COO 0.875 0.659–1.162 0.439
sMTV 0.985 0.968–1.003 0.160
Dmax 0.995 0.989–1.000 0.105
sTLG 0.998 0.996–1.000 0.082
TLG 1.000 1.000–1.000 0.077
MTV 1.000 0.999–1.000 0.304

*OR: odds ratio, UNI: univariate analysis, MULTI: multivariate analysis, CI: confidence interval, COO: cell-of-origin, SUVmax: maximal standardized uptake value, MTV: total metabolic tumor volume, sMTV: standardized total metabolic tumor volume, TLG: total lesion glycolysis, sTLG: standardized total lesion glycolysis, Dmax: maximal inter-lesion distance

Clustering of habitat subregions and construction of habitat radiomic models

The optimal number of clusters was determined to be three, which produced the highest Calinski–Harabasz (CH) scores for both CT and PET modalities, as illustrated in Fig. 3. A total of 15,400 radiomic features were extracted, comprising 2,016 PET-based and 1,834 CT-based features from each segmented habitat subregion and the entire tumor region, calculated as 15,400 = (2,016 + 1,834) × 4. Following feature selection and LASSO dimensionality reduction, nine features with non-zero coefficients were identified. These included two features from the whole tumor region, five from Habitat Region 1, and one each from Habitat Regions 2 and 3, all of which were subsequently used for final algorithm modeling, as shown in Fig. 4. The specific names of these selected features are provided in Supplemental Material 3. Performance comparisons of various machine learning algorithms based on radiomic features derived from both intratumoral and habitat subregional analyses are presented in Supplemental Table 2.

Fig. 3.

(A and B) Clustering process and generation of subregions based on habitat analysis. (A) Flow chart of intra-tumor Subregion Segmentation. (B) The optimal cluster number selection based on the highest Calinski-Harabasz Index (which was 3 in this study)

Fig. 4.

(A, B and C) Dimensionality reduction and feature selection of intra-tumor and subregional radiomic features. (A) LASSO Regularization Path Plot; (B) MSE Cross-Validation Error Plot; (C) Feature Coefficient Bar Chart. (D and E) ROC curves comparing intra-tumor and habitat models across datasets (D: Training set, E: Test set). MSE: mean squared error

Construction of a 3D deep learning model

A systematic comparison of the performance of various 3D deep learning models for CT and PET was conducted (refer to Tables 2 and 3). Among the CT-based 3D neural networks, DenseNet169 demonstrated robust learning capacity, achieving an AUC of 0.875 (95% CI: 0.825–0.925) in the training cohort and 0.809 (95% CI: 0.671–0.947) in the test cohort. Although ShuffleNet achieved a higher AUC of 0.982 (95% CI: 0.968–0.997) in the training set, exceeding the performance of DenseNet169, its test performance dropped markedly to an AUC of 0.624 (95% CI: 0.431–0.818), indicating a potential overfitting issue. Therefore, DenseNet169 was selected as the final model for extracting CT-derived 3D deep learning features. In the PET-based models, ResNet101 and SimpleViT exhibited different levels of generalizability. SimpleViT showed limited generalization ability, as evidenced by a substantial drop from a training AUC of 0.916 (95% CI: 0.874–0.958) to a testing AUC of 0.669 (95% CI: 0.508–0.830). In contrast, ResNet101 demonstrated more stable generalization and was consequently chosen for extracting 3D features from PET data. A total of 3,712 deep learning features were extracted, including 2,048 from PET and 1,664 from CT. After feature selection and LASSO dimensionality reduction, 16 features with non-zero coefficients were retained for the final model, as illustrated in Fig. 5. The specific feature names are listed in Supplemental Material 4. Comparative performance of multiple machine learning algorithms based on these 3D deep learning-derived features is presented in Supplementary Table 3.

Table 2.

Comparison of Performance of Different Deep Learning Models Based on CT 3D Features

Model-Name Acc AUC 95% CI Sensitivity Specificity PPV NPV Precision Recall F1 Threshold Cohort
DenseNet121 0.534 0.607 0.527–0.686 0.936 0.261 0.462 0.857 0.462 0.936 0.619 0.221 Train
DenseNet121 0.816 0.831 0.698–0.964 0.647 0.906 0.786 0.829 0.786 0.647 0.710 0.420 Test
DenseNet169 0.824 0.875 0.825–0.925 0.679 0.922 0.855 0.809 0.855 0.679 0.757 0.606 Train
DenseNet169 0.735 0.809 0.671–0.947 0.765 0.719 0.591 0.852 0.591 0.765 0.667 0.625 Test
ResNet101 0.596 0.440 0.355–0.524 0.115 0.922 0.500 0.606 0.500 0.115 0.187 0.520 Train
ResNet101 0.837 0.835 0.712–0.959 0.647 0.937 0.846 0.833 0.846 0.647 0.733 0.395 Test
ResNet50 0.606 0.474 0.389–0.558 0.103 0.948 0.571 0.609 0.571 0.103 0.174 0.702 Train
ResNet50 0.714 0.562 0.375–0.750 0.294 0.937 0.714 0.714 0.714 0.294 0.417 0.423 Test
ShuffleNet 0.938 0.982 0.968–0.997 0.936 0.939 0.912 0.956 0.912 0.936 0.924 0.342 Train
ShuffleNet 0.755 0.624 0.431–0.818 0.353 0.969 0.857 0.738 0.857 0.353 0.500 0.926 Test
SimpleViT 0.565 0.571 0.488–0.653 0.615 0.530 0.471 0.670 0.471 0.615 0.533 0.383 Train
SimpleViT 0.755 0.671 0.495–0.847 0.412 0.937 0.778 0.750 0.778 0.412 0.538 0.503 Test
ViT 0.617 0.619 0.537–0.701 0.590 0.635 0.523 0.695 0.523 0.590 0.554 0.411 Train
ViT 0.694 0.522 0.340–0.705 0.118 1.000 1.000 0.681 1.000 0.118 0.211 0.506 Test

*Acc: accuracy, AUC: area under the receiver operating characteristic curve, CI: confidence interval, PPV: positive predictive value, NPV: negative predictive value, F1: F1 score

Table 3.

Comparison of Performance of Different Deep Learning Models Based on PET 3D Features

Model-Name Acc AUC 95% CI Sensitivity Specificity PPV NPV Precision Recall F1 Threshold Cohort
DenseNet121 0.596 0.485 0.400–0.570 0.179 0.878 0.500 0.612 0.500 0.179 0.264 0.486 Train
DenseNet121 0.673 0.503 0.321–0.685 0.118 0.969 0.667 0.674 0.667 0.118 0.200 0.501 Test
DenseNet169 0.461 0.503 0.420–0.587 0.846 0.200 0.418 0.657 0.418 0.846 0.559 0.240 Train
DenseNet169 0.633 0.519 0.342–0.697 0.176 0.875 0.429 0.667 0.429 0.176 0.250 0.353 Test
ResNet101 0.788 0.881 0.835–0.927 0.859 0.739 0.691 0.885 0.691 0.859 0.766 0.339 Train
ResNet101 0.755 0.634 0.449–0.820 0.412 0.937 0.778 0.750 0.778 0.412 0.538 0.667 Test
ResNet50 0.751 0.806 0.745–0.868 0.538 0.896 0.778 0.741 0.778 0.538 0.636 0.470 Train
ResNet50 0.714 0.572 0.388–0.756 0.294 0.937 0.714 0.714 0.714 0.294 0.417 0.379 Test
ShuffleNet 0.596 1.000 1.000–1.000 0.000 1.000 0.000 0.596 0.000 0.000 NaN 1.000 Train
ShuffleNet 0.673 0.542 0.359–0.726 0.353 0.844 0.545 0.711 0.545 0.353 0.429 0.848 Test
SimpleViT 0.855 0.916 0.874–0.958 0.897 0.826 0.778 0.922 0.778 0.897 0.833 0.302 Train
SimpleViT 0.714 0.669 0.508–0.830 0.471 0.844 0.615 0.750 0.615 0.471 0.533 0.461 Test
ViT 0.720 0.757 0.685–0.829 0.679 0.748 0.646 0.775 0.646 0.679 0.662 0.428 Train
ViT 0.714 0.590 0.387–0.793 0.471 0.844 0.615 0.750 0.615 0.471 0.533 0.471 Test

Fig. 5.

(A, B and C) Dimensionality reduction and feature selection of 3D deep learning features. (A) LASSO Regularization Path Plot; (B) MSE Cross-Validation Error Plot; (C) Feature Coefficient Bar Chart. (D and E) ROC curve comparison of 3D deep learning models (D: Training set, E: Test set)

Construction of the multimodal fusion model

A total of 30 features were integrated during the post-fusion process, including 9 habitat radiomic features and 16 3D deep learning features with non-zero coefficients, identified through feature selection and LASSO dimensionality reduction, along with 5 clinical and PET-derived metabolic parameters selected via univariate logistic regression. Subsequent feature selection and LASSO dimensionality reduction yielded 24 features with non-zero coefficients, which were used to construct the final multimodal fusion model, as illustrated in Fig. 6. The specific names of the selected features are provided in Supplemental Material 5. Performance comparisons of various machine learning algorithms based on features derived from the multimodal fusion approach are summarized in Supplementary Table 4.

Fig. 6.

(A, B and C) Dimensionality reduction and feature selection of multimodal fusion model features. (A) LASSO Regularization Path Plot; (B) MSE Cross-Validation Error Plot; (C) Feature Coefficient Bar Chart. (D and E) Comparison of ROC curves for multimodal fusion models (D: Training set, E: Test set)

Model validation

This study evaluated the performance of the habitat radiomics, 3D deep learning, and multimodal fusion models in distinguishing the co-expression status of MYC and BCL-2 proteins, as shown in Table 4 and Fig. 7. For the habitat radiomics model, Light Gradient Boosting Machine (LightGBM) achieved the best performance, with an AUC of 0.869 (95% CI: 0.820–0.918) in the training cohort and 0.812 (95% CI: 0.661–0.964) in the test cohort. Among the models utilizing 3D deep learning features, the Support Vector Machine (SVM) yielded the best results, with training and test AUCs of 0.844 (95% CI: 0.787–0.902) and 0.715 (95% CI: 0.562–0.869), respectively. These findings indicate that the habitat radiomics model outperformed the 3D deep learning model in predicting MYC/BCL-2 co-expression. To further improve predictive performance, we developed a multimodal fusion model by integrating habitat radiomic features, deep learning-derived features, clinical data, and PET metabolic parameters. The fusion model demonstrated the best overall performance, with the SVM achieving an AUC of 0.946 (95% CI: 0.917–0.974) in the training cohort and 0.890 (95% CI: 0.793–0.987) in the test cohort, as illustrated in Fig. 7A and B. Calibration curves indicated excellent agreement between predicted and observed outcomes in both cohorts, supporting the reliability of the model’s probabilistic predictions (Fig. 7C and D). DCA revealed that the multimodal fusion model provided a greater net clinical benefit than the other models across a range of threshold probabilities, indicating potential decision-analytic value (Fig. 7E and F). The results of DeLong’s test, along with NRI and IDI comparisons, are presented below.
DeLong’s test showed that the multimodal fusion model significantly outperformed both the habitat radiomics and 3D deep learning models in the training cohort (P < 0.05), while no significant difference was observed between the habitat and deep learning models (P = 0.549). In the test cohort, the fusion model demonstrated a statistically significant advantage over the 3D deep learning model (P < 0.05); however, differences between the fusion model and the habitat radiomics model, as well as between habitat and deep learning models, were not statistically significant (Fig. 7G and H). Moreover, the multimodal fusion model exhibited significant improvements in NRI and IDI compared with the other models, supporting incremental improvement in discrimination and risk stratification (Fig. 7I to L). In summary, the habitat radiomics model outperformed the deep learning model, while the multimodal fusion model demonstrated the highest overall predictive capability for MYC/BCL-2 co-expression.

Table 4.

Comparison of the performance of habitat radiomics models, 3D deep learning models, and multimodal fusion models

Model-Name Accuracy AUC 95% CI Sensitivity Specificity PPV NPV Precision Recall F1 Threshold Cohort
Intra-Hab 0.808 0.869 0.820–0.918 0.782 0.826 0.753 0.848 0.753 0.782 0.767 0.444 train
DL-3D 0.777 0.844 0.787–0.902 0.731 0.809 0.722 0.816 0.722 0.731 0.726 0.375 train
MFM 0.881 0.946 0.917–0.974 0.859 0.896 0.848 0.904 0.848 0.859 0.854 0.411 train
Intra-Hab 0.837 0.812 0.661–0.964 0.647 0.937 0.846 0.833 0.846 0.647 0.733 0.479 test
DL-3D 0.694 0.715 0.562–0.869 0.588 0.750 0.556 0.774 0.556 0.588 0.571 0.398 test
MFM 0.837 0.890 0.793–0.987 0.647 0.937 0.846 0.833 0.846 0.647 0.733 0.482 test
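The thresholded metrics in Table 4 follow from a 2 × 2 confusion matrix, which also shows why the Precision and Recall columns repeat the PPV and Sensitivity columns. The sketch below uses counts inferred from the reported MFM test-cohort rates (n = 49); these counts are an assumption made for illustration and are not stated in the source.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # identical to recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # identical to precision
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Counts inferred (assumption) from the MFM test row: 17 DEL-positive and
# 32 DEL-negative patients, with 11 true positives and 30 true negatives.
m = classification_metrics(tp=11, fn=6, tn=30, fp=2)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

These counts reproduce the MFM test row to within rounding (accuracy 0.837, sensitivity 0.647, specificity 0.937, PPV 0.846, NPV 0.833, F1 0.733). The Intra-Hab test row reports the same thresholded values, so at their respective cut-offs the two models differ in AUC rather than in these metrics.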

Fig. 7.

(A–F) Comparison of ROC curves, calibration curves, and decision curve analysis (DCA) results for the habitat radiomics, 3D deep learning, and multimodal fusion models across datasets. (A and B) ROC curves; (C and D) calibration curves; (E and F) DCA curves; training set (A, C and E) and test set (B, D and F). (G–L) Comparison of DeLong test, NRI, and IDI results for the same models across datasets. (G and H) DeLong test heatmaps; (I and J) NRI heatmaps; (K and L) IDI heatmaps; training set (G, I and K) and test set (H, J and L)

Discussion

DLBCL is biologically heterogeneous, and MYC/BCL-2 co-expression (DEL) is an adverse prognostic biomarker. This study proposed a multimodal fusion model that integrates habitat radiomics, 3D deep learning features, clinical variables, and PET-derived metabolic parameters for the noninvasive prediction of MYC/BCL-2 co-expression in patients with DLBCL. While pathology remains the reference standard, such a tool may support earlier pre-treatment risk stratification and help triage patients for timely confirmatory testing when appropriate. The results showed that the habitat radiomics model achieved higher AUC values than the 3D deep learning model (training set AUC: 0.869 [95% CI: 0.820–0.918] vs. 0.844 [95% CI: 0.787–0.902]; test set AUC: 0.812 [95% CI: 0.661–0.964] vs. 0.715 [95% CI: 0.562–0.869]), indicating better predictive performance. The multimodal fusion model outperformed both single-modality models, achieving an AUC of 0.946 [95% CI: 0.917–0.974] in the training set and 0.890 [95% CI: 0.793–0.987] in the test set. A comprehensive evaluation was conducted using multiple metrics, including AUC, calibration curves, DCA, DeLong’s test, and the reclassification indices NRI and IDI. Calibration curve analysis indicated good agreement between predicted and observed outcomes in both training and test cohorts. DCA further showed that the multimodal fusion model provided greater clinical net benefit across a range of threshold probabilities. Moreover, the positive results from NRI and IDI analyses indicated improved discrimination for clinical risk stratification and decision-making. Collectively, these findings support the effectiveness of the multimodal fusion model in predicting MYC/BCL-2 co-expression and underscore its potential value in supporting individualized therapeutic decision-making in patients with DLBCL.

To strengthen the biological plausibility of these findings, we further examined the dominant predictors retained in the habitat radiomics model (Supplementary Material 3). These predictors were mainly wavelet- and LBP-filtered first-order and texture descriptors (e.g., kurtosis/skewness, GLCM correlation/cluster shade, GLSZM small-area emphasis, and GLDM dependence variance). According to standardized radiomics definitions, these metrics summarize intensity-distribution characteristics and multiscale spatial pattern complexity and are commonly used as imaging surrogates of intratumoral heterogeneity [13, 19]. Mechanistically, MYC is a central regulator of glucose transport and glycolytic metabolism, providing a plausible biological rationale for why FDG-derived intensity and heterogeneity signatures may be associated with MYC/BCL-2 co-expression risk [20]. Importantly, we emphasize that this pathophysiologic interpretation is hypothesis-generating and should not be regarded as direct evidence of specific molecular or histopathologic processes; prospective studies integrating imaging with pathologic and/or molecular correlation are warranted to substantiate these associations.
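As a small illustration of the first-order descriptors mentioned above, the sketch below computes skewness and kurtosis from a list of voxel intensities using central moments. It assumes the non-excess kurtosis convention (a normal distribution gives 3); some standards report excess kurtosis, which subtracts 3. The intensity lists are invented, and the features in the study were computed on wavelet- and LBP-filtered images rather than raw values.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Asymmetry of the intensity distribution: m3 / m2^1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Peakedness/tail weight, non-excess convention: m4 / m2^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical voxel intensities inside one subregion.
symmetric = [1, 2, 3, 4, 5]   # evenly spread uptake
hot_spot = [1, 1, 1, 1, 6]    # mostly low uptake with one hot voxel

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(hot_spot), kurtosis(hot_spot))
```

The second list has a long right tail, so its skewness is positive and its kurtosis is higher, which is the sense in which these features act as surrogates for uneven uptake.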

Advancements in medical imaging have rendered 18F-FDG PET/CT essential for the diagnosis and assessment of DLBCL [21]. Because of tumor heterogeneity and the complexity of imaging data, advanced analytical techniques are required to identify subtle biomarkers. Gatenby et al. [16] proposed that malignancies do not operate as unified, self-organizing systems but instead form spatially heterogeneous habitats. Imaging-defined subregions within these habitats exhibit distinct microenvironmental selection pressures and region-specific adaptive evolutionary strategies among tumor cell populations. This study quantified the tumor heterogeneity of DLBCL by integrating multimodal data. Using habitat-based radiomics with unsupervised K-means clustering, tumors were partitioned into distinct habitat subregions, allowing an in-depth analysis of spatial heterogeneity within the tumor. Traditional radiomics implicitly assumes that heterogeneity is uniform across the tumor, which may not reflect its true spatial heterogeneity, and most studies have extracted radiomic features only from the entire tumor region. This approach may introduce confounding factors, including hemorrhage, necrosis, cystic changes and edema, which can obscure the expression of tumor heterogeneity and reduce the predictive performance of models [22]. In contrast, habitat-based radiomics quantifies tumor heterogeneity more accurately by dividing the tumor into multiple subregions with distinct biological characteristics. This method not only captures the spatial complexity of the tumor but may also reveal potential biological mechanisms, thereby providing a basis for predictive models [23]. Zhao et al. [24] developed a habitat-based radiomic model derived from 18F-FDG PET/CT imaging in 62 colorectal cancer patients, demonstrating that habitat-driven radiomic features could predict KRAS/NRAS/BRAF mutational status. Chen et al. [25] applied habitat imaging analysis to preoperatively differentiate 317 cases of non-small cell lung cancer (NSCLC) from benign inflammatory lesions, validating the discriminative potential of habitat-specific PET/CT radiomics. Together, these studies illustrate the potential of habitat imaging analysis to enhance predictive model performance in oncologic applications.

This study employed unsupervised K-means clustering to generate habitat subregions, partitioning all intratumoral voxels into distinct clusters according to predefined criteria so that voxels within a cluster are similar and clusters differ from one another [26]. The method makes no prior assumption about the shape of the gray-level distribution (such as uni- or bimodal patterns) and achieves adaptive multi-class subregional segmentation from multidimensional radiomic quantifiers, including texture, morphology, and functional parameters. This data-driven segmentation addresses a limitation of conventional thresholding methods, which rely on rigid single-variable assumptions about the gray-level histogram. It can improve the discrimination of microenvironments with overlapping gray levels, such as viable tumor, necrotic, and edematous zones, and offers theoretical support for accurate quantification of tumor spatial heterogeneity [27].
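The clustering step can be sketched as follows. This is a minimal pure-Python version of Lloyd's K-means algorithm, not the authors' pipeline. The per-voxel feature vectors (a PET uptake value and a rescaled CT value), the choice of k = 2, and the toy numbers are all assumptions made for illustration. In practice features would be standardized first and the number of clusters chosen by a criterion the study defines elsewhere.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Assign each voxel feature vector to one of k habitats (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from k distinct voxels
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every voxel.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the centers will not move either
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical voxels: (PET uptake, rescaled CT value).
voxels = [
    (1.0, 0.2), (1.2, 0.1), (0.9, 0.3),   # low-uptake region
    (8.0, 0.9), (8.3, 1.0), (7.8, 0.8),   # high-uptake region
]
habitat_labels, habitat_centers = kmeans(voxels, k=2)
print(habitat_labels)
```

Radiomic features are then extracted separately from the voxels of each habitat rather than from the whole tumor volume.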

Another innovative aspect of this study resides in the implementation of a 3D convolutional neural network (3D CNN) to directly process volumetric imaging data, enabling comprehensive capture of spatial contextual information. This framework allows the model to more accurately characterize shape, structural architecture and textural patterns within the imaging data, thereby achieving precise quantification of tumor volume, morphology and spatial heterogeneity. Furthermore, the 3D CNN facilitates omnidirectional analysis across anatomical planes, providing holistic, multiangular characterization of lesion features and enhancing complementary information integration. Compared to 2D neural networks, which process slices sequentially, the 3D CNN utilizes three-dimensional convolutional kernels to simultaneously analyze adjacent slices. This method maintains inter-slice spatial relationships during feature extraction, thereby avoiding the information loss associated with the limitations of slice-by-slice processing in 2D models [28].
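The difference between slice-wise 2D filtering and a 3D kernel can be shown with a toy example. The sketch below implements a valid-mode 3D cross-correlation on nested lists indexed as `[z][y][x]`. The 3×3×3 volume and both kernels are invented, and a trained 3D CNN learns its kernels rather than using fixed ones. The 3D kernel here responds to the change between the first and last slice, which a 2D kernel applied to each slice independently cannot represent.

```python
def conv3d_valid(volume, kernel):
    """Valid-mode 3D cross-correlation; volume and kernel are nested lists [z][y][x]."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def conv2d_per_slice(volume, kernel2d):
    """Apply one 2D kernel to every slice independently (no inter-slice context)."""
    return [conv3d_valid([s], [kernel2d])[0] for s in volume]


def filled(value):
    """A 3x3 slice with a constant value."""
    return [[value] * 3 for _ in range(3)]


# Hypothetical volume: intensity increases from slice to slice.
volume = [filled(1), filled(2), filled(4)]

ones_2d = filled(1)                            # 2D kernel: sums one slice
z_gradient = [filled(-1), filled(0), filled(1)]  # 3D kernel: last slice minus first slice

per_slice = conv2d_per_slice(volume, ones_2d)
across_slices = conv3d_valid(volume, z_gradient)
print(per_slice)
print(across_slices)
```

Each 2D output depends on a single slice, whereas the single 3D output combines all three slices in one operation.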

Limitations

Several limitations of this study must be acknowledged. First, as a retrospective single-center study, inherent selection bias may exist, and the generalizability of the findings remains to be established. Specifically, an independent external validation cohort was not available because of institutional data privacy requirements. Although we mitigated this limitation by using a strict internal hold-out test set (completely isolated from feature selection and model training) and ICC-based reproducibility checks to reduce overfitting, retrospective performance alone cannot guarantee real-world utility. Before clinical adoption, the model must be evaluated in prospective, multicenter studies using a pre-specified protocol and a locked model. This is essential to confirm its reproducibility and generalizability across different PET/CT scanners, reconstruction settings, and patient populations. Second, the “black-box” nature of deep learning models may hinder clinical interpretability. Future investigations will incorporate rigorously validated explainability workflows, such as stability-tested visualization techniques, to elucidate the decision-making logic of the deep learning models and facilitate clinical trust. Lastly, although habitat analysis enables the quantification of tumor heterogeneity, the accuracy of subregion delineation depends on the algorithms employed. Further investigations are warranted to refine algorithmic design, evaluate parameter sensitivity, and establish standardized frameworks to facilitate clinical translation.

Conclusion

In summary, this study suggests that 18F-FDG PET/CT-based habitat radiomics and 3D deep learning models hold promise for predicting MYC/BCL-2 co-expression in DLBCL. Notably, the multimodal fusion model showed better predictive performance than either single-modality model, highlighting its potential to support non-invasive pre-treatment risk stratification and to triage patients for timely confirmatory testing, pending prospective multicenter external validation and clinical impact studies.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (36.3KB, docx)
Supplementary Material 2 (34.5KB, docx)

Acknowledgements

This study was supported by the State Administration of Science, Technology and Industry for National Defense (Grants HNKF202323(36) and HNKF202322(36)), the Science and Technology Department of Sichuan Province (Grant 2024YFFK0067), and the Sichuan Cancer Hospital Outstanding Youth Funding (Grant YB2023022). No other potential conflict of interest relevant to this article was reported.

Abbreviations

AUC: Area under the receiver operating characteristic curve
AI: Artificial intelligence
DLBCL: Diffuse large B-cell lymphoma
DEL: Double-expressing diffuse large B-cell lymphoma
DICOM: Digital Imaging and Communications in Medicine
Dmax: Maximal inter-lesion distance
DCA: Decision curve analysis
IDI: Integrated discrimination improvement
KNN: K-nearest neighbors
LASSO: Least absolute shrinkage and selection operator
MTV: Total metabolic tumor volume
NRI: Net reclassification improvement
NHL: Non-Hodgkin lymphomas
PET/CT: Positron emission tomography/computed tomography
SUVmax: Maximal standardized uptake value
SMTV: Standardized total metabolic tumor volume
STLG: Standardized total lesion glycolysis
TOF: Time of flight
TLG: Total lesion glycolysis
VOI: Volume of interest

Author contributions

All authors have read and approved the final manuscript. The specific contributions of each author are as follows:
- Conception and study design, data analysis, and interpretation: Yu He, Shirong Chen, Zhuzhong Cheng
- Drafting of the manuscript: Yu He, Shirong Chen, Xinyang Li, Jingkai Yi, Dan Wang, Kailin Qi
- Critical revision of the manuscript and English translation: Xiao Jiang, Ping Wu, Meng Zhao, Hao Lu, Ying Kou, Yutang Yao, Zhuzhong Cheng

Funding

This study was supported by the State Administration of Science, Technology and Industry for National Defense (Grants HNKF202323(36) and HNKF202322(36)), the Science and Technology Department of Sichuan Province (Grant 2024YFFK0067), and the Sichuan Cancer Hospital Outstanding Youth Funding (Grant YB2023022).

Data availability

The datasets analyzed during the current study are not publicly available due to concerns regarding patient privacy and institutional data protection policies. However, they are available from the corresponding author upon reasonable request and with approval from Sichuan Cancer Hospital.

Declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee for Medical Research and New Medical Technology of Sichuan Cancer Hospital (Approval No. SCCHEC-02-2025-082). As a retrospective study, the requirement for written informed consent was waived in accordance with institutional guidelines.

Consent for publication

Not applicable. This study does not involve any identifiable individual data (such as personal details, images, or videos); therefore, consent for publication was not required.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Sehn LH, Salles G. Diffuse Large B-Cell Lymphoma. N Engl J Med. 2021;384:842–58.
  • 2. Wright GW, Huang DW, Phelan JD, et al. A Probabilistic Classification Tool for Genetic Subtypes of Diffuse Large B Cell Lymphoma with Therapeutic Implications. Cancer Cell. 2020;37:551–568.e14.
  • 3. Swerdlow SH, Campo E, Pileri SA, et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood. 2016;127:2375–90.
  • 4. Green TM, Young KH, Visco C, et al. Immunohistochemical double-hit score is a strong predictor of outcome in patients with diffuse large B-cell lymphoma treated with rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone. J Clin Oncol. 2012;30:3460–7.
  • 5. Savage KJ, Slack GW, Mottok A, et al. Impact of dual expression of MYC and BCL2 by immunohistochemistry on the risk of CNS relapse in DLBCL. Blood. 2016;127:2182–8.
  • 6. D’Alò F, Bellesi S, Maiolo E, et al. Novel Targets and Advanced Therapies in Diffuse Large B Cell Lymphomas. Cancers (Basel). 2024;16:2243.
  • 7. Lewis KL, Trotman J. Integration of PET in DLBCL. Semin Hematol. 2023;60:291–304.
  • 8. Chung HW, Lee KY, Kim HJ, Kim WS, So Y. FDG PET/CT metabolic tumor volume and total lesion glycolysis predict prognosis in patients with advanced lung adenocarcinoma. J Cancer Res Clin Oncol. 2014;140:89–98.
  • 9. Wang J, Kim D, Kang WJ, Cho H. Prognostic Value of Bone Marrow F-18 FDG Uptake in Patients with Advanced-Stage Diffuse Large B-Cell Lymphoma. Nucl Med Mol Imaging. 2020;54:28–34.
  • 10. Lee EJ, Chang S-H, Lee TY, et al. Prognostic Value of FDG-PET/CT Total Lesion Glycolysis for Patients with Resectable Distal Bile Duct Adenocarcinoma. Anticancer Res. 2015;35:6985–91.
  • 11. Guo B, Tan X, Ke Q, Cen H. Prognostic value of baseline metabolic tumor volume and total lesion glycolysis in patients with lymphoma: A meta-analysis. PLoS ONE. 2019;14:e0210224.
  • 12. Saboury B, Rahmim A, Siegel E. PET and AI Trajectories Finally Coming into Alignment. PET Clin. 2021;16:xv–xvi.
  • 13. Zwanenburg A, Vallières M, Abdalah MA, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020;295:328–38.
  • 14. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology. 2016;278:563–77.
  • 15. O’Connor JPB, Rose CJ, Waterton JC, Carano RAD, Parker GJM, Jackson A. Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome. Clin Cancer Res. 2015;21:249–57.
  • 16. Gatenby RA, Grove O, Gillies RJ. Quantitative imaging in cancer evolution and ecology. Radiology. 2013;269:8–15.
  • 17. O’Connor JPB. Cancer heterogeneity and imaging. Semin Cell Dev Biol. 2017;64:48–57.
  • 18. Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. 2015;349:255–60.
  • 19. van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017;77:e104–7.
  • 20. Stine ZE, Walton ZE, Altman BJ, Hsieh AL, Dang CV. MYC, Metabolism, and Cancer. Cancer Discov. 2015;5:1024–39.
  • 21. Barrington SF, Mikhaeel NG, Kostakoglu L, et al. Role of imaging in the staging and response assessment of lymphoma: consensus of the international conference on malignant lymphomas imaging working group. J Clin Oncol. 2014;32:3048–58.
  • 22. Parmar C, Rios Velazquez E, Leijenaar R, et al. Robust Radiomics feature quantification using semiautomatic volumetric segmentation. PLoS ONE. 2014;9:e102107.
  • 23. O’Connor JPB, Aboagye EO, Adams JE, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol. 2017;14:169–86.
  • 24. Zhao H, Su Y, Wang Y, et al. Using tumor habitat-derived radiomic analysis during pretreatment 18F-FDG PET for predicting KRAS/NRAS/BRAF mutations in colorectal cancer. Cancer Imaging. 2024;24:26.
  • 25. Chen L, Liu K, Zhao X, Shen H, Zhao K, Zhu W. Habitat Imaging-Based 18F-FDG PET/CT Radiomics for the Preoperative Discrimination of Non-small Cell Lung Cancer and Benign Inflammatory Diseases. Front Oncol. 2021;11:759897.
  • 26. Wu J, Cao G, Sun X, et al. Intratumor Spatial Heterogeneity at Perfusion MR Imaging Predicts Recurrence-free Survival in Locally Advanced Breast Cancer Treated with Neoadjuvant Chemotherapy. Radiology. 2018;288:26–35.
  • 27. Fang M, Kan Y, Dong D, et al. Multi-Habitat Based Radiomics for the Prediction of Treatment Response to Concurrent Chemotherapy and Radiation Therapy in Locally Advanced Cervical Cancer. Front Oncol. 2020;10:563.
  • 28. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.


Articles from Cancer Imaging are provided here courtesy of BMC
