Abstract
Background:
Axillary lymph node (ALN) status serves as a crucial prognostic indicator in breast cancer (BC). The aim of this study was to construct a radiogenomic multimodal model, based on machine learning and whole-transcriptome sequencing (WTS), to accurately evaluate the risk of ALN metastasis (ALNM), predict drug therapeutic response, and avoid unnecessary axillary surgery in BC patients.
Methods:
We conducted a retrospective analysis of 1078 BC patients from The Cancer Genome Atlas (TCGA), The Cancer Imaging Archive (TCIA), and the Foshan cohort. These patients were divided into the TCIA cohort (N=103), TCIA validation cohort (N=51), Duke cohort (N=138), Foshan cohort (N=106), and TCGA cohort (N=680). Radiological features were extracted from BC radiological images, and differential gene expression was calibrated using WTS technology. A support vector machine model was employed to screen radiological and genetic features, and a multimodal model was established based on radiogenomic and clinicopathological features to predict ALNM. The accuracy of the model predictions was assessed using the area under the curve (AUC), and the clinical benefit was measured using decision curve analysis. Risk stratification analysis of BC patients was performed by gene set enrichment analysis, differential comparison of immune checkpoint gene expression, and drug sensitivity testing.
Results:
For the prediction of ALNM, rad-score was able to significantly differentiate between ALN- and ALN+ patients in both the Duke and Foshan cohorts (P<0.05). Similarly, the gene-score was able to significantly differentiate between ALN- and ALN+ patients in the TCGA cohort (P<0.05). The radiogenomic multimodal nomogram demonstrated satisfactory performance in the TCIA cohort (AUC 0.82, 95% CI: 0.74–0.91) and the TCIA validation cohort (AUC 0.77, 95% CI: 0.63–0.91). In the risk sub-stratification analysis, there were significant differences in gene pathway enrichment between high and low-risk groups (P<0.05). Additionally, different risk groups may exhibit varying treatment responses (P<0.05).
Conclusion:
Overall, the radiogenomic multimodal model employs multimodal data, including radiological images, genetic data, and clinicopathological typing. The radiogenomic multimodal nomogram can precisely predict ALNM and drug therapeutic response in BC patients.
Keywords: breast cancer, drug therapeutic response, lymph node metastasis, machine learning, radiogenomic
Introduction
Highlights
Machine learning and whole-transcriptome sequencing technology were used to filter features.
Radiogenomic multimodal data from international multicenter cohorts were applied to construct a machine learning model that predicts the risk of axillary lymph node metastasis from three dimensions: medical imaging, tissue cells, and molecular biology.
This study provides insights into the drug response of breast cancer patients in different risk groups stratified by model prediction.
Breast cancer (BC) represents one of the most prevalent malignancies among women on a global scale1. Over most of the past four decades, the incidence of BC has exhibited an overall upward trend1,2. Axillary lymph node metastasis (ALNM) is an important factor affecting the survival and prognosis of BC patients3. In clinical practice, the precise prediction of ALNM plays a pivotal role in determining the surgical approach and pharmacological treatment plan for BC patients4. Currently, the standard method for axillary lymph node (ALN) staging is sentinel lymph node biopsy (SLNB)5. For patients with a positive sentinel lymph node, axillary lymph node dissection is the standard management6. However, SLNB is not a completely benign procedure, as it is invasive and carries a risk of long-term comorbidities. A meta-analysis reported pooled estimates for the prevalence of lymphedema following SLNB of 7.5% (six studies, n=3866), 3.7% (four studies, n=491), and 5.9% (11 studies, n=3136) at less than 12 months, 12–24 months, and more than 24 months, respectively7. The pooled estimate for the prevalence of pain at any follow-up interval post-SLNB was 21.7% (10 studies, n=1039)7. Pooled estimates for the prevalence of reduced strength and range of motion post-SLNB were 15.2% (five studies, n=437) and 17.1% (six studies, n=5809), respectively7. Furthermore, a prospective international cooperative group trial involving 4069 patients indicated that the 30-day incidence rates of wound infection, hematoma, and seroma post-SLNB were 1.0%, 1.4%, and 7.1%, respectively8. The likelihood of axillary seroma was 2.4 times higher in patients aged 70 years or older than in younger patients, and 3.1 times higher in patients who had five or more SLNs removed than in those who had four or fewer removed8. Additionally, in a study including 4351 patients, about 66.8% of BC patients underwent unnecessary SLNB9.
Therefore, developing an accurate tool to evaluate the status of ALN may be beneficial in avoiding unnecessary SLNB.
In clinical practice, methods other than axillary surgery for determining ALN status rely primarily on the evaluation of three key aspects: genetic data, imaging findings, and clinical characteristics9–11. These evaluations are tailored to meet specific clinical requirements. However, when the risk profiles derived from these three modalities conflict and do not present absolute indications, the decision to perform an SLNB poses a significant challenge. Therefore, there is a pressing need for a method that provides an objective, quantified risk assessment integrating these three modalities. The radiogenomic multimodal model was developed in response to this need.
The appropriate selection of treatment drugs remains a significant challenge in clinical practice, because the use of ineffective drugs may expose patients to unnecessary toxic side effects12. Therefore, we also investigated the differences in gene enrichment pathways, immune checkpoint expression, and chemotherapy drug sensitivity across different risk groups.
In our study, we aimed to use radiomics, genomics, and clinical feature analysis to jointly predict the risk of ALNM and guide drug treatment through machine learning (ML) and whole-transcriptome sequencing (WTS) technology. We gathered BC patient data from international multicenter cohorts and screened radiogenomic features using the support vector machine (SVM) model and logistic regression (LR) to establish a rad-score and a gene-score for predicting ALNM in BC patients. We then incorporated clinicopathological features to construct a radiogenomic multimodal nomogram and performed risk stratification. Finally, we conducted gene set enrichment analysis (GSEA), immune checkpoint analysis, and chemotherapy drug sensitivity analysis.
Methods
Study design and patients
The detailed workflow of the study is illustrated in Figures 1 and 2. The international multicohort data comprise the TCIA cohort, collected from The Cancer Genome Atlas (TCGA) dataset of The Cancer Imaging Archive (TCIA); the Duke cohort, collected from the Duke dataset of TCIA; the FoShan cohort; the TCGA cohort, collected from TCGA; and the TCIA validation cohort. The radiological data were MRI images of the patients in the TCIA cohort, Duke cohort, FoShan cohort, and TCIA validation cohort. The genetic data were the gene expression levels of the patients in the TCIA cohort, TCGA cohort, and TCIA validation cohort. The clinical data, available in all five cohorts, consisted of information on pathology, subtype, T stage, N stage, pathological tumor node metastasis (TNM) stage, and estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER2) status. The work has been reported in line with the strengthening the reporting of cohort, cross-sectional, and case–control studies in surgery (STROCSS) criteria13 (Supplemental Digital Content 1, http://links.lww.com/JS9/B683).
Figure 1.

Radiogenomic multimodal model for preoperative prediction of axillary lymph node metastasis and drug therapeutic response in breast cancer.
Figure 2.
Study design. GLRLM, gray level run length matrix; GLSZM, gray level size zone matrix; GLCM, gray level co-occurrence matrix; NGTDM, neighborhood gray tone difference matrix; ROI, region of interest.
Image acquisition
In the TCIA cohort, tissues for TCGA were collected from around the globe to meet its cumulative goals, typically about 500 specimens per cancer type. As a result, image datasets are highly heterogeneous due to variations in scanner modes, manufacturers, and acquisition protocols14. In the Duke cohort, all images were downloaded from the Duke dataset of TCIA15. Patients in the FoShan cohort were scanned on different scanner platforms with different magnets at both 1.5 T [MAGNETOM Optima (GE Healthcare) and HDx (GE Healthcare)] and 3.0 T [MAGNETOM Discovery (GE Healthcare) and Architect (GE Healthcare)]. Axial fast low angle shot fat-saturated T1-weighted 3D scans (TR=5.7 ms/4.5 ms, TE=2.7 ms/2.1 ms, flip angle=15°/12°, slice thickness=3.4 mm/1.6 mm, acquisition matrix=320×256, SENSE factor=1.5) were obtained before and after contrast agent injection. An initial fat-saturated T1-weighted precontrast scan was collected first. Then 0.1 mmol/kg of gadolinium diethylenetriamine pentaacetic acid bismethylamide (Gd-DTPA-BMA; Omniscan; GE Healthcare, Little Chalfont) was injected intravenously at a rate of 2.5 ml/s, followed by the same volume of saline. A first postcontrast scan was collected 20 s after contrast agent injection. Five subsequent postcontrast images were acquired at intervals of 58 s, resulting in six postcontrast images for each patient (t = 2, 3, 4, 5, 6, and 7 min).
Image processing
To enhance the accuracy of the feature extraction algorithm, it is crucial to specify a rectangular region of interest (ROI) using coordinates within the MRI image. This step aims to eliminate potential interference from irrelevant data, including text, icons, and noise. All images were resampled prior to labeling the ROI so that the voxel spacing of all images was standardized to 1 mm×1 mm×1 mm. Two radiologists used ITK-SNAP software to manually delineate the three-dimensional ROI of the tumor. The center and contour of the tumor ROI were delineated manually. The central necrotic area was not delineated.
Extraction of radiological features
The hand-crafted radiological features can be divided into three groups: texture, geometry, and intensity. Geometric features describe the three-dimensional shape of the tumor. Intensity features describe the first-order statistical distribution of voxel intensity within the tumor. Texture features describe patterns and spatial distributions of intensity, including second-order and higher-order patterns. The extraction methods for texture features include the gray level run length matrix (GLRLM), gray level co-occurrence matrix (GLCM), neighborhood gray tone difference matrix (NGTDM), and gray level size zone matrix (GLSZM)16.
Identification of differentially expressed genes through WTS
The TCIA cohort comprised 53 ALN- patients and 50 ALN+ patients, and the WTS cohort consisted of four ALN- patients and four ALN+ patients. Differentially expressed genes were obtained separately from the TCIA cohort (Fc=1, P<0.05) and the WTS cohort (Fc=3, P<0.05) through limma differential analysis. Heat maps and volcano plots displayed the differences in gene expression levels between ALN- and ALN+ patients. Subsequently, we took the intersection of these two sets of differentially expressed genes to identify the most informative candidates. Univariate LR analysis was then performed to search for associations between the expression levels of these genes and ALN status in BC patients and to explore independent prognostic differentially expressed genes in the TCIA cohort. We also constructed a Pearson correlation coefficient matrix and excluded genes with a correlation coefficient greater than 0.6.
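The intersection and correlation-filtering steps described above can be illustrated with a minimal sketch. The gene names and expression values are hypothetical toy data, and two details are assumptions not stated in the text: the filter uses the absolute value of the Pearson coefficient, and when two genes are highly correlated the one listed first is kept.

```python
import numpy as np

def intersect_genes(deg_a, deg_b):
    """Return genes found in both lists, preserving the order of deg_a."""
    in_b = set(deg_b)
    return [g for g in deg_a if g in in_b]

def drop_correlated(expr, genes, threshold=0.6):
    """Greedy filter: keep a gene only if |Pearson r| with every
    already-kept gene is <= threshold (assumed tie-breaking rule).

    expr maps gene name -> list of expression values (one per sample).
    """
    matrix = np.array([expr[g] for g in genes], dtype=float)
    corr = np.corrcoef(matrix)  # rows are genes, so corr is gene x gene
    kept = []
    for j in range(len(genes)):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [genes[j] for j in kept]

# Hypothetical toy expression data for eight samples.
expr = {
    "G1": [1, 2, 3, 4, 5, 6, 7, 8],
    "G2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9],  # nearly identical to G1
    "G3": [1, -1, 1, -1, 1, -1, 1, -1],
    "G4": [1, 1, -1, -1, -1, -1, 1, 1],
}
deg_cohort = ["G1", "G2", "G3", "G4", "G9"]  # DEGs from one data set
deg_wts = ["G4", "G3", "G2", "G1", "G7"]     # DEGs from the other data set

shared = intersect_genes(deg_cohort, deg_wts)
selected = drop_correlated(expr, shared, threshold=0.6)
print(shared, selected)
```

In this toy example G2 is removed because it is almost perfectly correlated with G1, while the remaining genes are retained.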
Selection of radiological features and genetic features
Difference analysis (P<0.05) and Pearson correlation coefficient analysis (r<0.9) were used as the initial screening methods for the extracted radiological features. Then, in the linear SVM model constructed with the TCIA cohort, a recursive feature elimination (RFE) algorithm and 20-fold cross-validation were used to select the optimal feature subset. The feature weights in the linear SVM model were used to weight the radiological feature values and construct a rad-score. The optimal gene set was subsequently subjected to multivariate LR analysis to further refine and retain the most influential subset of genes, as well as their weights. Finally, a gene-score and a rad-score were constructed by linearly combining the weights and values of the selected features.
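As an illustration of the SVM-RFE step, the following sketch uses scikit-learn's `RFECV` with a linear-kernel `SVC` and 20-fold stratified cross-validation. It is not the authors' code: the synthetic data from `make_classification`, the fixed `C=1.0`, the accuracy scoring, and the standardization step are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a pre-screened feature table (100 patients, 12 features).
X, y = make_classification(n_samples=100, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)
X = StandardScaler().fit_transform(X)

# Recursive feature elimination driven by linear SVM weights, with the
# number of retained features chosen by 20-fold cross-validation.
svm = SVC(kernel="linear", C=1.0)
cv = StratifiedKFold(n_splits=20, shuffle=True, random_state=0)
selector = RFECV(svm, step=1, cv=cv, scoring="accuracy",
                 min_features_to_select=1)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)

# Refit on the selected features and use the SVM weights to build a
# linear score (weighted sum of feature values) for every sample.
final_svm = SVC(kernel="linear", C=1.0).fit(X[:, selected], y)
weights = final_svm.coef_.ravel()
rad_score = X[:, selected] @ weights
print("selected feature indices:", selected)
```

A linear kernel is needed here because RFE ranks features by the model coefficients, which only a linear SVM exposes directly.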
Interpretability of the SVM model
To explain the contribution of each radiological feature to the model prediction, we used the ‘SHAP’ package in Python 3.9.2 to assess how important the features were and how they were involved in the prediction of ALNM. SHAP (SHapley Additive exPlanations) is a method for explaining individual predictions based on the optimal Shapley value in cooperative game theory. It uses Shapley values to connect optimal credit assignments to local explanations and to quantify the contribution of each feature to the predictions made by the model, helping us better understand how the model makes predictions.
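For a linear model, SHAP values have a simple closed form when features are treated as independent: the contribution of feature i is its weight multiplied by the deviation of its value from the background mean. The sketch below computes this directly with NumPy rather than through the ‘SHAP’ package; the weights, intercept, and data are hypothetical, and the independence assumption is ours.

```python
import numpy as np

def linear_shap(weights, background, x):
    """SHAP values for f(x) = w.x + b under feature independence:
    phi_i = w_i * (x_i - mean of feature i in the background data)."""
    weights = np.asarray(weights, dtype=float)
    mean = np.asarray(background, dtype=float).mean(axis=0)
    return weights * (np.asarray(x, dtype=float) - mean)

# Hypothetical linear model and background samples.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
X_bg = np.array([[0.0, 1.0, 2.0],
                 [2.0, 3.0, 4.0],
                 [4.0, 5.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])

def f(v):
    """Linear decision function for one sample or a batch of samples."""
    return v @ w + b

phi = linear_shap(w, X_bg, x)

# Additivity check: contributions sum to f(x) minus the mean prediction.
gap = f(x) - f(X_bg).mean()

# Global importance: mean absolute SHAP value per feature over all samples.
importance = np.abs(np.array([linear_shap(w, X_bg, row) for row in X_bg])).mean(axis=0)
print(phi, gap, importance)
```

The additivity check is what makes the per-feature contributions interpretable as a decomposition of a single prediction.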
Performance of the rad-score and gene-score in external validation cohorts
We used violin plots to show the difference in rad-score between ALN- and ALN+ patients in the Duke cohort and FoShan cohort, while the gene-score was examined in the TCGA cohort.
Construction and evaluation of a radiogenomic multimodal nomogram
Univariate analysis was performed on the clinical characteristics, rad-score, and gene-score in the TCIA cohort, and significant variables were selected for multivariate analysis to construct a radiogenomic multimodal model. The discrimination of the radiogenomic multimodal nomogram was evaluated using the area under the curve (AUC) of the receiver operating characteristic curve in the TCIA cohort and the TCIA validation cohort. Calibration curves and decision curve analysis (DCA) were applied to assess the calibration of the model and its usefulness in clinical decision-making, respectively. Finally, each cohort was stratified into high-risk and low-risk groups based on the median values of the nomogram prediction, gene-score, and rad-score in the TCIA cohort.
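Two of the steps above, AUC evaluation and median-based risk stratification, can be sketched as follows. The labels and predicted probabilities are hypothetical, and assigning values strictly above the training-cohort median to the high-risk group is an assumed convention.

```python
import numpy as np

def auc_score(y_true, y_prob):
    """AUC as the probability that a random positive case receives a higher
    prediction than a random negative case (ties count as 0.5)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def stratify_by_training_median(train_values, values):
    """Use the training-cohort median as the cutoff for any cohort."""
    cutoff = float(np.median(train_values))
    groups = np.where(np.asarray(values, dtype=float) > cutoff, "high", "low")
    return cutoff, groups

# Hypothetical ALN status (1 = ALN+) and model predictions.
y = [0, 0, 1, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

auc = auc_score(y, p)
cutoff, groups = stratify_by_training_median(p, p)
print(auc, cutoff, list(groups))
```

Applying the training-cohort cutoff unchanged to the validation cohorts keeps the risk groups comparable across cohorts.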
GSEA in different risk groups
We executed KEGG analysis to compare differences in gene expression between different risk groups and to identify risk-associated biological processes or pathways. Pathways were defined as enriched when P<0.05.
Stratified analysis of immune checkpoint genes based on radiological characteristics
To explore the association between risk groups and immune checkpoint genes in the TCIA cohort, we compared the distribution of immune checkpoint gene expression across risk groups, presented as violin plots. A Wilcoxon test was used to assess the statistical significance of the differences.
Evaluation of the therapeutic response
To investigate potential differences in treatment efficacy among distinct risk subgroups, we selected six commonly used drugs for the treatment of BC and calculated their respective IC50 values in the TCIA cohort with the ‘pRRophetic’ package.
Statistical analysis
All statistical analyses were performed with R statistical software, versions 4.2.3 and 4.3.1, and Python, version 3.9.2. Comparisons between groups were made using the independent samples t-test, Wilcoxon test, and limma test. The Durbin–Watson test and the Breusch–Pagan test were applied with the ‘lmtest’ package to test the independence assumption and the homogeneity-of-variance assumption in the LR model, respectively. The nomogram and the calibration curve were plotted with the ‘rms’ package. The AUC was computed with the ‘pROC’ package to estimate the probability of correctly predicting ALNM. DCA curves were generated with the ‘rmda’ package to assess the net benefit and threshold of the clinical model. A two-sided P-value threshold of 0.05 was used to determine statistical significance.
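For reference, the Durbin–Watson statistic itself is straightforward to compute from a vector of model residuals. The sketch below is a NumPy illustration with hypothetical residuals; it returns only the statistic, not the P-value that the ‘lmtest’ package reports.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by their sum of squares. Values near 2 suggest no
    first-order autocorrelation; values near 0 or 4 suggest positive or
    negative autocorrelation, respectively."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Hypothetical residual sequences.
dw_alternating = durbin_watson([1, -1, 1, -1])  # strongly negative autocorrelation
dw_constant = durbin_watson([1, 1, 1, 1])       # strongly positive autocorrelation
print(dw_alternating, dw_constant)
```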
Results
Baseline characteristics
This study included a total of 1078 independent BC patients from international multicohort databases. They consist of five cohorts, including the TCIA cohort (N=103), Duke cohort (N=138), FoShan cohort (N=106), TCGA cohort (N=680), and the TCIA validation cohort (N=51). The clinical and pathological characteristics are presented in Table 1, encompassing age, pathology, subtype, T, N, TNM stage, ER, PR, and HER2 status.
Table 1.
Clinicopathological characteristics of patients in the training cohort and validation cohorts.
| Characteristic | TCIA cohort, N (%) | TCIA validation cohort, N (%) | Duke cohort, N (%) | FoShan cohort, N (%) | TCGA cohort, N (%) |
|---|---|---|---|---|---|
| Number of patients | 103 (100) | 51 (100) | 138 (100) | 106 (100) | 680 (100) |
| Age, median (IQR) | 53 (30–79) | 55 (38–79) | 53 (30–81) | 51 (27–80) | 57 (26–90) |
| Pathology | | | | | |
| Invasive | 89 (86.4) | 41 (80.4) | 0 | 104 (98.1) | 0 |
| Others | 14 (13.6) | 10 (19.6) | 0 | 2 (1.9) | 0 |
| Unknown | 0 | 0 | 138 (100) | 0 | 680 (100) |
| T stage | | | | | |
| T1 | 40 (38.8) | 14 (27.4) | 44 (31.9) | 60 (56.6) | 186 (27.4) |
| T2 | 58 (56.3) | 34 (66.7) | 74 (53.6) | 46 (43.4) | 385 (56.6) |
| T3/T4 | 5 (4.9) | 3 (5.9) | 20 (14.5) | 0 | 108 (15.9) |
| Unknown | 0 | 0 | 0 | 0 | 1 (0.1) |
| N stage | | | | | |
| N0 | 53 (51.4) | 31 (60.8) | 80 (58.0) | 50 (47.2) | 326 (47.9) |
| N1 | 36 (35.0) | 13 (25.5) | 42 (30.4) | 45 (42.4) | 233 (34.3) |
| N2 | 10 (9.7) | 4 (7.8) | 12 (8.7) | 9 (8.5) | 82 (12.1) |
| N3 | 4 (3.9) | 3 (5.9) | 4 (2.9) | 2 (1.9) | 39 (5.7) |
| ER status | | | | | |
| Negative | 20 (19.4) | 12 (23.5) | 30 (21.7) | 13 (12.3) | 145 (21.3) |
| Positive | 83 (80.6) | 39 (76.5) | 108 (78.3) | 92 (86.8) | 468 (68.8) |
| Unknown | 0 | 0 | 0 | 1 (0.9) | 67 (9.9) |
| PR status | | | | | |
| Negative | 28 (27.2) | 17 (33.3) | 47 (34.1) | 18 (17.0) | 193 (28.4) |
| Positive | 75 (72.8) | 34 (66.7) | 91 (65.9) | 87 (82.1) | 419 (61.6) |
| Unknown | 0 | 0 | 0 | 1 (0.9) | 68 (10.0) |
| HER2 status | | | | | |
| Negative | 82 (79.6) | 42 (82.4) | 113 (81.9) | 74 (69.8) | 435 (64.0) |
| Positive | 21 (20.4) | 9 (17.6) | 25 (18.1) | 17 (16.0) | 107 (15.7) |
| Unknown | 0 | 0 | 0 | 15 (14.2) | 138 (20.3) |
ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2.
Identification of significant genes with differential expression in ALN+ and ALN-
To identify genes associated with ALNM in BC patients, we used limma differential analysis and found 563 differentially expressed genes within the TCIA cohort (Fc=1, P<0.05) and 583 differentially expressed genes within the WTS data (Fc=3, P<0.05). The expression levels of these genes and their differences between ALN- and ALN+ tissue are illustrated as volcano plots (Fig. 3A, C) and heatmaps (Fig. 3B, D). The intersection of the two gene sets includes 23 genes (Fig. 3E). Subsequently, we retained significant genes by univariate analysis and set a correlation threshold of 0.6 to exclude highly correlated genes. Ultimately, we identified 16 potential predictive genes, which were used for further screening in the ML model (Supplementary Table S1, Supplemental Digital Content 2, http://links.lww.com/JS9/B684).
Figure 3.

Genes differentially expressed between whole-transcriptome sequencing (WTS) data and The Cancer Imaging Archive (TCIA) cohort. Volcano plot and heatmap of differentially expressed genes in the TCIA cohort (A, B) and in the WTS samples (C, D). The intersection of genes differentially expressed in both WTS samples and TCIA cohort (E).
Selection of optimal features via the SVM-Recursive Feature Elimination (RFE) algorithm
In each cohort with radiological data, we extracted 1014 radiological features that comprehensively characterize the images. To reduce the computational burden, the radiological features were initially screened by an independent samples t-test (P<0.05) and a correlation coefficient matrix (r<0.9). Next, we separately established SVM models based on radiological and genetic features, and employed GridSearchCV to find the optimal parameters for the models (Fig. 4A, B). After the model was trained on every hyperparameter combination in the specified search space, the combination with the highest model score was selected. To visualize the discriminative ability of the SVM model, we employed principal component analysis (PCA) to condense the high-dimensional attributes of the samples into a two-dimensional space. Subsequently, we used the SVM model to identify appropriate decision boundaries for the classification of samples (Fig. 4C, D). Afterward, we used the RFE algorithm to filter the radiological features and genetic features step by step, and then selected the optimal feature sets using cross-validation (Fig. 4E, F). Ultimately, four radiological features were employed to construct the rad-score: log_sigma_5_0_mm_3D_firstorder_90Percentile, log_sigma_5_0_mm_3D_firstorder_Skewness, wavelet_HLL_firstorder_Skewness, and wavelet_LLL_firstorder_Skewness. Twelve genetic features obtained through the SVM-RFE algorithm were further screened through multivariate LR analysis, resulting in the final selection of three genetic features for the construction of the gene-score: MESP, RAB40B, and ACADS. The rad-score and gene-score were constructed as linear combinations of the feature values and their weights:
Rad-score=0.014×log_sigma_5_0_mm_3D_firstorder_90Percentile+0.958×log_sigma_5_0_mm_3D_firstorder_Skewness−1.067×wavelet_HLL_firstorder_Skewness−0.495×wavelet_LLL_firstorder_Skewness
Gene-score=0.944×MESP−1.158×RAB40B+0.780×ACADS
Using these formulas, the rad-score and gene-score can be calculated from the feature values of any sample, and the two scores can then be used jointly to predict ALNM.
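As a worked example of the linear-combination scores, the sketch below evaluates the gene-score formula for one sample. The weights are those reported for the gene-score; the expression values are hypothetical, and any normalization of the expression data applied before scoring is not specified here and is therefore not reproduced.

```python
# Gene-score weights as reported in the text.
GENE_WEIGHTS = {"MESP": 0.944, "RAB40B": -1.158, "ACADS": 0.780}

def linear_score(weights, values):
    """Weighted sum of feature values; raises if a required feature is missing."""
    missing = [name for name in weights if name not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return sum(w * values[name] for name, w in weights.items())

# Hypothetical expression values for a single patient.
sample = {"MESP": 1.0, "RAB40B": 0.5, "ACADS": 2.0}
gene_score = linear_score(GENE_WEIGHTS, sample)
print(round(gene_score, 3))  # 0.944*1.0 - 1.158*0.5 + 0.780*2.0 = 1.925
```

The same helper applies to the rad-score once its four feature weights are supplied as a dictionary.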
Figure 4.
Selection of radiogenomic features using the SVM-RFE algorithm. GridSearchCV plot searching for the optimal parameter (A, B), the decision boundary curve (C, D) and 20-fold cross-validation (E, F) of the RFE algorithm of SVM model based on radiological features and genetic features. (score: the performance of the cross-validated model on the set; C: penalty coefficient of SVM model; gamma: a kernel function parameter in SVM model; degree: dimensions of polynomial kernel function in SVM model).
Rationale for the selection of the SVM model
SHAP values were used to explain the risk predictions of the SVM model at the global level. The SHAP value considers not only the influence of a single variable but also the synergistic effects between variables. As the SVM-RFE algorithm represents the final stage of radiological feature screening, we displayed SHAP summary plots of the selected radiological features and their contributions to individual and multiple samples in Figure 5. These plots provide a visual representation of the importance of each feature in predicting the risk of ALNM. The feature importance map of SHAP values (Fig. 5A) is based on the marginal contribution of each feature when it is added to the model, averaged over all feature orderings. The feature log_sigma_5_0_mm_3D_firstorder_90Percentile had the greatest impact on the model prediction (mean |SHAP value|=0.404), while wavelet_LLL_firstorder_Skewness had the smallest impact (mean |SHAP value|=0.255). The impact of feature values on model output is presented in the SHAP summary plot (Fig. 5B). The partial dependence curves show that log_sigma_5_0_mm_3D_firstorder_90Percentile and log_sigma_5_0_mm_3D_firstorder_Skewness were both linearly positively correlated with the SHAP value, while wavelet_HLL_firstorder_Skewness and wavelet_LLL_firstorder_Skewness were both linearly negatively correlated with the SHAP value (Fig. 5C–F). The waterfall plot of feature effects in a single sample showed contributions similar to those in the population (Fig. 5G). The multisample SHAP force plot gives a macroscopic, intuitive view of how different features contribute across numerous samples (Fig. 5H).
Figure 5.

Interpretability of SVM model based on radiological features in TCIA cohort and the validation of rad-score and gene-score. Red represents positive contribution, and blue represents negative contribution. Feature importance map based on SHAP value (A). Summary graph of SHAP value (B). Partial dependency graph of optimal radiological feature set (C, D, E and F). SHAP force plot on a single sample (G). SHAP force plot on multiple samples ordered by similarity (H). The validation of rad-score in the Duke cohort (I) and FoShan cohort (J). The validation of gene-score in the TCGA cohort (K).
Validation of gene-score and rad-score in external validation cohorts
We analyzed the distribution of scores across different cohorts and compared whether there was a difference in the distribution between the ALN- and ALN+ patient populations. The results showed that the gene-score and rad-score of the ALN+ BC patients were higher than those of the ALN- BC patients (P<0.05) in Figure 5I–K.
Construction and evaluation of a radiogenomic multimodal nomogram
By analyzing the radiogenomic multimodal data with univariate and multivariate LR models, we identified gene-score, rad-score, and pathology as significant predictors of ALNM (Table 2). We performed the Durbin–Watson test and the Breusch–Pagan test on these three predictors, and the results showed that they satisfied the assumptions of independence and homogeneity of variances (P=0.95, P=0.54). A radiogenomic multimodal nomogram incorporating these three predictors was constructed (Fig. 6A). The nomogram shows that gene-score and rad-score are positively correlated with ALNM, which is consistent with the results in the external validation cohorts. Within its range of values, each significant predictor is assigned a corresponding score on the nomogram scale. The cumulative score of these three predictors corresponds to the probability of ALNM predicted by the radiogenomic multimodal model. Consequently, each sample has a predicted value from the multimodal model in addition to its actual value. The distribution of predicted and actual values is represented as confusion matrices (Fig. 6B, C). The AUC in the TCIA cohort was 0.82 (95% CI: 0.74–0.91; Fig. 6D), and the AUC in the TCIA validation cohort was 0.77 (95% CI: 0.63–0.91; Fig. 6E). The DeLong test showed that, in the TCIA cohort, the AUC of the nomogram differed significantly from those of the rad-score (P=0.013) and the gene-score (P=0.043), suggesting that the nomogram was more predictive of ALNM than either score alone. The calibration curves showed concordance between the probability predicted by the nomogram and the observed probability of ALNM in BC patients in the TCIA cohort (Fig. 6F) and the TCIA validation cohort (Fig. 6G).
The DCA for the nomogram, presented in Figure 6H, I, indicated that when the threshold probability for a doctor or a patient was within a range from 0.10 to 0.93 and 0.13 to 0.79 in the TCIA cohort and TCIA validation cohort, respectively, the nomogram added more net benefit than the ‘treat all’ or ‘treat none’ strategies.
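The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below illustrates that definition with hypothetical labels and predictions; it is not the ‘rmda’ implementation.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob, dtype=float) >= threshold
    n = y_true.size
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return float(tp / n - fp / n * threshold / (1 - threshold))

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' strategy at the same threshold."""
    prevalence = float(np.mean(np.asarray(y_true) == 1))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical ALN status (1 = ALN+) and predicted probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.1, 0.4]

nb_model = net_benefit(y, p, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
nb_none = 0.0  # 'treat none' always has zero net benefit
print(nb_model, nb_all, nb_none)
```

A model is useful at a given threshold when its net benefit exceeds both reference strategies, as in this toy example.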
Table 2.
Univariate and multivariate analyses in the training cohort.
| Variable | Univariate OR (95% CI) | Univariate P ⁎ | Multivariate OR (95% CI) | Multivariate P ⁎ |
|---|---|---|---|---|
| Age | 0.610 (0.322–1.158) | 0.131 | | |
| T stage | | | | |
| T1 | Referent | | | |
| T2 | 1.310 (0.584–2.939) | 0.513 | | |
| T3/T4 | 0.815 (0.123–5.418) | 0.832 | | |
| ER | 0.930 (0.350–2.470) | 0.885 | | |
| PR | 1.124 (0.471–2.681) | 0.793 | | |
| HER2 | 1.544 (0.587–4.061) | 0.379 | | |
| Pathology | 0.142 (0.030–0.673) | 0.0139 ⁎ | 0.149 (0.025–0.908) | 0.0389 ⁎ |
| Rad-score | 2.998 (1.561–5.756) | 0.0010 ⁎ | 2.714 (1.311–5.618) | 0.0072 ⁎ |
| Gene-score | 3.909 (2.028–7.534) | <0.0001 ⁎ | 3.538 (1.740–7.195) | 0.0005 ⁎ |
ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2.
Bold values indicate statistical significance (P<0.05).
Figure 6.

Construction and validation of a radiogenomic multimodal nomogram. Radiogenomic multimodal nomogram for predicting ALNM (A). Confusion matrix, ROC curve, calibration plot, and decision curve in the TCIA cohort (B, D, F, and H) and the TCIA validation cohort (C, E, G, and I).
Risk stratification based on different indicators
The median values of the rad-score, gene-score, and nomogram prediction in the TCIA cohort were defined as cutoffs for risk stratification. The risk grouping of patients in the different cohorts, based on the various indicators, is illustrated in Figure 7. The results showed that the predictors modeled in the TCIA cohort, whether nomogram prediction, rad-score, or gene-score, were statistically significantly higher in the high-risk group than in the low-risk group in the other validation cohorts. Considering the positive correlation between the rad-score or gene-score and the risk of ALNM, the high-risk group is more likely to develop ALNM than the low-risk group.
Figure 7.
The violin diagram of risk stratification based on nomogram prediction, rad-score, or gene-score. Risk stratification based on nomogram prediction (A), rad-score (B), or gene-score (C) in the TCIA cohort. Risk stratification based on nomogram prediction (D), rad-score (E), or gene-score (F) in TCIA validation cohort. Risk stratification based on rad-score in the Duke cohort (G) and FoShan cohort (H). Risk stratification based on gene-score in the TCGA cohort (I).
GSEA of the risk stratification based on the prediction of the nomogram
To further understand the potential molecular mechanisms underlying the differences between low-risk and high-risk patients, GSEA was performed (Fig. 8A). The high-risk group showed enrichment in pathways closely related to the sugar metabolism pathway, whereas the genes of the low-risk group were enriched in the pantothenate and CoA biosynthesis pathway. This may reflect different biological behavior at the level of tissue metabolism between the high-risk and low-risk groups.
Figure 8.
Landscape of gene function and signaling pathways in risk subtypes. GSEA in the risk stratification based on nomogram prediction in the TCIA cohort (A). Differentially expressed immune checkpoint genes in the risk stratification based on rad-score in the TCIA cohort (B).
Immunization status of the risk stratification based on rad-score
To evaluate the sensitivity of different risk groups to immunotherapy, we examined the expression levels of immune checkpoints. Specifically, we compared the expression levels of immune checkpoints between risk groups in the TCIA cohort (Fig. 8B). Our analysis revealed that the high-risk group had higher expression levels of PD-L1, CD8A, CTLA4, CXCL10, GZMB, and HAVCR2, indicating a potentially better response to targeted immunotherapy. For the risk stratification based on the nomogram prediction, the expression of PD-1 also showed a significant difference between risk groups.
Drug sensitivity analysis of the risk stratification based on the prediction of the nomogram
We examined the sensitivity of the high-risk and low-risk groups to commonly used clinical drugs. Patients in the high-risk group had a higher IC50 for Doxorubicin and Methotrexate, meaning that they were less sensitive to these two drugs (Fig. 9A, B). Patients in the low-risk group had a higher IC50 for Lapatinib, meaning that patients in the high-risk group were more sensitive to Lapatinib (Fig. 9C). However, there was no evidence of a difference in sensitivity between the low-risk and high-risk groups for Docetaxel, Paclitaxel, or Vinorelbine (Fig. 9D–F), so tailoring the administration of these drugs to risk group may not yield substantial benefits. These results suggest that grouping patients into high-risk and low-risk groups is clinically relevant to BC drug therapy, and that risk stratification could help guide drug regimens for individual patients in clinical practice to achieve better outcomes.
Figure 9.
The estimated IC50 of different drugs between the low-risk and high-risk groups based on nomogram prediction, including Doxorubicin (A), Methotrexate (B), Lapatinib (C), Docetaxel (D), Paclitaxel (E), and Vinorelbine (F).
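A between-group comparison of estimated IC50 values of the kind summarized in Figure 9 could be sketched as below. The text does not state which statistical test was used, so the Mann-Whitney U test here is an assumption chosen for illustration; the IC50 values and function names are hypothetical, and the p-value uses a normal approximation without tie or continuity correction.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U statistic for sample a, two-sided p-value by normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_value


# Hypothetical estimated IC50 values, for illustration only.
high_risk_ic50 = [5.1, 5.4, 5.8, 6.0, 6.3]
low_risk_ic50 = [4.0, 4.2, 4.5, 4.7, 4.9]

u_stat, p_value = mann_whitney_u(high_risk_ic50, low_risk_ic50)
```

A higher IC50 in one group, together with a small p-value, would be read as lower sensitivity of that group to the drug, which is the interpretation applied to Doxorubicin and Methotrexate above.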
Discussion
In this study, we established a predictive radiogenomic multimodal nomogram for ALNM, combining radiological, genetic, and clinicopathological features through ML. The nomogram demonstrated good predictive performance for ALNM [AUC: 0.82 (0.74–0.91) in the TCIA cohort; AUC: 0.77 (0.63–0.91) in the TCIA validation cohort]. BC patients were stratified by the rad-score, gene-score, and nomogram prediction into low-risk and high-risk groups, with the high-risk group showing a higher risk of ALNM. Through GSEA, we found that genes from different risk groups were enriched in different pathways. Immunotherapy is likely to have a greater effect in the high-risk group, and the IC50 of drugs showed different tendencies in different risk groups. This approach may have clinical significance for avoiding unnecessary axillary surgery and guiding individualized drug therapy.
MRI is widely used to detect BC17,18. There is considerable evidence that MRI has a higher sensitivity than mammography in detecting breast malignancies in individuals at high risk of BC or at average risk with dense breasts19. BC with a genetic background accounts for 5–10% of all BC20, and it is estimated that up to 50% of women in the United States have dense breasts on mammography21. A 2010 questionnaire distributed among the members of the American Society of Breast Surgeons (ASBrS) analyzed the replies from 1012 surgeons on the use of MRI in tens of thousands of BC patients. It found that 41% of surgeons routinely employed MRI (defined as use in more than 75% of cases) for patients newly diagnosed with BC, while the proportion was 73.4% for patients with a family history and 87.9% for patients with increased mammographic density22. Previous studies identified ALN status from cancer shape, size, kinetic curves, disappearance of lymphoid parenchyma, and signs of adjacent vessels in MRI images23–25. However, these traditional radiological findings provide limited quantitative features and rely too heavily on the doctor’s clinical experience and subjective judgment26,27. How to predict ALN metastasis accurately, objectively, and simply remains an important problem. Radiomics is a field that utilizes advanced computational techniques to extract quantitative features from medical images, allowing tumor tissue to be characterized in a more objective and accurate manner28. This emerging approach has been widely used to identify molecular subtypes and predict chemotherapy response in BC29,30, and can also be applied to predict ALNM with acceptable predictive performance11,31.
In this study, we identified four first-order statistical features of MRI images: log_sigma_5_0_mm_3D_firstorder_90Percentile, log_sigma_5_0_mm_3D_firstorder_Skewness, wavelet_HLL_firstorder_Skewness, and wavelet_LLL_firstorder_Skewness. First-order statistical features are statistical values of image intensities and are used to assess uniform patterns and variability in images32.
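To make the two statistics named in these features concrete, the following minimal sketch computes a 90th percentile and a skewness from a list of voxel intensities. It assumes that the filtered image (Laplacian of Gaussian or wavelet) and the tumor mask have already been applied upstream, so that only the intensities inside the region of interest remain. The intensity values are hypothetical, and the definitions used (linear-interpolation percentile, population-moment skewness) are common conventions that may differ in detail from the extraction software used in the study.

```python
import math


def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def skewness(values):
    """Skewness as the third central moment divided by the variance to the power 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    # A constant region has no spread; report zero skewness in that case.
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Hypothetical intensities inside a region of interest, for illustration only.
roi_intensities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

p90 = percentile(roi_intensities, 90)
skew = skewness(roi_intensities)
```

The 90th percentile summarizes the upper end of the intensity distribution, while the skewness measures its asymmetry: a symmetric distribution such as the one above gives zero, and a long right tail gives a positive value.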
Additionally, by analyzing the genomic profiles of breast tumors, some researchers have used genomic approaches to identify specific genetic changes that are associated with an increased risk of recurrence and distant metastasis, and to predict 5-year survival33,34. These investigations have yielded novel approaches for evaluating the risk of BC patients and predicting prognosis. WTS technology has found extensive applications in various facets of cancer research and treatment35, and has paved the way for novel insights into cancer diagnosis, treatment, and prevention. In addition to using genetic data from public databases, we also employed WTS technology to analyze clinical samples in this study, owing to its superior sensitivity and accuracy compared with gene microarray and Sanger sequencing35. This allowed us to further calibrate the differentially expressed genes between ALN- and ALN+ BC patients, thereby enhancing the reliability of our results across populations. In this study, the final genetic signature comprised three genes. RAB40B plays a key role in mediating invadopodia function during BC cell migration and invasion36. MESP1 directly regulates various genes involved in multiple hallmarks of cancer37, and high MESP1 expression and MESP1-regulated gene signatures are associated with poor prognosis in patients with non-small cell lung cancer (NSCLC)37. ACADS proteins are a family of mitochondrial enzymes that catalyze the initial rate-limiting step in the β-oxidation of fatty acyl-CoA38. Studies have shown that ACADS is involved in the proliferation and metastasis of hepatocellular carcinoma (HCC)39, and is upregulated in colorectal cancer tissues38.
Several clinicopathological factors have been reported to be prognostic for the ALN status of BC patients, including tumor size, histological type, tumor grade, HER2 overexpression, and hormone receptor status9,40. It is widely accepted that patients with invasive tumors are more likely to have lymph node involvement, which can lead to a poorer prognosis. Therefore, in this study, we also included clinical pathology as an independent predictor of ALNM through univariate LR testing.
Radiomics, genomics, and clinicopathology each offer unique insights into disease occurrence and progression. Radiomics provides statistical descriptions derived from medical imaging, genomics contributes a molecular biological perspective, and clinicopathology offers outcome predictions based on histological analysis. Together, these disciplines create a comprehensive understanding of disease at multiple levels of analysis. Previous studies have often focused on radiomics41, genomics10, or clinicopathological analysis40 alone, and few have combined these three approaches to evaluate ALN status and individualized medicine in BC patients. In terms of predictive ability, the AUC of the multimodal model in this study also compares well with those of other radiomics studies42–44 and genomics studies45, even though some of them also incorporated clinicopathological analysis46. Although some researchers have reported associations between certain radiological features and genetic phenotypes47, medical imaging is still unlikely to fully reflect the microscopic characteristics of tumors, as examined with the Durbin–Watson test for the independence of rad-score, gene-score, and pathology (P<0.95). In this study, we explored ML to screen and integrate these three modalities of clinical features.
ML techniques have been extensively utilized in the analysis of clinical data for the development of potent risk stratification models and the reclassification of patient groups48. These techniques can facilitate the prediction of outcomes based on multiple characteristics or the identification of recurrent patterns within high-dimensional data sets49. A wide array of ML methods is available, capable of accommodating the intricacies and diversity inherent in clinical data and objectives49. Compared with other ML models, the SVM model excels at handling small-sample data and exhibits strong generalization capability, making it well suited to classification in the small TCIA cohort in this study50. Additionally, unlike neural network algorithms, the SVM model does not suffer from local minimum problems51. The excellent performance of SVM in various ML comparisons has also been verified43,44. A common challenge with many ML methods, including SVM, is that their predictions often lack interpretability because of their inherent black-box nature52,53. To address this issue, researchers have attempted various model-specific and model-agnostic approaches to explain ML predictions. For instance, feature weighting is a model-specific method that can identify the contributions of features that influence ML model predictions54. As a model-agnostic approach, sensitivity analysis is commonly used to study the impact of changes in feature values on model performance55. However, these methods have their limitations. Feature weighting is mainly applicable to simple models with limited predictive performance, which restricts its relevance and explanatory value56, and sensitivity analysis calculations are often challenging in high-dimensional models57.
Therefore, we propose using SHAP to explain the application of SVM models in radiogenomics. SHAP is generally applicable to any complex ML method and can be understood as an extension of the locally interpretable model-agnostic explanations method58. Because of its strong interpretability and predictive performance, LR was chosen as the final multimodal fusion method, and the model was presented in the form of a nomogram. Thus, an ML prediction model that leverages the strengths of multimodal technology may enable more effective prediction of ALNM in BC patients.
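As an illustration of the LR fusion step described above, the following minimal sketch combines a rad-score, a gene-score, and a single clinicopathological indicator on the log-odds scale and converts the result to a predicted probability of ALNM. The coefficient values, the function name, and the 0/1 coding of the clinical indicator are hypothetical; they are not the fitted values of the nomogram reported in this study.

```python
import math

# Hypothetical coefficients, for illustration only (not the fitted model).
COEFS = {
    "intercept": -1.0,
    "rad_score": 2.0,
    "gene_score": 1.5,
    "clinical": 0.8,
}


def alnm_probability(rad_score, gene_score, clinical):
    """Sum the weighted predictors on the log-odds scale, then apply the logistic function."""
    log_odds = (
        COEFS["intercept"]
        + COEFS["rad_score"] * rad_score
        + COEFS["gene_score"] * gene_score
        + COEFS["clinical"] * clinical
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical patients: (rad-score, gene-score, clinical indicator coded 0/1).
p_reference = alnm_probability(0.5, 0.0, 0)
p_higher = alnm_probability(1.0, 1.0, 1)
```

Because each predictor enters the model additively on the log-odds scale, its contribution can be drawn as a separate axis, which is what makes an LR model straightforward to present as a nomogram.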
Additionally, chemotherapy and immunotherapy are not purely beneficial, because they are associated with a higher risk of toxicity and with increasing financial costs59,60. Trastuzumab has been reported to induce a significant decline in left ventricular ejection fraction (LVEF) in 7.1% to 18.6% of patients12. Furthermore, simultaneous treatment with trastuzumab, an anthracycline, and cyclophosphamide has been associated with cardiac dysfunction in up to 27% of patients undergoing treatment for HER2-positive metastatic BC12. Another study indicated that chemotherapy may be toxic to the brain, leading to cognitive changes61. Therefore, avoiding unnecessary chemotherapy and immunotherapy has important clinical value. Evaluating and predicting drug response in different populations is an important consideration when deciding whether to administer chemotherapy or immunotherapy. A multimodal model that combines radiomics, genomics, and clinical pathology can help assess the potential response of patients to drugs from a holistic and multidimensional perspective. In this study, we first conducted GSEA. The pathways enriched in the high-risk group were related to glucose metabolism, including the glycolysis/gluconeogenesis, glycosylphosphatidylinositol (GPI) anchor biosynthesis, and butanoate metabolism pathways. Many studies have reported that these glucose metabolism-related pathways are closely related to the outcome and prognosis of cancer62–64. The genes in the low-risk group were enriched in the pantothenate and CoA biosynthesis pathway, which is more broadly involved in maintaining cellular energy levels and the normal function of many enzymes, in processes such as glucose metabolism, fat metabolism, and acetylcholine synthesis65.
In the risk stratification analysis of immunotherapy, we found that the expression of immune checkpoint genes such as PD-L1, CD8A, CTLA4, CXCL10, GZMB, and HAVCR2 was significantly upregulated in the high-risk group compared with the low-risk group, indicating that immune checkpoint inhibitors directed at these targets may provide patients in the high-risk group with better survival outcomes66. In the risk stratification analysis of drugs, lapatinib was associated with a higher IC50 in low-risk BC patients, while doxorubicin and methotrexate were associated with a higher IC50 in high-risk BC patients. Lapatinib is a targeted therapy of the tyrosine kinase inhibitor class67. It works by blocking the epidermal growth factor receptor on the cell surface, which can otherwise encourage the cancer to grow67. The difference in sensitivity to lapatinib between the high-risk and low-risk groups may be attributed to the GPI anchor biosynthesis pathway enriched in the high-risk group, because not only epidermal growth factor but also GPI-anchored proteins and their biosynthetic pathways play crucial roles in the growth and metastasis of cancer62,67. Doxorubicin is an anthracycline chemotherapy agent used to treat several types of cancer. We speculate that, in addition to its direct interference with DNA function, doxorubicin may indirectly affect the upregulated immune checkpoint genes in the high-risk group through the suppression of immune responses68. Methotrexate is used in cancer treatment as a folate antagonist. By inhibiting dihydrofolate reductase, an essential enzyme in folate metabolism, it prevents the production of intermediates required for DNA replication and protein synthesis, and is therefore most effective against actively dividing cells, such as cancer cells69. The pantothenate and CoA biosynthesis pathway enriched in the low-risk group may account for the higher sensitivity to methotrexate in the low-risk group.
From the results of GSEA and immune analysis, we may be able to glimpse the genomic mechanism behind the difference in IC50 between high-risk and low-risk groups.
This study has several limitations. First, although the suitability of the SVM model for small samples partly compensates for the limited number of patients with both genetic and radiological data in this study, the problems of accuracy and weak inferential power associated with small samples cannot be ignored. More radiogenomic data will be needed for further verification. Second, we did not perform a prognostic survival analysis for the different risk groups, which will require the collection of relevant data in the future. Third, future research will delve deeper into the molecular mechanisms of these genes, aiming to shed light on the underlying processes driving lymph node metastasis.
Conclusion
In summary, the radiogenomic multimodal nomogram based on the SVM model outperforms clinical and single-omics models in several aspects. First, it predicts ALN involvement more accurately and helps avoid unnecessary axillary surgery, which can reduce surgical complications for patients as well as the learning and time costs of clinical decision-making for surgeons. Second, it can help guide treatment regimens for different risk groups. Additionally, the radiogenomic multimodal model provides a multidimensional understanding and risk stratification of BC at the levels of medical imaging, molecular biology, and tissue cells.
Ethical approval
This study was approved by the Institutional Research Ethics Committee of GDPH (No.S2023-416-01) and conducted in accordance with the Declaration of Helsinki.
Consent
A waiver of informed consent was granted due to the retrospective nature of the study. All patient-relevant information was anonymous and de-identified.
Sources of funding
This work was supported by grant numbers 82372601 and 82002928 from the National Natural Science Foundation of China (Jianguo Lai), grant number 202201011184 from Guangzhou Municipal Science and Technology Bureau (Jianguo Lai), grant number A2023035 from Medical Scientific Research Foundation of Guangdong Province (Jianguo Lai), grant number 2022YW030009 from National Key Clinical Specialty Construction Project (Jianguo Lai), grant numbers 2220001003854 and 2220001004847 from Science and technology innovation project of Foshan (Gengxi Cai), grant number 20220479 from Medical research project of Foshan Health Bureau (Gengxi Cai), grant number 2021A1515111087 from GuangDong Basic and Applied Basic Research Foundation (Gengxi Cai), grant number A2022509 from Medical Scientific Research Foundation of Guangdong Province (Gengxi Cai), and grant number JJTS2020-019 from Beijing Jingjian Pathology Development Fund (Gengxi Cai).
Author contribution
J.L.: conceptualization, methodology, software, data curation, formal analysis, validation, funding acquisition, supervision, and writing – review and editing; Z.C.: methodology, software, data curation, formal analysis, visualization, validation, and writing – original draft; J.L.: methodology, software, data curation, and writing – original draft; C.Z. and H.H.: investigation; Y.Y. and G.C.: data curation and investigation; N.L.: supervision and project administration. All authors confirm that they contributed to manuscript reviews and critical revision for important intellectual content, and read and approved the final draft for submission. All authors agree to be accountable for the content of this study.
Conflicts of interest disclosure
The authors declare no conflict of interest.
Research registration unique identifying number (UIN)
Name of the registry: Research Registry.
Unique identifying number or registration ID: researchregistry9563.
Hyperlink to your specific registration (must be publicly accessible and will be checked): https://www.researchregistry.com/browse-the-registry#home/.
Guarantor
Jianguo Lai.
Data availability statement
The public datasets generated or analyzed during the present study are available in the TCGA database (https://portal.gdc.cancer.gov/) and the TCIA datasets (https://www.cancerimagingarchive.net/). Data related to the patients of The First People’s Hospital of Foshan in the study are available from the corresponding author upon request.
Provenance and peer review
Not commissioned, externally peer-reviewed.
Acknowledgements
Assistance with the study.
The authors thank all authors who contributed valuable methods and data and made them public.
Footnotes
Jianguo Lai, Zijun Chen, Jie Liu, Chao Zhu, and Haoxuan Huang contributed equally to this work.
Sponsorships or competing interests that may be relevant to content are disclosed at the end of this article.
Supplemental Digital Content is available for this article. Direct URL citations are provided in the HTML and PDF versions of this article on the journal's website, www.lww.com/international-journal-of-surgery.
Published online 11 January 2024
Contributor Information
Jianguo Lai, Email: laijianguo@gdph.org.cn.
Zijun Chen, Email: Chenzijun92@163.com.
Jie Liu, Email: lewiscapital2018@sina.com.
Chao Zhu, Email: zc18326946380@126.com.
Haoxuan Huang, Email: hhx08120015@163.com.
Ying Yi, Email: yiying709@126.com.
Gengxi Cai, Email: caigengxi2021@126.com.
Ning Liao, Email: clarkdawson0210@163.com.
References
- 1. Giaquinto AN, Sung H, Miller KD, et al. Breast cancer statistics, 2022. CA Cancer J Clin 2022;72:524–541.
- 2. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–249.
- 3. Beenken SW, Urist MM, Zhang Y, et al. Axillary lymph node status, but not tumor size, predicts locoregional recurrence and overall survival after mastectomy for breast cancer. Ann Surg 2003;237:732–738.
- 4. Ahmed M, Purushotham AD, Douek M. Novel techniques for sentinel lymph node biopsy in breast cancer: a systematic review. Lancet Oncol 2014;15:e351–e362.
- 5. National Comprehensive Cancer Network NCCN guidelines 2023. https://www.nccn.org/professionals/physician_gls/default_nojava.aspx
- 6. Liang Y, Chen X, Tong Y, et al. Higher axillary lymph node metastasis burden in breast cancer patients with positive preoperative node biopsy: may not be appropriate to receive sentinel lymph node biopsy in the post-ACOSOG Z0011 trial era. World J Surg Oncol 2019;17:37.
- 7. Che Bakri NA, Kwasnicki RM, Khan N, et al. Impact of axillary lymph node dissection and sentinel lymph node biopsy on upper limb morbidity in breast cancer patients: a systematic review and meta-analysis. Ann Surg 2023;277:572–580.
- 8. Wilke LG, McCall LM, Posther KE, et al. Surgical complications associated with sentinel lymph node biopsy: results from a prospective international cooperative group trial. Ann Surg Oncol 2006;13:491–500.
- 9. Viale G, Zurrida S, Maiorano E, et al. Predicting the status of axillary sentinel lymph nodes in 4351 patients with invasive breast carcinoma treated in a single institution. Cancer 2005;103:492–500.
- 10. Guo ZW, Liu Q, Yang X, et al. Noninvasive prediction of axillary lymph node status in breast cancer using promoter profiling of circulating cell-free DNA. J Transl Med 2022;20:557.
- 11. Samiei S, Granzier RWY, Ibrahim A, et al. Dedicated axillary MRI-based radiomics analysis for the prediction of axillary lymph node metastasis in breast cancer. Cancers (Basel) 2021;13:757.
- 12. Mir A, Badi Y, Bugazia S, et al. Efficacy and safety of cardioprotective drugs in chemotherapy-induced cardiotoxicity: an updated systematic review & network meta-analysis. Cardiooncology 2023;9:10.
- 13. Mathew G, Agha R, Albrecht J, et al. STROCSS 2021: strengthening the reporting of cohort, cross-sectional and case-control studies in surgery. Int J Surg 2021;96:106165.
- 14. Lingle W, Erickson BJ, Zuley ML, et al. The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA) (Version 3) [Data set] The Cancer Imaging Archive. Cancer Imag Arch 2016.
- 15. Saha A, Harowicz MR, Grimm LJ, et al. Dynamic contrast-enhanced magnetic resonance images of breast cancer patients with tumor locations [Data set]. Cancer Imag Arch 2021;7:42.
- 16. Mayerhoefer ME, Materka A, Langs G, et al. Introduction to radiomics. J Nucl Med 2020;61:488–495.
- 17. Morrow M, Waters J, Morris E. MRI for breast cancer screening, diagnosis, and treatment. Lancet 2011;378:1804–1811.
- 18. Mann RM, Kuhl CK, Moy L. Contrast-enhanced MRI for breast cancer screening. J Magn Reson Imaging 2019;50:377–390.
- 19. Wekking D, Porcu M, De Silva P, et al. Breast MRI: clinical indications, recommendations, and future applications in breast cancer diagnosis. Curr Oncol Rep 2023;25:257–267.
- 20. Rosen EM, Fan S, Pestell RG, et al. BRCA1 gene in breast cancer. J Cell Physiol 2003;196:19–41.
- 21. Ho JM, Jafferjee N, Covarrubias GM, et al. Dense breasts: a review of reporting legislation and available supplemental screening options. AJR Am J Roentgenol 2014;203:449–456.
- 22. Parker A, Schroen AT, Brenin DR. MRI utilization in newly diagnosed breast cancer: a survey of practicing surgeons. Ann Surg Oncol 2013;20:2600–2606.
- 23. Schipper RJ, Paiman ML, Beets-Tan RG, et al. Diagnostic performance of dedicated axillary T2- and diffusion-weighted MR imaging for nodal staging in breast cancer. Radiology 2015;275:345–355.
- 24. Scaranelo AM, Eiada R, Jacks LM, et al. Accuracy of unenhanced MR imaging in the detection of axillary lymph node metastasis: study of reproducibility and reliability. Radiology 2012;262:425–434.
- 25. Choi EJ, Youk JH, Choi H, et al. Dynamic contrast-enhanced and diffusion-weighted MRI of invasive breast cancer for the prediction of sentinel lymph node status. J Magn Reson Imaging 2020;51:615–626.
- 26. Bera K, Braman N, Gupta A, et al. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat Rev Clin Oncol 2022;19:132–146.
- 27. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun 2014;5:4006.
- 28. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology 2016;278:563–577.
- 29. Liu Z, Li Z, Qu J, et al. Radiomics of multiparametric MRI for pretreatment prediction of pathologic complete response to neoadjuvant chemotherapy in breast cancer: a multicenter study. Clin Cancer Res 2019;25:3538–3547.
- 30. Conti A, Duggento A, Indovina I, et al. Radiomics in breast cancer classification and prediction. Semin Cancer Biol 2021;72:238–250.
- 31. Zhang X, Yang Z, Cui W, et al. Preoperative prediction of axillary sentinel lymph node burden with multiparametric MRI-based radiomics nomogram in early-stage breast cancer. Eur Radiol 2021;31:5924–5939.
- 32. Ye DM, Wang HT, Yu T. The application of radiomics in breast MRI: a review. Technol Cancer Res Treat 2020;19:1533033820916191.
- 33. Sparano JA, Gray RJ, Makower DF, et al. Clinical outcomes in early breast cancer with a high 21-gene recurrence score of 26 to 100 assigned to adjuvant chemotherapy plus endocrine therapy: a secondary analysis of the TAILORx randomized clinical trial. JAMA Oncol 2020;6:367–374.
- 34. Cardoso F, van’t Veer LJ, Bogaerts J, et al. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N Engl J Med 2016;375:717–729.
- 35. Hong M, Tao S, Zhang L, et al. RNA sequencing: new technologies and applications in cancer research. J Hematol Oncol 2020;13:166.
- 36. Jacob A, Linklater E, Bayless BA, et al. The role and regulation of Rab40b-Tks5 complex during invadopodia formation and cancer cell invasion. J Cell Sci 2016;129:4341–4353.
- 37. Tandon N, Goller K, Wang F, et al. Aberrant expression of embryonic mesendoderm factor MESP1 promotes tumorigenesis. EBioMedicine 2019;50:55–66.
- 38. Yeh CS, Wang JY, Cheng TL, et al. Fatty acid metabolism pathway play an important role in carcinogenesis of human colorectal cancers by Microarray-Bioinformatics analysis. Cancer Lett 2006;233:297–308.
- 39. Chen D, Feng X, Lv Z, et al. ACADS acts as a potential methylation biomarker associated with the proliferation and metastasis of hepatocellular carcinomas. Aging (Albany NY) 2019;11:8825–8844.
- 40. Nakagawa T, Huang SK, Martinez SR, et al. Proteomic profiling of primary breast cancer predicts axillary lymph node metastasis. Cancer Res 2006;66:11825–11830.
- 41. Arefan D, Chai R, Sun M, et al. Machine learning prediction of axillary lymph node metastasis in breast cancer: 2D versus 3D radiomic features. Med Phys 2020;47:6334–6342.
- 42. Wang D, Hu Y, Zhan C, et al. A nomogram based on radiomics signature and deep-learning signature for preoperative prediction of axillary lymph node metastasis in breast cancer. Front Oncol 2022;12:940655.
- 43. Chen C, Qin Y, Chen H, et al. A meta-analysis of the diagnostic performance of machine learning-based MRI in the prediction of axillary lymph node metastasis in breast cancer patients. Insights Imaging 2021;12:156.
- 44. Chen M, Kong C, Lin G, et al. Development and validation of convolutional neural network-based model to predict the risk of sentinel or non-sentinel lymph node metastasis in patients with breast cancer: a machine learning study. EClinicalMedicine 2023;63:102176.
- 45. Li X, Yang L, Jiao X. Development and validation of a nomogram for predicting axillary lymph node metastasis in breast cancer. Clin Breast Cancer 2023;23:538–545.
- 46. Dihge L, Vallon-Christersson J, Hegardt C, et al. Prediction of lymph node metastasis in breast cancer by gene expression and clinicopathological models: development and validation within a population-based cohort. Clin Cancer Res 2019;25:6368–6381.
- 47. Spielvogel CP, Stoiber S, Papp L, et al. Radiogenomic markers enable risk stratification and inference of mutational pathway states in head and neck cancer. Eur J Nucl Med Mol Imaging 2023;50:546–558.
- 48. Deo RC. Machine learning in medicine. Circulation 2015;132:1920–1930.
- 49. Greener JG, Kandathil SM, Moffat L, et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40–55.
- 50. Du P, Bai X, Tan K, et al. Advances of four machine learning methods for spatial data handling: a review. J Geovisual Spatial Anal 2020;4:13.
- 51. Piccialli V, Sciandrone M. Nonlinear optimization and support vector machines. Ann Operations Res 2022;314:15–47.
- 52. Rodríguez-Pérez R, Vogt M, Bajorath J. Support vector machine classification and regression prioritize different structural features for binary compound activity and potency value prediction. ACS Omega 2017;2:6371–6379.
- 53. Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol 2019;19:64.
- 54. Balfer J, Bajorath J. Visualization and interpretation of support vector machine activity predictions. J Chem Inf Model 2015;55:1136–1147.
- 55. Thabane L, Mbuagbaw L, Zhang S, et al. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol 2013;13:92.
- 56. Rodríguez-Pérez R, Bajorath J. Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 2020;34:1013–1026.
- 57. Polishchuk P. Interpretation of quantitative structure-activity relationship models: past, present, and future. J Chem Inf Model 2017;57:2618–2639.
- 58. Rodríguez-Pérez R, Bajorath J. Interpretation of compound activity predictions from complex machine learning models using local approximations and shapley values. J Med Chem 2020;63:8761–8777.
- 59. Wargo JA, Reuben A, Cooper ZA, et al. Immune effects of chemotherapy, radiation, and targeted therapy and opportunities for combination with immunotherapy. Semin Oncol 2015;42:601–616.
- 60. Smith GL, Banegas MP, Acquati C, et al. Navigating financial toxicity in patients with cancer: A multidisciplinary management approach. CA Cancer J Clin 2022;72:437–453.
- 61. Ahles TA. Brain vulnerability to chemotherapy toxicities. Psychooncology 2012;21:1141–1148.
- 62. Nakakido M, Tamura K, Chung S, et al. Phosphatidylinositol glycan anchor biosynthesis, class X containing complex promotes cancer cell proliferation through suppression of EHD2 and ZIC1, putative tumor suppressors. Int J Oncol 2016;49:868–876.
- 63. Grasmann G, Smolle E, Olschewski H, et al. Gluconeogenesis in cancer cells - Repurposing of a starvation-induced metabolic pathway? Biochim Biophys Acta Rev Cancer 2019;1872:24–36.
- 64. Borges AR, Link F, Engstler M, et al. The Glycosylphosphatidylinositol anchor: a linchpin for cell surface versatility of trypanosomatids. Front Cell Dev Biol 2021;9:720536.
- 65. Kreuzaler P, Inglese P, Ghanate A, et al. Vitamin B(5) supports MYC oncogenic metabolism and tumor progression in breast cancer. Nat Metab 2023;5:1870–1886.
- 66. Wu M, Huang Q, Xie Y, et al. Improvement of the anticancer efficacy of PD-1/PD-L1 blockade via combination therapy and PD-L1 regulation. J Hematol Oncol 2022;15:24.
- 67. Burris HA, III, Hurwitz HI, Dees EC, et al. Phase I safety, pharmacokinetics, and clinical activity study of lapatinib (GW572016), a reversible dual inhibitor of epidermal growth factor receptor tyrosine kinases, in heavily pretreated patients with metastatic carcinomas. J Clin Oncol 2005;23:5305–5313.
- 68.Tacar O, Sriamornsak P, Dass CR. Doxorubicin: an update on anticancer molecular action, toxicity and novel drug delivery systems. J Pharm Pharmacol 2013;65:157–170. [DOI] [PubMed] [Google Scholar]
- 69.Hanoodi MMM. Methotrexate. StatPearls Publishing; 2023. https://www.ncbi.nlm.nih.gov/books/NBK556114/ [PubMed] [Google Scholar]
Data Availability Statement
The public datasets generated or analyzed during the present study are available in the TCGA database (https://portal.gdc.cancer.gov/) and the TCIA database (https://www.cancerimagingarchive.net/). Data related to the patients of The First People’s Hospital of Foshan are available from the corresponding author upon request.





