Radiology: Imaging Cancer. 2025 Jul 18;7(4):e240312. doi: 10.1148/rycan.240312

Radiomics-based Machine Learning Prediction of Neoadjuvant Chemotherapy Response in Breast Cancer Using Physiologically Decomposed Diffusion-weighted MRI

Maya Gilad 1, Savannah C Partridge 2,3, Mami Iima 4,5, Rebecca Rakow-Penner MD 6,7, Moti Freiman 1,8
PMCID: PMC12304534  PMID: 40679371

Abstract

Purpose

To evaluate the performance of a machine learning model developed using radiomics data derived from physiologically decomposed diffusion-weighted MRI data for predicting pathologic complete response (pCR) following neoadjuvant chemotherapy for breast cancer compared with baseline and benchmark models.

Materials and Methods

This retrospective study included data from the Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response (BMMR2) challenge dataset, comprising longitudinal multiparametric breast MRI studies (diffusion-weighted imaging [DWI] and dynamic contrast-enhanced MRI) from participants enrolled in the I-SPY 2/ACRIN 6698 trial (ClinicalTrials.gov: NCT01042379). Piecewise linear physiologic decomposition was applied to DWI data (PD DWI) to isolate pseudo-diffusion, pure-diffusion, and pseudo-diffusion fraction components for radiomics feature extraction. These features were used to develop a boosted decision tree model to predict pCR following neoadjuvant chemotherapy. Model performance was compared with performance of baseline models, including data on tumor size and mean apparent diffusion coefficient, and the BMMR2 challenge benchmark model using area under the receiver operating characteristic curve, F1 score, and positive and negative predictive values. Model calibration was assessed via the Brier score, and a decision curve analysis was performed to estimate the potential reduction in unnecessary interventions when using the proposed model.

Results

The study included multiparametric MRI scans from 190 female participants (mean age ± SD, 48.4 years ± 10.5). PD DWI achieved the highest area under the receiver operating characteristic curve (0.89, 95% CI: 0.81, 0.96) among all evaluated models, demonstrating statistically significant improvements over baseline approaches (all P < .04). Decision curve analysis showed that the PD DWI model provided a greater net benefit compared with the BMMR2 challenge benchmark model (0.17, 95% CI: 0.13, 0.21 vs 0.09, 95% CI: 0.05, 0.13; P < .001).

Conclusion

A machine learning model using radiomics data derived from PD DWI achieved higher performance than baseline and benchmark models in predicting pCR following neoadjuvant chemotherapy for breast cancer.

Keywords: Image Postprocessing, MR-Diffusion Weighted Imaging, Breast, Tumor Response, Experimental Investigations

ClinicalTrials.gov: NCT01042379

© RSNA, 2025



Summary

A machine learning model including radiomics features extracted from physiologically decomposed diffusion-weighted MRI data achieved higher performance in predicting pathologic complete response following neoadjuvant chemotherapy for breast cancer compared with baseline and benchmark models.

Key Points

  • A machine learning model including radiomics features extracted from physiologically decomposed diffusion-weighted MRI (PD DWI) data achieved an area under the receiver operating characteristic curve (AUC) of 0.89 (95% CI: 0.81, 0.96) for predicting pathologic complete response following neoadjuvant chemotherapy in individuals with breast cancer, outperforming standard apparent diffusion coefficient–based methods and the top performing model from the Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response (BMMR2) challenge (AUC, 0.84; 95% CI: 0.75, 0.93).

  • Decision curve analysis revealed a greater net benefit for the PD DWI model compared with the BMMR2 challenge benchmark model (0.17, 95% CI: 0.13, 0.21 vs 0.09, 95% CI: 0.05, 0.13; P < .001), demonstrating its potential to support more personalized and effective treatment strategies while leveraging noncontrast MRI.

Introduction

Breast cancer is the most commonly diagnosed cancer among women worldwide, accounting for 23.8% of all female cancer diagnoses in 2022, and remains a leading cause of cancer-related mortality, responsible for 15.4% of deaths in the same year (1–3). Neoadjuvant chemotherapy (NAC) is a critical treatment but is not universally effective (2,4). Assessing the efficacy of NAC currently requires invasive procedures and lengthy treatment periods, increasing the risk of delayed intervention for nonresponders. Early and accurate prediction of pathologic complete response (pCR), indicating the absence of invasive cancer after treatment, could substantially improve clinical outcomes, enabling timely treatment adjustments and optimizing surgical planning (4,5).

Multiparametric MRI, which integrates dynamic contrast-enhanced (DCE) MRI and diffusion-weighted imaging (DWI), offers superior diagnostic precision compared with conventional MRI (6–8), due to enhanced sensitivity to tissue microstructural changes, which is critical for evaluating the effects of NAC (5,9–12). Although DCE MRI remains widely used for NAC monitoring, the value of DWI has become increasingly recognized for its ability to provide key complementary insights into cellular environments (6,13–15). With the additional advantage of not requiring contrast agents, DWI has gained strong academic interest, leading to the American College of Radiology Imaging Network (ACRIN) 6698/Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis 2 (I-SPY2) multicenter study focused on the potential of DWI to predict pCR after NAC (5).

Early analyses of data from the ACRIN 6698/I-SPY2 multicenter trial primarily used a single-compartment monoexponential decay model with multiple b values to estimate overall tumor apparent diffusion coefficient (ADC) (5). Subsequent studies showed that different b value combinations did not improve the predictive performance of mean tumor ADC for assessing response to NAC (16). Recently, the Breast Multiparametric MRI for prediction of NAC Response (BMMR2) computational challenge explored the use of multiparametric MRI data for post-NAC pCR prediction (17); however, models developed in this effort focused solely on overall ADC maps, overlooking the distinct physiologic information contained within individual components of the DWI signal.

This study aimed to evaluate the performance of a machine learning model developed using radiomics data derived from physiologically decomposed diffusion-weighted MRI (PD DWI) data for predicting pCR following NAC for breast cancer, as compared with baseline and benchmark models.

Materials and Methods

Data

This secondary retrospective analysis, conducted in compliance with the Health Insurance Portability and Accountability Act, used data from the publicly available BMMR2 challenge dataset. The dataset was derived from the ACRIN 6698/I-SPY2 multicenter trial (ClinicalTrials.gov identifier: NCT01042379) (17). This dataset includes participants with invasive breast cancer who were enrolled in the trial before receiving NAC treatment. The study evaluated several NAC treatment protocols. Participants were randomized into one of the tested protocols, which included either paclitaxel alone or paclitaxel in combination with an investigational new drug. Following the taxol-based regimen, all participants received standard chemotherapy with doxorubicin and cyclophosphamide (17).

Participants underwent longitudinal multiparametric MRI scans at four key points during NAC treatment. Specifically, MRI scans were performed before the treatment (pre-NAC), 3 weeks after the commencement of chemotherapy (early-NAC), at approximately the 12-week mark, coinciding with a shift in chemotherapy regimens (mid-NAC), and finally, after the completion of all chemotherapy cycles, immediately preceding surgery (5,18).

The BMMR2 challenge dataset is a subset of MRI studies from the ACRIN 6698/I-SPY2 multicenter trial, selected based on the availability of analyzable DWI data extending up to the midtreatment time point. Consequently, the dataset includes 191 participants with breast cancer divided into a stratified training set (60% [117 of 191]) and a test set (40% [74 of 191]), as previously reported (16). In this study, one participant was excluded from the training set due to incomplete DWI data, resulting in a dataset of 190 participants. Figure 1 illustrates the dataset split into training, validation, and test sets as used in our study.

Figure 1:

Flowchart of the PD DWI cohort showing participant inclusion, exclusion, and five-fold validation setup.

PD DWI cohort diagram. The dataset incorporated all BMMR2 participants, excluding one from the training set due to missing DWI data. For model validation, the training set was divided into five folds. BMMR2 = Breast Multiparametric MRI for prediction of NAC Response, DWI = diffusion-weighted imaging, NAC = neoadjuvant chemotherapy, PD DWI = physiologically decomposed DWI.

Imaging Protocol

Each imaging session included a multi–b value DWI sequence capturing images at four diffusion levels (0, 100, 600, and 800 sec/mm²), typically performed as a single series acquisition. Following the administration of a contrast agent, DCE MRI was also performed, and both scans were included in the BMMR2 dataset for comprehensive analysis (19).

Image Analysis

For the primary ACRIN 6698/I-SPY2 trial, analysis of DCE MRI data involved tumor segmentation and functional tumor volume calculation using percentage enhancement and signal enhancement ratio mapping techniques (5,17). Additionally, manual segmentation of DWI sequences was performed, using the DCE MRI segmentation as a reference to ensure accuracy and consistency in the analysis process. These tumor segmentations were provided along with the imaging data in the BMMR2 challenge dataset.

We first calculated the overall ADC (ADC0–800) map using the monoexponential decay model:

$$S(b) = S_0 \times e^{-b \times \mathrm{ADC}},$$

where S(b) represents the signal intensity at a given diffusion weighting (b), and S_0 is the signal intensity without diffusion weighting, using the full range of available b values (0–800 sec/mm²).
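As an illustration only, the monoexponential model can be fitted per voxel by taking logarithms, which turns it into the straight line ln S(b) = ln S_0 − b × ADC. The sketch below assumes an ordinary least-squares fit on the log signal; the text does not state which fitting procedure was used, and the function name `fit_adc` and the synthetic voxel values are hypothetical.

```python
import math

def fit_adc(b_values, signals):
    """Least-squares fit of ln S(b) = ln S0 - b * ADC; returns (S0, ADC)."""
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    slope = sxy / sxx                      # slope of the log signal is -ADC
    intercept = mean_log - slope * mean_b  # intercept is ln S0
    return math.exp(intercept), -slope

# Synthetic voxel (hypothetical values) sampled at the four b values of the protocol.
b_values = [0, 100, 600, 800]              # sec/mm^2
true_s0, true_adc = 1000.0, 1.2e-3         # ADC in mm^2/sec
signals = [true_s0 * math.exp(-b * true_adc) for b in b_values]

s0, adc = fit_adc(b_values, signals)
```

On noise-free monoexponential data the fit recovers the generating parameters; on real voxels the residual from this line is what the piecewise decomposition exploits.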

To further decompose the DWI signal into physiologically relevant components, we generated two additional ADC maps: (a) ADC0–100, calculated using low b values (0–100 sec/mm²), which highlights pseudo-diffusion behavior and provides an indirect indicator of microcirculation; and (b) ADC100–800, derived from higher b values (100–800 sec/mm²), which captures hindered and restricted diffusion associated with cellular density.

We assessed the impact of pseudo-diffusion on the overall DWI signal by calculating the “F” map, which represents the relative contribution of pseudo-diffusion. This was defined as the ratio of ADC0–800 to ADC100−800 (Fig 2). By using this approach, we were able to distinguish pseudo-diffusion effects within the DWI signal based on the four available b values (Fig 3).
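The piecewise decomposition described above can be sketched for a single voxel as follows. This is a minimal illustration, assuming a log-linear least-squares fit within each b value range and the F definition given in the text (ADC0–800 divided by ADC100–800). The helper names and the biexponential test signal are hypothetical and are not taken from the study code.

```python
import math

def fit_adc(b_values, signals):
    """ADC from a least-squares line through (b, ln S); ADC is minus the slope."""
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    return -sxy / sxx

def decompose(b_values, signals):
    """Return the three ADC estimates and the F ratio for one voxel."""
    pairs = list(zip(b_values, signals))

    def adc_in_range(lo, hi):
        sel = [(b, s) for b, s in pairs if lo <= b <= hi]
        return fit_adc([b for b, _ in sel], [s for _, s in sel])

    adc_0_800 = adc_in_range(0, 800)      # overall ADC
    adc_0_100 = adc_in_range(0, 100)      # low-b segment (pseudo-diffusion)
    adc_100_800 = adc_in_range(100, 800)  # high-b segment (pure diffusion)
    return {
        "ADC0-800": adc_0_800,
        "ADC0-100": adc_0_100,
        "ADC100-800": adc_100_800,
        "F": adc_0_800 / adc_100_800,     # ratio as defined in the text
    }

# Hypothetical biexponential voxel: 10% fast (perfusion-like) + 90% slow component.
b_values = [0, 100, 600, 800]
signals = [0.1 * math.exp(-b * 0.02) + 0.9 * math.exp(-b * 0.001) for b in b_values]
maps = decompose(b_values, signals)
```

For a signal with a fast-decaying component, the low-b slope is the steepest and the overall ADC lies between the two segment estimates, so F exceeds 1; for a purely monoexponential voxel all three estimates coincide and F equals 1.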

Figure 2:

Graphs illustrate pre-NAC DWI signal decay patterns for participants with and without pCR, showing different ADC model fits.

Graphs show pre-NAC DWI signal decay as a function of the b value for a participant with pCR (A) and a participant who did not achieve pCR (B). Black circles represent intensity levels at various b values. The solid black line represents the overall original ADC, as supplied by the challenge organizers, and the dashed lines illustrate additional ADC models derived as part of the physiologic decomposition of the DWI signal. The bold gray vertical square signifies the pseudo-diffusion fraction, calculated from the ratio of ADC0–800 to ADC100–800. ADC = apparent diffusion coefficient, DWI = diffusion-weighted imaging, NAC = neoadjuvant chemotherapy, pCR = pathologic complete response.

Figure 3:

DWI feature maps compare tumor characteristics before NAC in participants with and without pCR.

DWI physiologic decomposition. The first row depicts feature maps in a 46-year-old female participant with breast cancer who did not achieve a pathologic complete response (non-pCR) after neoadjuvant chemotherapy, and the second row shows a 35-year-old female participant with breast cancer who achieved pCR by the end of treatment. All feature maps are derived from the pre-NAC imaging sessions. For each DWI sequence, three ADC maps are generated and used to compute the F map. The segmented tumor region of interest, used for radiomics feature extraction, is outlined with a solid yellow line. ADC = apparent diffusion coefficient, DWI = diffusion-weighted imaging, NAC = neoadjuvant chemotherapy.

PD DWI Model Development

Data representation

We analyzed DWI and DCE MRI sequences from the BMMR2 dataset at the three treatment phases: pre-NAC, early-NAC, and mid-NAC. Corresponding tumor segmentations, clinical data, and pCR outcomes were also included. The clinical data underwent standard cleaning procedures, and in one case in which tumor grade information was missing, it was replaced with the most common value (mode). All evaluated models incorporated clinical biomarkers.

Physiologically decomposed models

To assess the impact of physiologic decomposition in DWI, we developed different models based on extracted features from all time points: (a) the PD DWI model used features derived from ADC0–100 and the F map, which highlight pseudo-diffusion and microcirculation effects; (b) the multiparametric MRI model used ADC0–800 and signal enhancement ratio maps, capturing both diffusion and perfusion characteristics; and (c) four additional physiologically decomposed model variants each used a single ADC map or the F map for feature extraction.

To enhance early pCR prediction, we adjusted both models to integrate imaging data from initial treatment phases, resulting in additional models for each feature set. These modifications allowed us to evaluate the impact of early treatment dynamics on prediction performance. Table 1 provides an overview of the developed models and the data incorporated in each.

Table 1:

Comparison of the PD DWI Model against Five Alternatives Highlighting Diverse Radiomics Feature Utilization

Model | SER | ADC0–100 | ADC100–800 | ADC0–800
--- | --- | --- | --- | ---
mpMRI | Included | Not included | Not included | Included
ADC0–100-only | Not included | Included | Not included | Not included
ADC100–800-only | Not included | Not included | Included | Not included
ADC0–800-only | Not included | Not included | Not included | Included
F-only | Not included | Not included | Included | Included
PD DWI | Not included | Included | Included | Included

Note.—A comparative assessment of the PD DWI model alongside five alternative models, each characterized by their reliance on distinct sets of radiomics features extracted from varying feature maps. The mpMRI was the only model incorporating an extra modality, merging DCE MRI with DWI. Models labeled ADC-only were each based on a single ADC map. The F-only model used a ratio derived from ADC100-800 to ADC0-800 maps. ADC = apparent diffusion coefficient, DCE = dynamic contrast enhancement, DWI = diffusion-weighted imaging, mpMRI = multiparametric MRI, SER = signal enhancement ratio.

Feature extraction and selection

We extracted 120 radiomics features from the segmented tumor regions across the various parametric maps derived from DWI and DCE MRI. The features included: first-order statistics (eg, intensity distribution), morphologic features (eg, tumor shape and size), and texture-based features (eg, spatial patterns within the tumor). Feature extraction was performed using the PyRadiomics version 3.0.1 software package (20). To identify the most relevant features, we calculated the F value for each feature based on its association with pCR outcomes. We then selected the top 100 features with the highest F values (21) for model training.
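The ranking step can be illustrated with a small, self-contained sketch. It assumes that the F value is the one-way analysis of variance F statistic between the pCR and non-pCR groups (the statistic that scikit-learn's `f_classif` computes); the text does not spell out the exact statistic, and the feature names and values below are hypothetical.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across outcome groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n = len(values)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def top_k_features(feature_table, labels, k):
    """Names of the k features with the largest F statistic."""
    scores = {name: anova_f(vals, labels) for name, vals in feature_table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: six lesions, two radiomics features, binary pCR label.
labels = [0, 0, 0, 1, 1, 1]
feature_table = {
    "separating_feature": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "noise_feature": [1.0, 2.0, 3.0, 1.1, 2.1, 2.9],
}
selected = top_k_features(feature_table, labels, k=1)
```

In the study setting, k would be 100 and the table would hold the extracted radiomics features for each participant.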

Classifier training

We trained an XGBoost classifier (22) while optimizing key parameters—including min-child-weight, max-depth, and subsample settings—to prevent overfitting and improve model robustness. The optimal training parameters were selected through fivefold stratified cross-validation, following the data split defined in the BMMR2 challenge (17). The final parameters were chosen based on the highest mean area under the receiver operating characteristic curve (AUC), and the model was then retrained on the entire training set using these optimized parameters.
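The cross-validation layout can be sketched without the classifier itself. The example below only shows one simple way to build five stratified folds that preserve the pCR proportion, using the training set class counts from Table 2 (36 pCR, 80 non-pCR). The function name, the round-robin assignment, and the seed are assumptions and do not reproduce the actual split used in the study.

```python
import random

def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

# Training set composition reported in Table 2: 36 pCR and 80 non-pCR.
labels = [1] * 36 + [0] * 80
folds = stratified_folds(labels)
pcr_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each candidate parameter setting would then be trained on four folds and scored by AUC on the held-out fold, with the mean AUC across the five folds used for selection.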

Statistical Analysis

We developed two baseline models using logistic regression, both using data from mid-NAC only: one with the longest diameter as an indicator of lesion size (LRsize) and another incorporating both the longest diameter and ADC0–800 (LRsize+ADC). Additionally, we reimplemented the BMMR2 challenge baseline (17), which integrates hormone receptor and human epidermal growth factor receptor 2 status, mean ADC at pre-NAC and mid-NAC, and longest diameter at pre-NAC (LRsize+ADC+hrher4g).

We analyzed the predictive performance of the PD DWI and baseline models with the AUC and 95% CI as the primary metric. We compared the models’ receiver operating characteristic curves against those of the three baseline models using the DeLong test (23) and evaluated the models’ calibration with the Brier score after applying sigmoid smoothing (24). We retrospectively assessed the model’s potential net benefit in clinical decision-making by conducting decision curve analysis, evaluating its ability to guide treatment adjustments at mid-NAC and reduce unnecessary interventions (25).
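For readers less familiar with these metrics, the sketch below computes the AUC (as the probability that a positive case receives a higher score than a negative case), the Brier score, and the net benefit at one threshold using the standard decision curve definition, TP/n − (FP/n) × p_t/(1 − p_t). It is an illustration on hypothetical scores, not the analysis code of the study, and it treats label 1 as the positive class; which outcome the study treated as positive in the decision curve analysis is not restated here.

```python
def auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def net_benefit(y_true, probs, threshold):
    """Decision curve net benefit at a given probability threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical predictions for six cases.
y_true = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]

auc_value = auc(y_true, probs)
brier_value = brier(y_true, probs)
nb_value = net_benefit(y_true, probs, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and plotting it against the treat-all and treat-none strategies.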

We applied an adjusted Youden index (26) with a sensitivity-specificity balance of 0.4–0.6 to determine optimal cutoffs, prioritizing specificity over sensitivity to effectively identify nonresponders while minimizing the misclassification of responders. We then calculated sensitivity, specificity, F1 score, positive predictive value, and negative predictive value to evaluate model performance. We used the McNemar test to analyze statistical significance of the differences in the models’ predictive performance, based on a contingency table of classification correctness (ie, correct vs incorrect predictions by each model). All statistical tests were two-sided, with a significance level of .05. Statistical analyses were conducted using Python (version 3.12.9, https://www.python.org/).
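The cutoff selection and the derived metrics can be sketched as follows, using the weighted form given in the note to Table 3, w × sensitivity + (1 − w) × specificity − 1 with w = 0.4. The function names and the toy scores are hypothetical, label 1 is treated as the positive class by assumption, and the confusion counts in the last step are the PD DWI counts reported in Table 3.

```python
def adjusted_youden(sensitivity, specificity, w=0.4):
    """Weighted Youden index as stated in the Table 3 note."""
    return w * sensitivity + (1 - w) * specificity - 1

def rates_at(y_true, probs, cutoff):
    """Sensitivity and specificity when predicting positive for p >= cutoff."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(y_true, probs, w=0.4):
    """Candidate cutoff (an observed score) maximizing the adjusted index."""
    return max(
        sorted(set(probs)),
        key=lambda c: adjusted_youden(*rates_at(y_true, probs, c), w=w),
    )

def summarize(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, NPV, and F1 from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical scores for six cases.
y_true = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]
cutoff = best_cutoff(y_true, probs)

# PD DWI counts from Table 3: 17 of 23 pCR and 45 of 51 non-pCR classified correctly.
pd_dwi = summarize(tp=17, fn=6, tn=45, fp=6)
```

With w below 0.5 the index rewards specificity more than sensitivity, which is why the selected cutoff in the toy example sits above the point that would balance the two.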

Results

Study Sample

Demographic and clinicopathologic characteristics of the study sample are shown in Table 2. The study included 190 female participants with a mean age ± SD of 48.4 years ± 10.5. Participants were categorized based on hormone receptor and human epidermal growth factor receptor 2 subtype, lesion type, Scarff-Bloom-Richardson grade, and pCR outcome. The largest subgroup was hormone receptor positive and human epidermal growth factor receptor 2 negative (44% [83 of 190]), and about half of the participants had multiple masses (51% [97 of 190]). A high Scarff-Bloom-Richardson grade was most common and observed in 69% (131 of 190) of participants. There was no evidence of a difference in age distribution between the training (n = 116) and test (n = 74) sets (t test, P = .77). The pCR outcome was achieved in 31% (59 of 190) of the participants, consistent across both the training and test sets. One participant was excluded from the original BMMR2 training set due to missing DWI sequences.

Table 2:

Summary of Demographics Data in BMMR2 Challenge Cohort

Characteristic | All (n = 190) | Training Set (n = 116)* | Test Set (n = 74)
--- | --- | --- | ---
Age (y) | 48.4 ± 10.5 | 49.0 ± 11.3 | 48.6 ± 9.4
Sex | | |
 Female | 190 (100) | 116 (100) | 74 (100)
 Male | 0 (0) | 0 (0) | 0 (0)
HR and HER2 subtypes | | |
 HR− and HER2− | 59 (31) | 36 (31) | 23 (31)
 HR+ and HER2− | 83 (44) | 50 (43) | 33 (45)
 HR− and HER2+ | 15 (8) | 10 (9) | 5 (7)
 HR+ and HER2+ | 33 (17) | 20 (17) | 13 (18)
Lesion type | | |
 Single mass | 73 (38) | 43 (37) | 30 (41)
 Single NME | 9 (5) | 4 (3) | 5 (7)
 Multiple masses | 97 (51) | 64 (55) | 33 (45)
 Multiple NME | 11 (6) | 5 (4) | 6 (8)
SBR grade | | |
 Low | 5 (3) | 3 (3) | 2 (3)
 Intermediate | 53 (28) | 36 (31) | 17 (23)
 High | 131 (69) | 76 (66) | 55 (74)
 NA | 1 (1) | 1 (1) | 0 (0)
pCR outcome | | |
 Non-pCR | 131 (69) | 80 (69) | 51 (69)
 pCR | 59 (31) | 36 (31) | 23 (31)

Note.—Data are presented as means ± SDs or frequencies with percentages in parentheses. BMMR2 = Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response, HER2 = human epidermal growth factor receptor 2, HR = hormone receptor, NME = nonmass enhancement, pCR = pathologic complete response, SBR = Scarff-Bloom-Richardson.

* The original training set of BMMR2 included 117 participants; here we excluded one participant with some missing diffusion-weighted imaging sequences.

Feature Importance Analysis

Figure 4 presents the feature importance analysis of the PD DWI model, highlighting the contribution of different input features. The results show that the model relied on features from both ADC0–100 and F maps. Furthermore, the model relied heavily on features from both the early-NAC and mid-NAC time points.

Figure 4:

Bar graph highlights key predictive features from PD-DWI, with F map and ADC0–100 radiomics showing highest model impact.

Bar graph shows PD DWI predictive features. The model’s performance is primarily enhanced by radiomics features derived from the F map and ADC0–100, indicated by average gain. The HER2 clinical indicator also positively impacts the model, albeit to a lesser extent than the radiomics features. ADC = apparent diffusion coefficient, GLCM = gray-level co-occurrence matrix, GLDM = gray-level dependence matrix, GLRLM = gray-level run length matrix, HER2 = human epidermal growth factor receptor 2, NAC = neoadjuvant chemotherapy, PD DWI = physiologically decomposed diffusion-weighted imaging, 3D = three-dimensional.

Predictive Performance of PD DWI

Table 3 summarizes the models’ predictive performance. The LRsize, LRADC, and the LRsize+ADC models had the lowest AUC scores ranging from 0.64 to 0.70. The BMMR2 baseline model LRsize+ADC+hrher4g achieved a higher predictive performance (AUC, 0.78; 95% CI: 0.67, 0.89). However, the difference was not statistically significant (DeLong test, P = .11, .13, and .34, respectively).

Table 3:

Comparison of Model Predictive Performance

Model | AUC (95% CI) | Sensitivity (%) | Specificity (%) | F1 | NPV (%) | PPV (%) | Brier
--- | --- | --- | --- | --- | --- | --- | ---
LRsize | 0.65 (0.51, 0.78) | 52 (12 of 23) | 75 (38 of 51) | 0.50 | 78 (38 of 49) | 48 (12 of 25) | 0.20
LRADC | 0.64 (0.50, 0.78) | 57 (13 of 23) | 73 (37 of 51) | 0.52 | 79 (37 of 47) | 49 (13 of 27) | 0.20
LRsize+ADC | 0.70 (0.58, 0.83) | 83 (19 of 23) | 59 (30 of 51) | 0.60 | 88 (30 of 34) | 48 (19 of 34) | 0.19
LRsize+ADC+hrher4g | 0.78 (0.67, 0.89) | 48 (11 of 23) | 90 (46 of 51) | 0.56 | 79 (46 of 58) | 69 (11 of 16) | 0.17
mpMRI | 0.81 (0.70, 0.91) | 65 (15 of 23) | 86 (44 of 51) | 0.67 | 85 (44 of 52) | 68 (15 of 22) | 0.16
ADC0–100 | 0.86 (0.77, 0.95) | 78 (18 of 23) | 86 (44 of 51) | 0.75 | 90 (44 of 49) | 72 (18 of 25) | 0.13
ADC100–800 | 0.85 (0.75, 0.94) | 70 (16 of 23) | 92 (47 of 51) | 0.74 | 87 (47 of 54) | 80 (16 of 20) | 0.15
F | 0.84 (0.75, 0.94) | 78 (18 of 23) | 82 (42 of 51) | 0.72 | 89 (42 of 47) | 67 (18 of 27) | 0.14
ADC0–800 | 0.88 (0.80, 0.96) | 61 (14 of 23) | 96 (49 of 51) | 0.72 | 85 (49 of 58) | 88 (14 of 16) | 0.13
PD DWI | 0.89 (0.81, 0.96) | 74 (17 of 23) | 88 (45 of 51) | 0.74 | 88 (45 of 51) | 74 (17 of 23) | 0.13

Note.—All models underwent evaluation using the BMMR2 test set, not seen during training. Cutoff values were selected by maximizing the adjusted Youden index (27), which is calculated by using the following formula: w × Sensitivity + [1 – w] × Specificity – 1, where w equals 0.4, and was used for measuring the sensitivity, specificity, F1, PPV, and NPV scores. ADC = apparent diffusion coefficient, AUC = area under the receiver operating characteristic curve, BMMR2 = Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response, hrher4g = hormone receptor human epidermal growth factor receptor 2 4 groups, LR = logistic regression, mpMRI = multiparametric MRI, NPV = negative predictive value, PD DWI = physiologically decomposed diffusion-weighted imaging, PPV = positive predictive value.

All the XGBoost-based models outperformed the baseline logistic regression models, with PD DWI achieving the highest AUC (0.89; 95% CI: 0.81, 0.96). The improvement over LRsize, LRADC, and LRsize+ADC was statistically significant (DeLong test: PD DWI, all P < .02). The PD DWI also had the lowest Brier score at 0.13, indicating better calibration and discrimination.

The PD DWI model achieved the highest F1 score at 0.74 and negative predictive value (88% [45 of 51]) and the third-best sensitivity (74% [17 of 23]). The highest sensitivity was achieved by LRsize+ADC (83% [19 of 23]), which exhibited a poor balance between sensitivity and specificity (F1 = 0.6, positive predictive value = 48% [19 of 34]). The improvements in the PD DWI predictive performance over the LRsize, LRADC, and LRsize+ADC models were statistically significant (McNemar test, P < .04).

Compared with the BMMR2 challenge results (21), the PD DWI model outperformed the highest reported AUC of 0.84. Although the top performing teams in the BMMR2 challenge (21) (AUC, 0.84; 95% CI: 0.75, 0.93; AUC, 0.84; 95% CI: 0.75, 0.93; and AUC, 0.81; 95% CI: 0.70, 0.90) demonstrated improved performance over the BMMR2 baseline (AUC, 0.78; 95% CI: 0.67, 0.90), the differences were not statistically significant (DeLong test, P = .30, .36, and .70). Similarly, the PD DWI model exceeded the baseline with a smaller P value (P = .13), though it did not reach statistical significance.

Clinical Utility of PD DWI

Figure 5 presents the decision curve analysis, estimating the net reduction in unnecessary interventions when applying the model in clinical decision-making. The evaluation compared PD DWI’s ability to detect nonresponders against the LRsize+ADC+hrher4g, multiparametric MRI, and ADC0–800 models. Decision curve analysis at the respective optimal cutoff thresholds showed that PD DWI (cutoff, 0.42, sensitivity, 74% [17 of 23]) and ADC0–800 (cutoff, 0.63, sensitivity, 61% [14 of 23]) provided a greater net benefit (0.17; 95% CI: 0.13, 0.21 and 0.12; 95% CI: 0.08, 0.17, respectively) compared with LRsize+ADC+hrher4g (cutoff, 0.41, sensitivity, 48% [11 of 23], net benefit, 0.09; 95% CI: 0.05, 0.13). The difference was statistically significant (P < .001). However, ADC0–800 had a higher false-negative rate (40%) compared with PD DWI (26%) and multiparametric MRI (35%).

Figure 5:

Decision curve and ROC analysis compare the clinical utility and performance of PD-DWI and other models in predicting pCR.

Graphs show clinical utility comparison using decision curve analysis and ROC curves. A comparative analysis of PD DWI, ADC0–800, LRsize+ADC+hrher4g, and mpMRI models using the BMMR2 testing set: (A) The net benefit of each model as a function of selected cutoff. Lower thresholds result in higher sensitivity and larger net benefit (bounded by pCR prevalence). (B) The net reduction in interventions as a function of selected cutoff. (C) ROC curves of AUC at mid-NAC. AUCs and 95% CIs are given in the legend. ADC = apparent diffusion coefficient, AUC = area under the ROC curve, BMMR2 = Breast Multiparametric MRI for prediction of NAC Response, LR = logistic regression, mp = multiparametric, NAC = neoadjuvant chemotherapy, pCR = pathologic complete response, PD DWI = physiologically decomposed diffusion-weighted imaging, ROC = receiver operating characteristic.

Discussion

In this study, we demonstrated that radiomics features derived from physiologically decomposed ADC maps serve as a noninvasive, quantitative imaging biomarker for predicting pCR in participants undergoing NAC for invasive breast cancer. By incorporating low b value ADC maps and pseudo-diffusion fraction maps, our method achieves a higher predictive performance compared with conventional ADC-based approaches (AUC, 0.89 vs 0.64; P < .003), outperforming both baseline models (AUC, 0.78 and 0.81; P > .05) and techniques previously reported throughout treatment in the ACRIN 6698 trial (AUC ≈ 0.60) (5).

Our physiologically decomposed approach leverages piecewise linear decomposition of DWI and boosted decision trees to integrate radiomics and clinical features. Whereas ACRIN 6698 primarily focused on percentage changes in ADC, our method isolates pseudo-diffusion and pure-diffusion components, potentially capturing additional microstructural details. Moreover, we incorporated longitudinal imaging time points and machine learning–based feature selection, which may further explain the improved performance.

Our results show that pseudo-diffusion has a notable contribution when predicting pCR outcome. The results support theories that exclusive reliance on the ADC map might not capture all the necessary details for precise pCR prediction from DWI (9). Moreover, our results indicate that a PD DWI model can be used in the early stages of NAC treatment; thus, it paves the way for personalized and effective breast cancer treatment strategies. Our results also build upon earlier studies investigating decomposed DWI signals, including pseudo-diffusion, restricted and/or hindered diffusion, and their relative contributions via intravoxel incoherent motion analysis (27–31). Although not all research identified pseudo-diffusion fraction as a definitive predictor, these studies collectively highlight the importance of signal decomposition for extracting meaningful biomarkers from DWI data.

In contrast to Partridge et al (16) and BMMR2 challenge participants (17), who reported limited benefits from mean ADC values derived from multiple b values, our study shows that physiologically decomposed ADC maps add substantial predictive value. The PD DWI model not only surpassed ADC-only approaches but also outperformed the leading BMMR2 challenge models, which depended on DCE MRI alone or a combination of DWI and DCE MRI. Notably, pseudo-diffusion and pseudo-diffusion fraction biomarkers from DWI offered stronger predictive indicators throughout NAC than those derived from DCE MRI, emphasizing the potential of decomposed DWI for more accurate response assessment. From the feature importance analysis (Fig 4), we observed that early-NAC and mid-NAC scans contributed the most predictive features, whereas pre-NAC features played a less prominent role. Consistent with this, our model predicts end-of-NAC pCR status by mid-NAC, likely capturing tumor changes already underway rather than purely forecasting future response. These findings indicate that PD DWI is more effective for ongoing NAC monitoring than for pretreatment prognosis, though pre-NAC imaging remains essential for baseline assessment and treatment planning. Overall, our results underscore the importance of longitudinal imaging in guiding therapy adjustments.

Our study also highlights PD DWI’s potential as a noninvasive, quantitative tool for tracking NAC response. Although further external testing is warranted, the model’s higher predictive accuracy (AUC, 0.89 vs 0.64; P < .003), clinical net benefit (0.17; 95% CI: 0.13, 0.21 vs 0.09; 95% CI: 0.05, 0.13; P < .001), and practical advantages over contrast-enhanced imaging suggest that DWI-based approaches could play a key role in personalized breast cancer treatment strategies, including avoiding unneeded treatment adjustments for responsive patients.

Nonetheless, incorporating XGBoost-based machine learning, which integrates radiomics and clinical data to enhance prediction, presents several challenges. First, standardization of DWI protocols is necessary, given that this study was conducted on ACRIN 6698/I-SPY2 data with strict acquisition parameters, limiting generalizability. Second, the scarcity of publicly accessible multiparametric breast MRI datasets restricts broader validation across diverse imaging protocols. Finally, although DWI is noncontrast, cost-effective, and involves shorter scan times than DCE MRI, implementing automated feature extraction and machine learning predictions in clinical practice requires additional validation and infrastructure.

Our study had some limitations. First, ADC tumor segmentations were provided by the BMMR2 challenge organizers and were generated using DCE MRI segmentation as a reference, which implies that, despite our model not using DCE MRI radiomics features, DCE MRI data were still needed for the segmentation process. Second, the limited range of b values available from the ACRIN 6698/I-SPY2 trial constrained our capacity to compare the method with more advanced models like intravoxel incoherent motion, or to extensively explore the effects of different b value ranges on pCR prediction. Third, the strictly controlled setting of ACRIN 6698/I-SPY2 limits the generalizability of our model, and the absence of large, publicly accessible multiparametric breast MRI datasets further impedes external testing. Such validation across diverse patient populations and imaging protocols is essential before applying this approach in routine clinical practice.

In conclusion, our findings demonstrate that radiomics features derived from PD DWI achieve high performance in predicting pCR by mid-NAC, potentially enabling more personalized therapy. The results suggest that DWI data alone can be sufficient for effective pCR prediction, potentially reducing the need for DCE MRI, which is more time-consuming and requires contrast material administration. Finally, the study shows that decomposing the DWI signal into physiologic components, even with a limited number of b values, enhances the extracted information, potentially improving performance of machine learning models applied to such data in clinical settings. Future studies should aim to validate these findings in larger, multicenter cohorts and explore the extension of this approach to other clinically relevant tasks, such as predicting axillary lymph node involvement, further leveraging the potential of DWI-based models in comprehensive breast cancer management.
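To make the idea of decomposing the DWI signal with only a few b values concrete, the sketch below shows one common segmented (piecewise linear) fit of the log-signal for a single voxel. It is offered under stated assumptions and is not necessarily the authors' exact procedure: the b values (0, 100, 600, and 800 sec/mm²), the split point at b = 100 sec/mm², the synthetic biexponential signal, and the function name `piecewise_linear_decomposition` are all illustrative choices. The low-b slope is taken as a pseudo-diffusion-weighted ADC, the high-b slope as a pure-diffusion ADC, and the pseudo-diffusion fraction is estimated from the high-b intercept.

```python
import numpy as np


def piecewise_linear_decomposition(b_values, signals, b_split=100.0):
    """Segmented log-linear fit of a DWI signal decay for one voxel.

    Hypothetical helper for illustration only. Returns
    (adc_low, adc_high, fraction), where adc_low reflects the low-b
    (pseudo-diffusion-weighted) segment, adc_high the high-b
    (pure-diffusion) segment, and fraction the estimated
    pseudo-diffusion fraction.
    """
    b = np.asarray(b_values, dtype=float)
    s = np.asarray(signals, dtype=float)
    log_s = np.log(s)

    low = b <= b_split   # e.g., b = 0 and 100
    high = b >= b_split  # e.g., b = 100, 600, and 800

    # Straight-line fits to the log-signal on each segment.
    slope_low, _ = np.polyfit(b[low], log_s[low], 1)
    slope_high, intercept_high = np.polyfit(b[high], log_s[high], 1)

    s0 = s[b == 0][0]  # measured signal at b = 0
    # Extrapolating the high-b line to b = 0 estimates the non-perfusing
    # signal; the remainder is attributed to pseudo-diffusion.
    fraction = 1.0 - np.exp(intercept_high) / s0
    return -slope_low, -slope_high, fraction


# Synthetic voxel (assumed values): F = 0.1, D* = 0.02, D = 0.001 mm^2/sec.
b_values = np.array([0.0, 100.0, 600.0, 800.0])
signals = 0.1 * np.exp(-b_values * 0.02) + 0.9 * np.exp(-b_values * 0.001)

adc_low, adc_high, fraction = piecewise_linear_decomposition(b_values, signals)
print(adc_low, adc_high, fraction)
```

In this toy example, the high-b slope recovers a value close to the simulated pure-diffusion coefficient, whereas the low-b slope is larger because it also reflects the fast pseudo-diffusion component. Applied voxelwise, such maps would then feed radiomics feature extraction.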

Funding: M.G. and M.F. supported by Israel Innovation Authority, Ministry of Science and Technology (MOST), Israel. R.R.P. supported by Krueger V. Wyeth Fund. S.C.P. supported by NIH/NCI grants R01CA207290 and R01CA190299. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data sharing: Data generated by the authors or analyzed during the study are available at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=50135447.

Disclosures of conflicts of interest: M.G. No relevant relationships. S.C.P. Research grants paid to institution from GE HealthCare, Guerbet, Microsoft AI for Good, Sloan Precision Oncology Institute; breast MRI book royalties from Elsevier; consulting fee from Guerbet; honoraria from the Global Breast Cancer Conference; travel reimbursement from the Global Breast Cancer Conference; in-kind research support to institution from Philips Healthcare, Microsoft AI for Good, GE HealthCare; and associate editor of Radiology: Imaging Cancer. M.I. No relevant relationships. R.R.P. Grants from Curebound and GE HealthCare; consulting fees from Human Longevity and Bayer; payments from Efficiency Learning Systems Educational Symposia; participation on a Data Safety Monitoring Board or Advisory Board for Cortechs AI and Imagine Scientific; leadership or fiduciary role for Academy of Radiology Research and RSNA Research and Development Committee; stock options from Cortechs AI and Curemetrix. M.F. Research grants from AbbVie, Helmsley Charitable Trust, ERA-CVD (EU Horizon 2020), the Israel-USA Binational Science Foundation, Israel Innovation Authority, and Israel Ministry of Science and Technology.

Abbreviations:

ACRIN = American College of Radiology Imaging Network
ADC = apparent diffusion coefficient
AUC = area under the receiver operating characteristic curve
BMMR2 = Breast Multiparametric MRI for prediction of NAC Response
DCE = dynamic contrast-enhanced
DWI = diffusion-weighted imaging
I-SPY2 = Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis 2
NAC = neoadjuvant chemotherapy
pCR = pathologic complete response
PD DWI = physiologically decomposed DWI

References

1. Banaie M, Soltanian-Zadeh H, Saligheh-Rad HR, Gity M. Spatiotemporal features of DCE-MRI for breast cancer diagnosis. Comput Methods Programs Biomed 2018;155:153–164.
2. Bhushan A, Gonsalves A, Menon JU. Current state of breast cancer diagnosis, treatment, and theranostics. Pharmaceutics 2021;13(5):723.
3. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2024;74(3):229–263.
4. Song D, Man X, Jin M, Li Q, Wang H, Du Y. A Decision-Making Supporting Prediction Method for Breast Cancer Neoadjuvant Chemotherapy. Front Oncol 2020;10:592556.
5. Partridge SC, Zhang Z, Newitt DC, et al; ACRIN 6698 Trial Team and I-SPY 2 Trial Investigators. Diffusion-weighted MRI findings predict pathologic response in neoadjuvant treatment of breast cancer: The ACRIN 6698 multicenter trial. Radiology 2018;289(3):618–627.
6. Gullo RL, Partridge SC, Shin HJ, Thakur SB, Pinker K. Update on DWI for Breast Cancer Diagnosis and Treatment Monitoring. AJR Am J Roentgenol 2024;222(1):e2329933.
7. Leithner D, Wengert GJ, Helbich TH, et al. Clinical role of breast MRI now and going forward. Clin Radiol 2018;73(8):700–714.
8. Kim SH, Shin HJ, Shin KC, et al. Diagnostic Performance of Fused Diffusion-Weighted Imaging Using T1-Weighted Imaging for Axillary Nodal Staging in Patients With Early Breast Cancer. Clin Breast Cancer 2017;17(2):154–163.
9. Baltzer P, Mann RM, Iima M, et al; EUSOBI international Breast Diffusion-Weighted Imaging working group. Diffusion-weighted imaging of the breast-a consensus and mission statement from the EUSOBI International Breast Diffusion-Weighted Imaging working group. Eur Radiol 2020;30(3):1436–1450.
10. Gao W, Guo N, Dong T. Diffusion-weighted imaging in monitoring the pathological response to neoadjuvant chemotherapy in patients with breast cancer: A meta-analysis. World J Surg Oncol 2018;16(1):145.
11. Liang J, Zeng S, Li Z, et al. Intravoxel Incoherent Motion Diffusion-Weighted Imaging for Quantitative Differentiation of Breast Tumors: A Meta-Analysis. Front Oncol 2020;10:585486.
12. Zhang M, Horvat JV, Bernard-Davila B, et al. Multiparametric MRI model with dynamic contrast-enhanced and diffusion-weighted imaging enables breast cancer diagnosis with high accuracy. J Magn Reson Imaging 2019;49(3):864–874.
13. Parekh VS, Jacobs MA. Multiparametric radiomics methods for breast cancer tissue characterization using radiological imaging. Breast Cancer Res Treat 2020;180(2):407–421.
14. Partridge SC, Nissan N, Rahbar H, Kitsch AE, Sigmund EE. Diffusion-weighted breast MRI: Clinical applications and emerging techniques: Diffusion-Weighted Breast MRI. J Magn Reson Imaging 2017;45(2):337–355.
15. Woodhams R, Matsunaga K, Kan S, et al. ADC mapping of benign and malignant breast tumors. Magn Reson Med Sci 2005;4(1):35–42.
16. Partridge SC, Steingrimsson J, Newitt DC, et al. Impact of Alternate b-Value Combinations and Metrics on the Predictive Performance and Repeatability of Diffusion-Weighted MRI in Breast Cancer Treatment: Results from the ECOG-ACRIN A6698 Trial. Tomography 2022;8(2):701–717.
17. Li W, Partridge SC, Newitt DC, et al. Breast Multi-parametric MRI for Prediction of Neoadjuvant Chemotherapy Response in Breast Cancer: The BMMR2 Challenge. Radiol Imaging Cancer 2024;6(1):e230033.
18. Clark K, Vendt B, Smith K, et al. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. J Digit Imaging 2013;26(6):1045–1057.
19. Newitt DC, Partridge SC, Zhang Z, et al. ACRIN 6698/I-SPY2 Breast DWI. 10.7937/TCIA.KK02-6D95. 2021.
20. van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res 2017;77(21):e104–e107.
21. Buitinck L, Louppe G, Blondel M, et al. API design for machine learning software: experiences from the scikit-learn project. arXiv 2013. Preprint posted online September 1, 2013; doi: 10.48550/arxiv.1309.0238.
22. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16. ACM. 2016. 10.1145/2939672.2939785.
23. DeLong ER, DeLong DM, Clarke-Pearson DL, et al. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–845.
24. Rufibach K. Use of Brier score to assess binary predictions. J Clin Epidemiol 2010;63(8):938–939; author reply 939.
25. Zhao F, Polley E, McClellan J, et al. Predicting pathologic complete response to neoadjuvant chemotherapy in breast cancer using a machine learning approach. Breast Cancer Res 2024;26(1):148.
26. Youden WJ. Index for rating diagnostic tests. Cancer 1950;3(1):32–35.
27. Cho GY, Gennaro L, Sutton EJ, et al. Intravoxel incoherent motion (IVIM) histogram biomarkers for prediction of neoadjuvant treatment response in breast cancer patients. Eur J Radiol Open 2017;4:101–107.
28. Kim Y, Kim SH, Lee HW, et al. Intravoxel incoherent motion diffusion-weighted MRI for predicting response to neoadjuvant chemotherapy in breast cancer. Magn Reson Imaging 2018;48:27–33.
29. Almutlaq ZM, Wilson DJ, Bacon SE, et al. Evaluation of Monoexponential, Stretched-Exponential and Intravoxel Incoherent Motion MRI Diffusion Models in Early Response Monitoring to Neoadjuvant Chemotherapy in Patients With Breast Cancer-A Preliminary Study. J Magn Reson Imaging 2022;56(4):1079–1088.
30. Cheung SM, Wu WS, Senn N, et al. Towards detection of early response in neoadjuvant chemotherapy of breast cancer using Bayesian intravoxel incoherent motion. Front Oncol 2023;13:1277556.
31. Mendez AM, Fang LK, Meriwether CH, et al. Diffusion Breast MRI: Current Standard and Emerging Techniques. Front Oncol 2022;12:844790.

Articles from Radiology: Imaging Cancer are provided here courtesy of Radiological Society of North America
