Machine learning with textural analysis of longitudinal multiparametric MRI and molecular subtypes accurately predicts pathologic complete response in patients with invasive breast cancer

Aaquib Syed; Richard Adam; Thomas Ren; Jinyu Lu; Takouhie Maldjian; Tim Q Duong

doi:10.1371/journal.pone.0280320

2023 Jan 17;18(1):e0280320.

Editor: Marco Giannelli

PMCID: PMC9844845 PMID: 36649274

Abstract

Purpose

To predict pathological complete response (pCR) after neoadjuvant chemotherapy using extreme gradient boosting (XGBoost) with MRI and non-imaging data at multiple treatment timepoints.

Material and methods

This retrospective study included 117 patients with breast cancer who underwent neoadjuvant chemotherapy. Inputs included tumor apparent diffusion coefficient (ADC) values, diffusion-weighted (DWI) and dynamic contrast-enhanced (DCE) MRI at three treatment timepoints, and patient demographic and tumor data. Gray-level co-occurrence matrix (GLCM) textural analysis was performed on the MRI data. An extreme gradient boosting machine learning algorithm was used to predict pCR. Prediction performance was evaluated using the area under the receiver operating characteristic curve (AUC), along with precision and recall.

Results

Prediction using texture features of DWI images (AUC = 0.871; 95% CI: 0.768, 0.974; p<0.001) and of DCE images (AUC = 0.903; 95% CI: 0.854, 0.952; p<0.001) at multiple treatment time points outperformed prediction using mean tumor ADC (AUC = 0.850; 95% CI: 0.764, 0.936; p<0.001). The AUC using all MRI data was 0.933 (95% CI: 0.836, 1.03; p<0.001). The AUC using non-MRI data was 0.919 (95% CI: 0.848, 0.99; p<0.001). The highest AUC, 0.951 (95% CI: 0.909, 0.993; p<0.001), was achieved with all MRI and non-MRI data at all time points as inputs.

Conclusion

Using XGBoost on extracted GLCM features and non-imaging data accurately predicts pCR. Such early prediction of response could minimize exposure to toxic chemotherapy by allowing the regimen to be modified mid-treatment, ultimately improving outcomes.

Introduction

Neoadjuvant chemotherapy (NAC) [1] for locally advanced breast cancer can reduce tumor size, making breast-conserving surgery feasible and obviating the need for mastectomy. Pathological complete response (pCR), defined as the absence of residual invasive tumor at surgery after NAC, is a desirable endpoint of NAC [2, 3]. Patients with pCR are more likely to be candidates for breast-conserving surgery and to have longer progression-free and overall survival [2, 3]. Therefore, pCR can be used as a surrogate for favorable outcome. The ability to predict pCR before treatment would help determine which patients will benefit from NAC and which will not. Predicting pCR during treatment would also allow the regimen to be changed to maximize the chance of pCR. Accurate prediction of pCR could thus help identify patients who are likely to respond to specific NAC drugs, while enabling oncologists to alter treatment mid-course when needed, maximizing successful outcomes and minimizing the adverse effects of unnecessary chemotherapy.

Currently, the efficacy of a chemotherapy regimen is assessed invasively through core needle biopsy. Clinical biomarkers such as Ki67 are obtained from core needle biopsy samples and therefore may not be representative of the entire tumor. Noninvasive imaging can overcome this problem of tumor heterogeneity because the entire tumor is depicted on the images [4]. Indeed, the capacity of pre- and post-NAC MRI to depict response to treatment and predict pCR has already been demonstrated [5–7].

Machine learning (ML) has been used to predict eventual pCR. Inputs have included radiomic features (such as volume, sphericity, and DCE MRI wash-in and wash-out signal) [8–12], deep learning analysis of whole MR images [13] and of DCE dynamics [14], and models that incorporate demographics and molecular receptor subtypes [15, 16]. ML analysis has also been applied to diffusion MRI [17–20]. However, the role of diffusion MRI in predicting pCR is understudied, and the results remain controversial [21].

The goal of this study was to apply the extreme gradient boosting (XGBoost) algorithm to predict pCR using multiparametric MRI data along with non-imaging data at multiple treatment timepoints as inputs. XGBoost was chosen because the dataset is relatively small and XGBoost has been reported to perform well with limited data; it is also well suited to classification of tabular data. This approach has the potential to non-invasively identify patients who are likely to respond to neoadjuvant chemotherapy at diagnosis or early in treatment, and may prove useful for treatment planning, treatment execution, and mid-treatment adjustment to achieve better outcomes.

Materials and methods

Data sources

In this retrospective study, machine learning models to predict pCR were built using the training dataset of the Breast Multiparametric MRI for prediction of NAC Response-2 (BMMR2) challenge, which was curated from the ACRIN 6698 substudy of the I-SPY 2 TRIAL (NCT Number: NCT01564368). Patients from the ACRIN 6698 multicenter trial were previously reported by Partridge et al. [22] in “Diffusion-weighted MRI Findings Predict Pathologic Response in Neoadjuvant Treatment of Breast Cancer: The ACRIN 6698 Multicenter Trial”, published in Radiology. Data collected in that trial were released as an open dataset, which our study uses (n = 117 patients, of whom 36 had pCR). Whereas the previous study predicted pCR solely from DWI using logistic regression, we use extreme gradient boosting with multiple types of MRI data as well as patient demographic data. Because the training and testing data came from a public dataset, institutional review board (IRB) approval was not required.

All 117 patients in the dataset were women diagnosed with invasive breast cancer who underwent 12 weeks of paclitaxel followed by 4 weeks of anthracycline treatment. Each patient had MRI data acquired at 3 timepoints: tp0, pre-treatment; tp1, after 3 weeks of paclitaxel (± experimental agent); and tp2, after 12 weeks of paclitaxel (± experimental agent).

Non-imaging data included patient age, race, lesion type (multiple masses, single mass, or non-mass enhancement), receptor status (hormone receptor (HR) positive/negative and human epidermal growth factor receptor 2 (HER2) positive/negative), and Scarff-Bloom-Richardson (SBR) grade. Of the 117 patients, 12 had missing data (11 with unknown race and 1 with unknown SBR grade). Missing values were imputed by backward fill, which copies the value from the next patient in the dataset. There were no other missing data (Fig 1).
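Backward fill as described above can be reproduced with pandas. This is a minimal sketch, assuming the clinical variables sit in a DataFrame ordered by patient; the column names and values below are hypothetical and are not taken from the BMMR2 files.

```python
import pandas as pd

# Hypothetical clinical table, one row per patient, in dataset order.
# None marks a missing value; column names are illustrative only.
clinical = pd.DataFrame({
    "race": ["White", None, "Asian", None, "Black"],
    "sbr_grade": ["III", "II", None, "III", "II"],
})

# Backward fill: each missing entry takes the value from the next row (next patient).
filled = clinical.bfill()
print(filled)
```

Because the fill value comes from whichever patient happens to be next in the file, the imputed values depend on row order, and a missing value in the last row would remain missing.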

DWI and DCE MRI were acquired bilaterally in the axial orientation on 1.5-T or 3.0-T scanners with a dedicated breast radiofrequency coil, as described in [22]. Acquisition parameters are provided in the data source listed in our data availability statement. DCE images were aligned, using position metadata, with their corresponding rectangular tumor masks, which covered the tumor and the surrounding area. Textural features were extracted from the third dynamic phase of the DCE data at all 3 time points within the rectangular region-of-interest mask. DWI images were aligned with the manually defined binary segmentations provided in the dataset, and features were calculated within the smallest rectangle enclosing the segmented tumor on the b = 800 s/mm² DWI images. ADC tumor segmentations were carefully aligned with their corresponding ADC maps using position metadata, and tumor ADC values and their changes were extracted for ML analysis. All images were interpolated to 0.7825 mm × 0.7825 mm voxel spacing with a 2.5 mm slice thickness, the median spacing and thickness across all patients.

When applying machine learning techniques to multicenter data, data harmonization [23] may be necessary. We did not perform additional harmonization because all MRI acquisitions followed the I-SPY 2 acquisition requirements described in [22]. To reduce the chance that data from a particular field strength or molecular subtype would be over-represented in a single training or testing set, we used 5-fold cross-validation, so that every patient appears in the test set of one fold and in the training sets of the others.
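Resampling every volume to the common voxel grid stated above can be sketched with SciPy's `scipy.ndimage.zoom`. The paper does not say which tool or interpolation order it used, so linear interpolation (`order=1`) for images, nearest-neighbor interpolation (`order=0`) for masks, the (slices, rows, columns) axis order, and the example input spacing are all assumptions here.

```python
import numpy as np
from scipy.ndimage import zoom

# Target spacing from the text, ordered (slice thickness, row, column) in mm.
TARGET_SPACING_MM = (2.5, 0.7825, 0.7825)

def resample_to_spacing(volume, spacing_mm, target_mm=TARGET_SPACING_MM, order=1):
    """Resample a (slices, rows, cols) array from spacing_mm to target_mm.

    order=1 is linear interpolation (for images); order=0 is nearest neighbor,
    which keeps label masks binary.
    """
    factors = [src / dst for src, dst in zip(spacing_mm, target_mm)]
    return zoom(np.asarray(volume, dtype=np.float32), zoom=factors, order=order)

# Hypothetical scan with 5 mm slices and 1.565 mm in-plane pixels.
rng = np.random.default_rng(0)
image = rng.random((20, 128, 128))
mask = (image > 0.5).astype(np.uint8)
image_rs = resample_to_spacing(image, (5.0, 1.565, 1.565))
mask_rs = resample_to_spacing(mask, (5.0, 1.565, 1.565), order=0)
print(image_rs.shape, mask_rs.shape)
```

Using a separate interpolation order for masks avoids fractional label values at tumor boundaries, which would otherwise have to be re-thresholded.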

Ground truth

pCR was determined by institutional pathologists at the study sites (blinded to functional tumor volume (FTV) and ADC MRI measures) through histopathologic analysis with the residual cancer burden system, according to the I-SPY 2 trial protocol. Following U.S. FDA rationale and guidelines, pCR was the reference standard for determining response to neoadjuvant chemotherapy in our study, defined and reported as no residual invasive disease in either the breast or the axillary lymph nodes after neoadjuvant therapy. Patients were categorized as pCR or non-pCR based on postsurgical histopathologic findings.

Features used

Texture analysis was performed within a small bounding box enclosing the tumor, as determined by the functional tumor volume provided in the I-SPY 2 data. Calculations were performed using the Scikit-Image library in Python. Images were cast to 8-bit integers, and the number of grey levels was set to the maximum of 256. GLCM features (Energy, Homogeneity, Contrast, Dissimilarity, ASM, and Correlation) were calculated at distances of 1, 3, and 5 pixels and at angles of 0, $\frac{\pi}{2}$, $\frac{\pi}{4}$, and $\frac{7\pi}{4}$ radians, covering the horizontal, vertical, and both diagonal directions. These features were calculated at all 3 treatment time points for both DCE and DWI images, except that the 5-pixel distance was omitted for DWI images because the segmentation was sometimes too small. Table 1 summarizes the meaning of each feature and the formula used to calculate it. This resulted in a total of 360 GLCM features. Combined with 6 non-imaging patient features and 6 features extracted from the ADC parametric map, 372 features were used in total. Explicit feature selection was not performed, because XGBoost implicitly favors the most informative features when choosing tree splits and tends to leave irrelevant features unused.
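The feature total can be checked from the counts given in the text: 6 GLCM properties × 4 angles × number of distances × 3 time points for each sequence, plus the non-imaging and ADC features.

```latex
\begin{aligned}
\text{DCE GLCM} &= 6 \times 4 \times 3 \times 3 = 216 \\
\text{DWI GLCM} &= 6 \times 4 \times 2 \times 3 = 144 \\
\text{GLCM total} &= 216 + 144 = 360 \\
\text{All features} &= 360 + 6 + 6 = 372
\end{aligned}
```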

Table 1. GLCM textural features used.

GLCM Feature Name	Formula	Purpose
Contrast	$\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} {(i - j)}^{2} P(i, j)$	A measure of the intensity difference between a pixel and its neighbor; 0 for a constant image.
Energy	$\sqrt{\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} {P(i, j)}^{2}}$	The square root of the sum of squared GLCM elements (the square root of ASM); 1 for a constant image.
Correlation	$\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} \frac{(i - μ_{i}) (j - μ_{j}) P(i, j)}{σ_{i} σ_{j}}$	A measure of how correlated a pixel is to its neighbors over the whole image.
Homogeneity	$\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} \frac{P(i, j)}{1 + {(i - j)}^{2}}$	A measure of how closely the GLCM elements are concentrated near the GLCM diagonal.
Dissimilarity	$\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} |i - j| P(i, j)$	A measure of the intensity difference between neighboring pixels that, unlike contrast, grows linearly with the grey-level difference.
Angular Second Moment (ASM)	$\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} {P(i, j)}^{2}$	A measure of the uniformity of the grey-level distribution in the image; 1 for a constant image.
P(i, j): normalized GLCM entry for grey levels i and j; N: number of grey levels (256); μ_i, μ_j and σ_i, σ_j: means and standard deviations of the GLCM row and column marginals.

Extreme gradient boosting models

A total of 13 XGBoost models were created using different combinations of data and time points. These models were implemented in Python using the Scikit-Learn API. To balance the classes, the minority (pCR) class was randomly oversampled; the balanced dataset consisted of 162 samples, 50% pCR and 50% non-pCR.
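The class-balancing step above amounts to duplicating randomly chosen pCR cases until the classes match: with 81 non-pCR and 36 pCR patients, 45 duplicates bring the total to 162. A minimal NumPy sketch with a hypothetical helper name and seed; imbalanced-learn's `RandomOverSampler` provides the same behavior as a library class.

```python
import numpy as np

def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate random minority-class rows until both classes have equal counts.

    Hypothetical helper for a binary label vector y.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    n_extra = int(np.sum(y != minority_label)) - minority_idx.size
    if n_extra <= 0:
        return X, y
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(y.size), extra])
    return X[keep], y[keep]

# Class counts from the text: 36 pCR (label 1) and 81 non-pCR (label 0).
y = np.array([1] * 36 + [0] * 81)
X = np.arange(y.size).reshape(-1, 1)  # placeholder features: each row holds its index
X_bal, y_bal = random_oversample(X, y)
print(len(y_bal), int(y_bal.sum()))   # 162 81
```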

Bayesian optimization with a sequential domain reduction transformer was used to find optimal values of the XGBoost hyperparameters. The mean AUC after 5-fold cross-validation was the objective to be maximized. We used 15 rounds of random exploration followed by 80 rounds of optimization. If the optimal value of a hyperparameter lay at an edge of its search bounds, the bounds were adjusted and the optimization was run again. Fig 2 shows this process for a sample hyperparameter.
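The bound-adjustment rule above can be sketched as a loop around any optimizer. This is a minimal sketch, assuming a dictionary of (low, high) bounds per hyperparameter; the helper names and the one-width widening step are hypothetical, and a plain grid search stands in for the Bayesian optimizer so the example runs without extra packages. The wording in the text matches the `BayesianOptimization` and `SequentialDomainReductionTransformer` classes of the open-source `bayes_opt` Python package, but the paper does not name the package it used.

```python
import numpy as np

def adjust_bounds(best, bounds, rel_tol=1e-3):
    """Push outward any bound that the best value sits on.

    best:   {name: best value found}
    bounds: {name: (low, high)}
    A value within rel_tol * (high - low) of an edge moves that edge outward
    by one bound width. Returns (new_bounds, changed).
    """
    new_bounds, changed = {}, False
    for name, (low, high) in bounds.items():
        width, x = high - low, best[name]
        if x - low <= rel_tol * width:
            low, changed = low - width, True
        elif high - x <= rel_tol * width:
            high, changed = high + width, True
        new_bounds[name] = (low, high)
    return new_bounds, changed

def search_with_bound_adjustment(optimize, bounds, max_rounds=10):
    """Rerun optimize(bounds) -> best until no optimum lies on a bound."""
    for _ in range(max_rounds):
        best = optimize(bounds)
        bounds, changed = adjust_bounds(best, bounds)
        if not changed:
            break
    return best, bounds

# Stand-in optimizer: grid search over one toy hyperparameter "x"
# whose objective (e.g. mean cross-validated AUC) peaks at x = 3.
def grid_optimize(bounds):
    low, high = bounds["x"]
    grid = np.linspace(low, high, 101)
    scores = -(grid - 3.0) ** 2
    return {"x": float(grid[np.argmax(scores)])}

best, final_bounds = search_with_bound_adjustment(grid_optimize, {"x": (0.0, 1.0)})
print(best, final_bounds)
```

Starting from (0, 1), the optimum hits the upper edge twice, so the bounds grow to (0, 2) and then (0, 4), where the optimum at 3 is interior. In practice `optimize` would run the Bayesian search (15 random points and 80 iterations per the text); hyperparameters that must stay positive, such as a learning rate, would also need a lower floor, which this sketch omits.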

Statistical analysis

Distributions of categorical patient characteristics, such as lesion type and race, were compared using the χ2 test for homogeneity. Patient age and maximum tumor diameter were compared between classes using a t-test. Receptor status categories were compared using two-sample z-tests of proportions. The importance of each feature for predicting pCR was quantified by its F-score.
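The two-sample z-test of proportions used for receptor status can be written out directly. This is a minimal sketch using the pooled-proportion standard error and a two-sided p-value (an assumption; the text does not say whether tests were one- or two-sided), applied to the HR-/HER2+ counts in Table 2 (8 of 36 pCR patients versus 2 of 81 non-pCR patients).

```python
from math import erfc, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sample z-test for equal proportions with a pooled standard error.

    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# HR-/HER2+ counts from Table 2: 8 of 36 pCR vs 2 of 81 non-pCR patients.
z, p = two_proportion_z_test(8, 36, 2, 81)
print(f"z = {z:.2f}, p = {p:.4f}")
```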

K-fold cross-validation is a standard approach for estimating model performance after training. With 5 folds, 20% of the dataset was designated for testing and 80% for training in each fold. All performance metrics were calculated as the mean of each test metric over the 5 folds, after 1000 rounds of training. The standard error of the mean across folds was used to quantify the variability of the mean AUC.
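The evaluation scheme above can be sketched with scikit-learn. This is a minimal sketch, not the authors' code: the stratified splitting, the synthetic data, and the use of scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (so the example runs without the `xgboost` package) are assumptions. Because `xgboost.XGBClassifier` follows the same `fit`/`predict_proba` interface, it could be passed through `make_model` instead. Here the minority class is oversampled inside each training fold only, a common design choice that keeps duplicated pCR cases out of the test folds; the paper does not state at which stage its oversampling was applied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def oversample_minority(X, y, rng):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

def cross_validated_auc(X, y, make_model, n_splits=5, seed=0):
    """Mean test-fold AUC, its standard error, and the per-fold AUCs."""
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in folds.split(X, y):
        # Oversample only the training part so duplicates never reach the test fold.
        X_tr, y_tr = oversample_minority(X[train_idx], y[train_idx], rng)
        model = make_model().fit(X_tr, y_tr)
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    aucs = np.asarray(aucs)
    sem = aucs.std(ddof=1) / np.sqrt(n_splits)
    return aucs.mean(), sem, aucs

# Synthetic stand-in data with the study's size and rough class balance (about 81:36).
X, y = make_classification(n_samples=117, n_features=20, n_informative=5,
                           weights=[81 / 117], random_state=0)
mean_auc, sem, fold_aucs = cross_validated_auc(
    X, y, lambda: GradientBoostingClassifier(random_state=0))
print(f"AUC {mean_auc:.3f} ± {sem:.3f} (SEM over {len(fold_aucs)} folds)")
```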

Models were evaluated primarily by the area under the receiver operating characteristic curve (AUC), a threshold-independent measure widely used to compare classifiers. Precision and recall were secondary performance metrics. P-values < 0.05 were considered significant; for AUC, the null hypothesis was that the AUC equals 0.5 (chance-level discrimination).

Results

Patient characteristics and sample sizes (n = 117) are summarized in Table 2. There were 36 patients with pCR and 81 without pCR. Mean age did not differ significantly between patients with pCR (49.08 ± 10.31 years) and without pCR (49.0 ± 11.73 years) (p = 0.971). There was also no significant difference in the longest tumor diameter between classes (p = 0.252). The distributions of race (p = 0.205), lesion type (p = 0.409), and SBR grade (p = 0.488) did not differ significantly. The distribution of receptor status differed significantly (p<0.001): the non-pCR group had a greater proportion of patients with HR+/HER2- status (p<0.001) and a smaller proportion with HR-/HER2+ status (p<0.001).

Table 2. Patient demographics and sample sizes.

Characteristic	Patients with pCR (n = 36)	Patients without pCR (n = 81)	P-Value
Age	49.08 ± 10.31 years	49.0 ± 11.73 years	0.971
Race	White (n = 26)	White (n = 61)	0.205
	Asian (n = 3)	Asian (n = 7)
	Black (n = 1)	Black (n = 8)
	Unknown (n = 6)	Unknown (n = 5)
Lesion Type	Multiple masses (n = 16)	Multiple masses (n = 49)	0.409
	Multiple NME (n = 2)	Multiple NME (n = 3)
	Single mass (n = 16)	Single mass (n = 27)
	Single NME (n = 2)	Single NME (n = 2)
Receptor Status	HR+/HER2+ (n = 8)	HR+/HER2+ (n = 12)	<0.001
	HR+/HER2- (n = 8)	HR+/HER2- (n = 43)
	HR-/HER2- (n = 12)	HR-/HER2- (n = 24)
	HR-/HER2+ (n = 8)	HR-/HER2+ (n = 2)
SBR Grade	III [High] (n = 22)	III [High] (n = 55)	0.488
	II [Intermediate] (n = 13)	II [Intermediate] (n = 23)
	I [Low] (n = 0)	I [Low] (n = 3)
	Unknown (n = 0)	Unknown (n = 1)
Longest Diameter	3.67 ± 2.21 cm	4.18 ± 2.19 cm	0.252

Distributions of categorical patient characteristics, such as lesion type and race, were compared using the χ2 test for homogeneity. Age and maximum tumor diameter were compared between classes using a t-test. Receptor status categories were compared using two-sample z-tests of proportions. NME: non-mass enhancement; HR: hormone receptor; HER2: human epidermal growth factor receptor 2; pCR: pathological complete response; SBR: Scarff-Bloom-Richardson.

Fig 3A shows post-contrast DCE MRIs of a pCR patient and Fig 3B those of a non-pCR patient at the pre-treatment, early-treatment, and mid-treatment time points. The tumors were hyperintense relative to background tissue. The tumor of the non-pCR patient shrank moderately over time, whereas that of the pCR patient shrank markedly.

Fig 4A shows DWI MRIs of a pCR patient and Fig 4B those of a non-pCR patient. On DWI, the tumors were hyperintense relative to background tissue. The tumor of the non-pCR patient shrank moderately over time, whereas that of the pCR patient shrank markedly.

Fig 5 shows the 10 most important of the 372 features. All of the top 10 were GLCM textural features derived from MRI data. In comparison, patient data had far lower F-scores: patient age, race, and lesion type had F-scores of 1, 0, and 3, respectively, and receptor status, SBR grade, and longest tumor diameter had F-scores of 6, 3, and 3, respectively.

For pCR patients, regional tumor ADCs were 0.59 ± 0.05, 0.81 ± 0.04, and 1.11 ± 0.06 × 10⁻³ mm²/s at pre-treatment, early treatment, and mid-treatment, respectively (Fig 6). For non-pCR patients, the corresponding tumor ADC values were 0.59 ± 0.03, 0.74 ± 0.03, and 0.91 ± 0.05 × 10⁻³ mm²/s. Tumor ADC increased with treatment in both groups, with a progressively larger increase in pCR patients.

Fig 6. Error bars are standard deviations. pCR: pathological complete response.

Prediction of pCR

Table 3 shows the AUC, precision, and recall of XGBoost for different data inputs after five-fold cross-validation. These values were obtained using all treatment time points as input. With the mean tumor ADC, the AUC was 0.850 (95% CI: 0.764, 0.936; p<0.001). With texture features of DWI, the AUC was 0.871 (95% CI: 0.768, 0.974; p<0.001). With texture features of DCE images, the AUC was 0.903 (95% CI: 0.854, 0.952; p<0.001). Combining texture features of DCE and DWI, the AUC was 0.916 (95% CI: 0.851, 0.981; p<0.001). All MRI data yielded an AUC of 0.933 (95% CI: 0.836, 1.03; p<0.001). By comparison, non-imaging data yielded an AUC of 0.919 (95% CI: 0.848, 0.99; p<0.001). Prediction using all available imaging and non-imaging data yielded an AUC of 0.951 (95% CI: 0.909, 0.993; p<0.001).

Table 3. Model performances for models trained on ADC, DCE(GLCM), DWI(GLCM), their combinations, all non-imaging data, and the combination of all MRI and all non-MRI data with 5-fold cross-validation.

Features	AUC	Precision	Recall
ADC	0.850 (0.764, 0.936; p<0.001)	0.752 (0.666, 0.838; p = 0.002)	0.827 (0.753, 0.901; p<0.001)
DWI GLCM	0.871 (0.768, 0.974; p<0.001)	0.779 (0.713, 0.845; p<0.001)	0.926 (0.861, 0.991; p<0.001)
DCE GLCM	0.903 (0.854, 0.952; p<0.001)	0.856 (0.808, 0.904; p<0.001)	0.939 (0.891, 0.987; p<0.001)
DCE+DWI GLCM	0.916 (0.851, 0.981; p<0.001)	0.779 (0.678, 0.880; p = 0.003)	0.915 (0.861, 0.969; p<0.001)
All MRI Data	0.933 (0.836, 1.030; p<0.001)	0.824 (0.780, 0.868; p<0.001)	0.889 (0.848, 0.930; p<0.001)
All Non-MRI Data	0.919 (0.848, 0.990; p<0.001)	0.762 (0.665, 0.859; p = 0.003)	0.914 (0.888, 0.940; p<0.001)
All MRI + Non-MRI Data	0.951 (0.909, 0.993; p<0.001)	0.815 (0.690, 0.940; p = 0.004)	0.926 (0.844, 1.008; p<0.001)

Values in parentheses are 95% confidence intervals. P-values test the null hypothesis that the AUC equals 0.5. DWI: diffusion-weighted imaging; DCE: dynamic contrast-enhanced; ADC: apparent diffusion coefficient; AUC: area under the curve; GLCM: gray-level co-occurrence matrix.

We also evaluated how different combinations of treatment time points affected prediction performance. For models using all available MRI and non-MRI data, the AUCs using tp0, tp1, tp2, and tp0+tp1 were 0.918 (95% CI: 0.856, 0.98; p<0.001), 0.844 (95% CI: 0.705, 0.983; p = 0.004), 0.920 (95% CI: 0.858, 0.982; p<0.001), and 0.938 (95% CI: 0.89, 0.986; p<0.001), respectively. The AUC using all time points was 0.951 (95% CI: 0.91, 0.992; p<0.001) (Table 4).

Table 4. Model performances for models trained on all features restricted to tp0, tp1, tp2, tp0 + tp1 and all timepoints using 5-fold cross-validation.

Timepoints	AUC	Precision	Recall
Tp0	0.918 (0.856, 0.980; p<0.001)	0.778 (0.716, 0.840; p<0.001)	0.913 (0.885, 0.941; p<0.001)
Tp1	0.844 (0.705, 0.983; p = 0.004)	0.793 (0.653, 0.933; p = 0.007)	0.841 (0.709, 0.973; p = 0.003)
Tp2	0.920 (0.858, 0.982; p<0.001)	0.832 (0.770, 0.894; p<0.001)	0.951 (0.929, 0.973; p<0.001)
Tp0 + Tp1	0.938 (0.890, 0.986; p<0.001)	0.805 (0.757, 0.853; p<0.001)	0.963 (0.918, 1.008; p<0.001)
Tp0 + Tp1 + Tp2	0.951 (0.910, 0.992; p<0.001)	0.815 (0.690, 0.940; p = 0.004)	0.926 (0.844, 1.008; p<0.001)

Values in parentheses are 95% confidence intervals. P-values test the null hypothesis that the AUC equals 0.5. AUC: area under the curve; Tp: time point.

Discussion

This study aimed to predict pCR non-invasively using XGBoost models with multiparametric MRI data at multiple treatment timepoints along with non-imaging data. To our knowledge, this is the first study to apply an extreme gradient boosting algorithm to the I-SPY 2 dataset to predict pathological complete response, the second to use such an algorithm to predict NAC response, and the first to do so using GLCM features.

In the original ACRIN 6698 study of I-SPY 2 patients, Partridge et al. [22] used logistic regression to evaluate whether changes in tumor ADC values (not texture features) predict pCR in breast cancer patients. They found that changes in tumor ADC values were moderately predictive of pCR at mid-treatment (AUC = 0.60; 95% CI: 0.52, 0.68). In a test subset, a model combining tumor subtype and mid-treatment change in ADC significantly improved predictive performance (AUC = 0.72; 95% CI: 0.61, 0.83).

Only a few studies have applied deep learning directly to MR images to predict pCR. Braman et al. [7] applied deep learning to predict response to NAC in HER2-positive patients from pre-treatment 2D DCE MRI in a retrospective study. They explored both pre-contrast and late post-contrast phases of DCE MRI and reported an AUC of 0.74 (± 0.03). Qu et al. [24] built a convolutional neural network (CNN) model using pre- and post-NAC MR images, with tumor regions manually segmented by two expert radiologists on contrast-enhanced T1-weighted images. They reported an AUC of 0.553 (95% CI: 0.416, 0.683) for pre-NAC data.

Similarly few studies have applied conventional (non-deep-learning) machine learning to MR images to predict pCR. Suo et al. [17] evaluated mono-exponential (ADC), bi-exponential, and stretched-exponential model parameters from diffusion MRI to predict pCR. They also included tumor size and the relative enhancement ratio from DCE MRI at 3 time points: before treatment, at mid-treatment, and after NAC. They found that flow-insensitive ADC change at mid-treatment was the most predictive feature (AUC = 0.831; 95% CI: 0.747, 0.915; P < 0.001); adding receptor status increased the AUC to 0.905 (95% CI: 0.843, 0.966; P < 0.001). Bian et al. [25] analyzed radiomic signatures based on T2-weighted imaging, diffusion-weighted imaging, dynamic contrast-enhanced imaging, and their combination to predict pCR, using logistic regression to assess the association between features and clinical risk factors. Their combined radiomic signature and nomogram model achieved an AUC of 0.91 (95% CI: 0.86, 1.00). Chen et al. [26] similarly analyzed the ability of radiomic signatures extracted from MRI data to predict pCR, achieving an AUC of 0.848 by combining DCE MRI and ADC maps. Eun et al. [19] performed non-GLCM texture analysis of pre- and mid-treatment T2-weighted MRI, DCE, DWI, and ADC maps; a random forest classifier for predicting pCR performed best with mid-treatment DCE MRI (AUC = 0.82; 95% CI: 0.74, 0.88). Huang et al. [27] trained a multilayer perceptron on radiomic features generated from MRI (ADC maps, DCE, and fat-suppressed T2-weighted imaging) and reported an AUC of 0.900 (95% CI: 0.849, 0.935) on the validation dataset. To our knowledge, Tahmassebi et al. [20] is the only other study to use XGBoost to predict pCR. Using quantitative pharmacokinetic DCE features and ADC values with several classifiers (support vector machine, linear discriminant analysis, logistic regression, random forests, stochastic gradient descent, decision tree, adaptive boosting, and extreme gradient boosting), they achieved a final AUC of 0.86 with extreme gradient boosting.

Limitations

This study was performed on a relatively small multicenter dataset. As more data become available, our findings should be replicated on a larger dataset to improve the generalizability of the model and to account for the predominance of White women in this dataset. We evaluated only XGBoost; other machine learning methods, such as LightGBM, should also be explored. We also extracted only GLCM features from the MRI data; additional radiomic features may improve the model by capturing other aspects of the images. We used the images as provided and performed only visual image quality checks. Because raw data were not available, we did not perform distortion correction or eddy current correction, or check multisite scanner consistency, among other steps. Accurate estimation and reproducibility of diffusion indices in clinical studies have been shown to require careful data quality assurance [28]. Moreover, large multicenter studies have shown non-negligible variability in quantitative diffusion indices measured on different scanner systems [29, 30]. Therefore, MRI scanner systems should be adequately characterized for diffusion MRI of the breast [31]. Deep learning methods could also be applied to predict pCR [24, 32–37] (see review [38]) but were not analyzed here.

Conclusions

XGBoost models using multiparametric MRI data along with demographic and molecular subtype data accurately predict pCR to NAC. With further development and testing on larger multi-institutional samples, this approach has the potential to non-invasively identify patients who are likely to respond to neoadjuvant chemotherapy at diagnosis or early in treatment, and may prove useful for treatment planning, treatment execution, and mid-treatment adjustment to achieve better outcomes.

Data Availability

Data were obtained from the public BMMR2 Challenge dataset. The full dataset is available at https://doi.org/10.7937/TCIA.KK02-6D95.

Funding Statement

The author(s) received no specific funding for this work.

References

1.Curigliano G, Burstein HJ, Winer EP, Gnant M, Dubsky P, Loibl S, et al. Correction to: De-escalating and escalating treatments for early-stage breast cancer: the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early Breast Cancer 2017. Annals of Oncology. 2018;29: 2153. doi: 10.1093/annonc/mdx806 [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Cortazar P, Zhang L, Untch M, Mehta K, Costantino JP, Wolmark N, et al. Pathological complete response and long-term clinical benefit in breast cancer: the CTNeoBC pooled analysis. The Lancet. 2014;384: 164–172. doi: 10.1016/S0140-6736(13)62422-8 [DOI] [PubMed] [Google Scholar]
3.Cortazar P, Geyer CE. Pathological Complete Response in Neoadjuvant Treatment of Breast Cancer. Ann Surg Oncol. 2015;22: 1441–1446. doi: 10.1245/s10434-015-4404-8 [DOI] [PubMed] [Google Scholar]
4.Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RGPM, Granton P, et al. Radiomics: Extracting more information from medical images using advanced feature analysis. European Journal of Cancer. 2012;48: 441–446. doi: 10.1016/j.ejca.2011.11.036 [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Li X, Arlinghaus LR, Ayers GD, Chakravarthy AB, Abramson RG, Abramson VG, et al. DCE-MRI analysis methods for predicting the response of breast cancer to neoadjuvant chemotherapy: Pilot study findings: DCE-MRI to Predict Breast Cancer Treatment Response. Magn Reson Med. 2014;71: 1592–1602. doi: 10.1002/mrm.24782 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Li X, Abramson RG, Arlinghaus LR, Kang H, Chakravarthy AB, Abramson VG, et al. Multiparametric Magnetic Resonance Imaging for Predicting Pathological Response After the First Cycle of Neoadjuvant Chemotherapy in Breast Cancer: Investigative Radiology. 2015;50: 195–204. doi: 10.1097/RLI.0000000000000100 [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Braman NM, Etesami M, Prasanna P, Dubchuk C, Gilmore H, Tiwari P, et al. Intratumoral and peritumoral radiomics for the pretreatment prediction of pathological complete response to neoadjuvant chemotherapy based on breast DCE-MRI. Breast Cancer Res. 2017;19: 57. doi: 10.1186/s13058-017-0846-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Mani S, Chen Y, Li X, Arlinghaus L, Chakravarthy AB, Abramson V, et al. Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. J Am Med Inform Assoc. 2013;20: 688–695. doi: 10.1136/amiajnl-2012-001332 [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Tahmassebi A, Gandomi AH, Fong S, Meyer-Baese A, Foo SY. Multi-stage optimization of a deep model: A case study on ground motion modeling. Olier I, editor. PLoS ONE. 2018;13: e0203829. doi: 10.1371/journal.pone.0203829 [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Cain EH, Saha A, Harowicz MR, Marks JR, Marcom PK, Mazurowski MA. Multivariate machine learning models for prediction of pathologic response to neoadjuvant therapy in breast cancer using MRI features: a study using an independent validation set. Breast Cancer Res Treat. 2019;173: 455–463. doi: 10.1007/s10549-018-4990-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
11. Lo Gullo R, Eskreis-Winkler S, Morris EA, Pinker K. Machine learning with multiparametric magnetic resonance imaging of the breast for early prediction of response to neoadjuvant chemotherapy. The Breast. 2020;49: 115–122. doi: 10.1016/j.breast.2019.11.009
12. Houssein EH, Emam MM, Ali AA, Suganthan PN. Deep and machine learning techniques for medical imaging-based breast cancer: A comprehensive review. Expert Systems with Applications. 2021;167: 114161. doi: 10.1016/j.eswa.2020.114161
13. El Adoui M, Drisis S, Benjelloun M. A PRM approach for early prediction of breast cancer response to chemotherapy based on registered MR images. Int J CARS. 2018;13: 1233–1243. doi: 10.1007/s11548-018-1790-y
14. Ravichandran K, Braman N, Janowczyk A, Madabhushi A. A deep learning classifier for prediction of pathological complete response to neoadjuvant chemotherapy from baseline breast DCE-MRI. In: Mori K, Petrick N, editors. Medical Imaging 2018: Computer-Aided Diagnosis. Houston, United States: SPIE; 2018. p. 11. doi: 10.1117/12.2294056
15. Schettini F, Pascual T, Conte B, Chic N, Brasó-Maristany F, Galván P, et al. HER2-enriched subtype and pathological complete response in HER2-positive breast cancer: A systematic review and meta-analysis. Cancer Treatment Reviews. 2020;84: 101965. doi: 10.1016/j.ctrv.2020.101965
16. Kalinowski L, Saunus JM, McCart Reed AE, Lakhani SR. Breast Cancer Heterogeneity in Primary and Metastatic Disease. In: Ahmad A, editor. Breast Cancer Metastasis and Drug Resistance. Cham: Springer International Publishing; 2019. pp. 75–104. doi: 10.1007/978-3-030-20301-6_6
17. Suo S, Yin Y, Geng X, Zhang D, Hua J, Cheng F, et al. Diffusion-weighted MRI for predicting pathologic response to neoadjuvant chemotherapy in breast cancer: evaluation with mono-, bi-, and stretched-exponential models. J Transl Med. 2021;19: 236. doi: 10.1186/s12967-021-02886-3
18. Zhao R, Lu H, Li Y-B, Shao Z-Z, Ma W-J, Liu P-F. Nomogram for Early Prediction of Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Using Dynamic Contrast-enhanced and Diffusion-weighted MRI. Academic Radiology. 2022;29: S155–S163. doi: 10.1016/j.acra.2021.01.023
19. Eun NL, Kang D, Son EJ, Park JS, Youk JH, Kim J-A, et al. Texture Analysis with 3.0-T MRI for Association of Response to Neoadjuvant Chemotherapy in Breast Cancer. Radiology. 2020;294: 31–41. doi: 10.1148/radiol.2019182718
20. Tahmassebi A, Wengert GJ, Helbich TH, Bago-Horvath Z, Alaei S, Bartsch R, et al. Impact of Machine Learning With Multiparametric Magnetic Resonance Imaging of the Breast for Early Prediction of Response to Neoadjuvant Chemotherapy and Survival Outcomes in Breast Cancer Patients. Investigative Radiology. 2019;54: 110–117. doi: 10.1097/RLI.0000000000000518
21. van der Hoogt KJJ, Schipper RJ, Winter-Warnars GA, Ter Beek LC, Loo CE, Mann RM, et al. Factors affecting the value of diffusion-weighted imaging for identifying breast cancer patients with pathological complete response on neoadjuvant systemic therapy: a systematic review. Insights Imaging. 2021;12: 187. doi: 10.1186/s13244-021-01123-1
22. Partridge SC, Zhang Z, Newitt DC, Gibbs JE, Chenevert TL, Rosen MA, et al. Diffusion-weighted MRI Findings Predict Pathologic Response in Neoadjuvant Treatment of Breast Cancer: The ACRIN 6698 Multicenter Trial. Radiology. 2018;289: 618–627. doi: 10.1148/radiol.2018180273
23. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8: 118–127. doi: 10.1093/biostatistics/kxj037
24. Qu YH, Zhu HT, Cao K, Li XT, Ye M, Sun YS. Prediction of pathological complete response to neoadjuvant chemotherapy in breast cancer using a deep learning (DL) method. Thorac Cancer. 2020;11(3): 651–658. doi: 10.1111/1759-7714.13309
25. Bian T, Wu Z, Lin Q, Wang H, Ge Y, Duan S, et al. Radiomic signatures derived from multiparametric MRI for the pretreatment prediction of response to neoadjuvant chemotherapy in breast cancer. BJR. 2020;93: 20200287. doi: 10.1259/bjr.20200287
26. Chen X, Chen X, Yang J, Li Y, Fan W, Yang Z. Combining Dynamic Contrast-Enhanced Magnetic Resonance Imaging and Apparent Diffusion Coefficient Maps for a Radiomics Nomogram to Predict Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Patients. Journal of Computer Assisted Tomography. 2020;44: 275–283. doi: 10.1097/RCT.0000000000000978
27. Huang Y, Chen W, Zhang X, He S, Shao N, Shi H, et al. Prediction of Tumor Shrinkage Pattern to Neoadjuvant Chemotherapy Using a Multiparametric MRI-Based Machine Learning Model in Patients With Breast Cancer. Front Bioeng Biotechnol. 2021;9: 662749. doi: 10.3389/fbioe.2021.662749
28. Jones DK. Precision and accuracy in diffusion tensor magnetic resonance imaging. Top Magn Reson Imaging. 2010;21: 87–99. doi: 10.1097/RMR.0b013e31821e56ac
29. Fedeli L, Belli G, Ciccarone A, Coniglio A, Esposito M, Giannelli M, et al. Dependence of apparent diffusion coefficient measurement on diffusion gradient direction and spatial position—A quality assurance intercomparison study of forty-four scanners for quantitative diffusion-weighted imaging. Phys Med. 2018;55: 135–141. doi: 10.1016/j.ejmp.2018.09.007
30. Fedeli L, Benelli M, Busoni S, Belli G, Ciccarone A, Coniglio A, et al. On the dependence of quantitative diffusion-weighted imaging on scanner system characteristics and acquisition parameters: A large multicenter and multiparametric phantom study with unsupervised clustering analysis. Phys Med. 2021;85: 98–106. doi: 10.1016/j.ejmp.2021.04.020
31. Giannelli M, Sghedoni R, Iacconi C, Iori M, Traino AC, Guerrisi M, et al. MR Scanner Systems Should Be Adequately Characterized in Diffusion-MRI of the Breast. Woloschak GE, editor. PLoS ONE. 2014;9: e86280. doi: 10.1371/journal.pone.0086280
32. Duanmu H, Huang PB, Brahmava S, Lin S, Ren T, Kong RJ, et al. Prediction of Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Using Deep Learning with Integrative Imaging, Molecular and Demographic Data. MICCAI 2020: Medical Image Computing and Computer Assisted Intervention 2020. pp. 242–252.
33. Liu MZ, Mutasa S, Chang P, Siddique M, Jambawalikar S, Ha R. A novel CNN algorithm for pathological complete response prediction using an I-SPY TRIAL breast MRI database. Magn Reson Imaging. 2020;73: 148–151. doi: 10.1016/j.mri.2020.08.021
34. Huynh BQ, Antropova N, Giger ML. Comparison of breast DCE-MRI contrast time points for predicting response to neoadjuvant chemotherapy using deep convolutional neural network features with transfer learning. SPIE Medical Imaging. 2017;10134. doi: 10.1117/12.2255316
35. Duanmu H, Ren T, Duong TQ. Deep learning prediction of pathological complete response, residual cancer burden, and progression-free survival in breast cancer patients. PLoS ONE. 2022 (in press).
36. Ren T, Lin S, Huang P, Duong TQ. Convolutional Neural Network of Multiparametric MRI Accurately Detects Axillary Lymph Node Metastasis in Breast Cancer Patients With Pre Neoadjuvant Chemotherapy. Clin Breast Cancer. 2022;22(2): 170–177. doi: 10.1016/j.clbc.2021.07.002
37. Hussain L, Huang P, Nguyen T, Lone KJ, Ali A, Khan MS, et al. Machine learning classification of texture features of MRI breast tumor and peri-tumor of combined pre- and early treatment predicts pathologic complete response. Biomed Eng Online. 2021;20(1): 63. doi: 10.1186/s12938-021-00899-z
38. Khan N, Adam R, Huang P, Maldjian T, Duong TQ. Deep Learning Prediction of Pathologic Complete Response in Breast Cancer Using MRI and Other Clinical Data: A Systematic Review. Tomography. 2022;8(6): 2784–2795. doi: 10.3390/tomography8060232

PLoS One. doi: 10.1371/journal.pone.0280320.r001

Decision Letter 0

Marco Giannelli

25 Jul 2022

PONE-D-22-10717

Machine learning with textural analysis of longitudinal multiparametric MRI and molecular subtypes accurately predicts Pathologic Complete Response in patients with invasive breast cancer

PLOS ONE

Dear Dr. Duong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: This is a potentially interesting paper. However, the Authors should adequately address all the critical comments raised by the Reviewers. Moreover, they should discuss in greater detail the clinical impact of their findings.

Please submit your revised manuscript by Sep 08 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Marco Giannelli

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1) Given that DTI is a truly quantitative technique, a quality control of scanner performance in general, and of the correct application of diffusion-weighting gradients in particular, is needed in order to guarantee accurate estimates of diffusion indices and reproducible results in clinical studies (Jones, Top Magn Reson Imaging 2010, 21: 87-99). Moreover, this is of paramount importance when pooling data from different scanners. Indeed, previous large multicenter studies have shown a non-negligible variability in quantitative diffusion indices measured using different scanner systems (Fedeli et al, Phys Med 2018, 55: 135-141; Fedeli et al, Phys Med 2021, 85: 98-106). Therefore, MR scanner systems should be adequately characterized in diffusion-MRI of the breast (Giannelli et al, PLoS One 2014, 9: e86280). The Authors should discuss these aspects in detail.

2) The Authors have focused on a single feature class, i.e., GLCM. The Authors should explain this choice and why they have not considered additional feature classes.

3) When applying machine learning techniques on multicenter data, it would be appropriate using methods of data harmonization (Johnson et al, Biostatistics 2007, 8: 118-127). The Authors should hence discuss this limitation of their study.

4) Given the sensitivity of radiomic features to image resampling and discretization, the Authors should report the resampling voxel size and bin width/number of bins used for the preprocessing of radiomic data, discussing whether the used values are optimal.

Reviewer #2: In this retrospective paper, the authors were able to predict pCR after neoadjuvant chemotherapy using XGBoost with MRI and non-imaging data at multiple treatment timepoints.

1. In Abstract Purpose and throughout the paper, I suggest pathological complete response (pCR) and extreme gradient boosting (XGBoost).

2. In Abstract Results, 1st sentence, was 0.903 an AUC value?

3. In Introduction, 1st paragraph, last sentence, more details are needed.

4. In Introduction, 2nd paragraph, 1st sentence, "a" should be added before "chemotherapy."

5. In Introduction, 3rd paragraph, 2nd sentence, "see reviews" can be cut, and "and" should be added before "DCE dynamics."

6. In Methodology, Data Sources, 1st paragraph, penultimate sentence, I suggest "logistic regression."

7. In Methodology, Data Sources, 2nd paragraph, 1st sentence, and throughout the paper, I suggest "paclitaxel" and "anthracycline."

8. In Methodology, Ground Truth, 1st sentence, "FTV" is not clear.

9. In Methodology, Statistical Analysis, 3rd paragraph, line 2, "Recall" should be "recall." In line 4, the material in parentheses, starting with "model," can be deleted.

10. In Results, 1st paragraph, 2nd sentence, "the" should be added before "mean."

11. In Discussion, 4th paragraph, line 8, "derived from" should be cut.

12. In Conclusion, 1st sentence, "subtypes" should be changed to "subtype." In the 2nd sentence, "a" should be added after "predict."

13. The references should be written according to the journal's style.

14. In the figures, I suggest one patient per figure, and the pulse sequences should be described in the legends.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Gary J. Whitman MD

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jan 17;18(1):e0280320. doi: 10.1371/journal.pone.0280320.r002

Author response to Decision Letter 0

7 Aug 2022

All responses are included in the "Response to Reviews" file. Below is a copy of the document:

PONE-D-22-10717

Dear Editors and Reviewers,

We thank you for your thoughtful reviews of our manuscript. The point-by-point response and revised manuscript are attached. Thank you.

Tim Duong, on behalf of the coauthors.

Machine learning with textural analysis of longitudinal multiparametric MRI and molecular subtypes accurately predicts Pathologic Complete Response in patients with invasive breast cancer

PLOS ONE

ACADEMIC EDITOR:

This is a potentially interesting paper. However, the Authors should adequately address all the critical comments raised by the Reviewers. Moreover, they should discuss in greater detail the clinical impact of their findings.

- Thank you. The following has been added:

- With further development and testing on larger multi-institutional samples, this approach has the potential to non-invasively identify, at diagnosis or early in treatment, patients who are likely to respond to neoadjuvant chemotherapy. This approach may prove useful for treatment planning, treatment execution, and mid-treatment adjustment to achieve better outcomes.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

________________________________________

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

- We expanded the Methods section to better describe the statistical tests used. We now also state the statistical tests used in the Results and figure legends where appropriate.

Reviewer #2: Yes

________________________________________

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

________________________________________

5. Review Comments to the Author

We completely agree with these comments. Raw images are not available to make corrections. According to the open data sources, these images underwent reasonable and rigorous data quality checks, ranging from distortion correction and eddy current correction to quality assurance with phantom and human data and the pooling of data from multiple sites.

- Thank you for the references. They are now cited. The following has been added:

- We performed only a visual check of the images as provided; we did not perform distortion correction, eddy current correction, or checks of multisite scanner consistency, among others, because raw data were not available. It has been shown that accurate estimates and reproducibility of diffusion indices in clinical studies require careful data quality assurance (Jones, Top Magn Reson Imaging 2010, 21: 87-99). Moreover, large multicenter studies have shown non-negligible variability in quantitative diffusion indices measured using different scanner systems (Fedeli et al, Phys Med 2018, 55: 135-141; Fedeli et al, Phys Med 2021, 85: 98-106). Thus, MRI scanner systems should be adequately characterized in diffusion-MRI of the breast (Giannelli et al, PLoS One 2014, 9: e86280).
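To make the point about diffusion-weighting gradients concrete, the minimal Python sketch below assumes a mono-exponential signal model, S(b) = S0·exp(-b·ADC), and a two-point ADC estimate. The b-value of 800 s/mm², the tissue ADC, and the 5% gradient error are illustrative assumptions, not values from this study. The sketch shows that if the effective b-value delivered by the scanner differs from the nominal one used in the fit, the fitted ADC is biased by the same relative amount.

```python
import math


def adc_two_point(s0: float, sb: float, b: float) -> float:
    """Mono-exponential ADC (mm^2/s) from the b=0 signal s0 and the signal sb at b (s/mm^2).

    Assumes S(b) = S0 * exp(-b * ADC), so ADC = ln(S0 / Sb) / b.
    """
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b


# Hypothetical tissue with a true ADC of 1.2e-3 mm^2/s.
true_adc = 1.2e-3
nominal_b = 800.0        # b-value reported by the sequence (assumed, s/mm^2)
gradient_error = 0.05    # hypothetical: the effective b is 5% higher than nominal
effective_b = nominal_b * (1 + gradient_error)

s0 = 1000.0
sb = s0 * math.exp(-effective_b * true_adc)  # signal actually acquired

# Fitting with the nominal b-value inflates the ADC by the same 5%.
estimated_adc = adc_two_point(s0, sb, nominal_b)
relative_bias = estimated_adc / true_adc - 1
print(f"estimated ADC = {estimated_adc:.3e} mm^2/s, relative bias = {relative_bias:.1%}")
```

Because the bias scales directly with the gradient calibration error, scanner-to-scanner differences of this kind propagate into pooled ADC values, which is one reason phantom-based characterization matters in multicenter data.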

2) The Authors have focused on a single feature class, i.e., GLCM. The Authors should explain this choice and why they have not considered additional feature classes.

- Thank you for your comment. The following is added.

- While many texture features have been explored, we chose only GLCM features because they have not been as widely used as other radiomic features. Future studies should compare results from different texture features.

- Thank you for the references. They are now cited. The following has been added:

- When applying machine learning techniques to multicenter data, data harmonization (Johnson et al, Biostatistics 2007, 8: 118-127) may be necessary. However, we did not perform additional data harmonization because all MRI acquisitions followed the I-SPY2 acquisition requirements described in (20). Furthermore, we ensured that data from different field strengths, different molecular subtypes, etc., were not over-represented in either the training or the testing data sets by using 5-fold cross-validation, so that every sample appears in the test set of one fold and in the training sets of the remaining folds.
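The cross-validation argument above can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' code: it assigns patients to 5 folds with a stratified round-robin over an assumed composite key (pCR status, molecular subtype, field strength), then checks that every patient lands in the test set of exactly one fold and that each stratum is spread evenly across folds. The cohort is synthetic (117 randomly generated patients); in practice, scikit-learn's `StratifiedKFold` is a standard implementation when stratifying on a single (possibly composite) label.

```python
import random
from collections import defaultdict


def stratified_kfold(strata, k=5, seed=0):
    """Split sample indices into k (train, test) folds, spreading each stratum evenly.

    `strata` holds one hashable, sortable key per sample, e.g. a tuple of
    (pCR, molecular subtype, field strength).
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, key in enumerate(strata):
        members[key].append(idx)

    fold_of = [0] * len(strata)
    position = 0  # global round-robin counter carried across strata
    for key in sorted(members):
        idx_list = members[key]
        rng.shuffle(idx_list)
        for idx in idx_list:
            fold_of[idx] = position % k
            position += 1

    splits = []
    for f in range(k):
        test = [i for i, g in enumerate(fold_of) if g == f]
        train = [i for i, g in enumerate(fold_of) if g != f]
        splits.append((train, test))
    return splits


# Synthetic cohort of 117 patients with random, hypothetical attributes.
gen = random.Random(42)
subtypes = ["HR+/HER2-", "HER2+", "TNBC"]
strata = [(gen.random() < 0.3, gen.choice(subtypes), gen.choice([1.5, 3.0]))
          for _ in range(117)]
splits = stratified_kfold(strata, k=5)

# Every patient is tested exactly once and never trains on its own test fold.
times_tested = [0] * len(strata)
for train, test in splits:
    assert not set(train) & set(test)
    for i in test:
        times_tested[i] += 1
print("each patient tested once:", all(n == 1 for n in times_tested))
print("test-fold sizes:", [len(test) for _, test in splits])
```

Stratifying on the composite key is what keeps each field strength and subtype proportionally represented; plain (unstratified) 5-fold splitting only guarantees that every patient is tested once.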

- Thank you for the comment. The following is added:

- All images were interpolated to an in-plane voxel spacing of 0.7825 mm × 0.7825 mm with a 2.5 mm slice thickness, which were the median spacing and thickness across all patients. Images were cast to 8-bit, and the number of bins was set to the maximum value of 256 to maximize the number of grey levels counted.
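The preprocessing described above can be sketched as follows, assuming NumPy is available. This is a minimal, hypothetical illustration rather than the pipeline used in the study: `zoom_factors` computes per-axis resampling factors toward the stated 2.5 × 0.7825 × 0.7825 mm grid (they could be passed to an interpolator such as `scipy.ndimage.zoom`), `to_8bit` applies an assumed min-max rescaling to 256 grey levels, and `glcm` builds a symmetric, normalized co-occurrence matrix for a single 2D pixel offset, from which a few common GLCM features are computed. Feature definitions follow common conventions (energy here is the square root of the angular second moment) and may differ from the toolkit the authors used.

```python
import numpy as np


def zoom_factors(spacing_mm, target_mm=(2.5, 0.7825, 0.7825)):
    """Per-axis factors to resample a (slice, row, col) volume to the target spacing."""
    return tuple(s / t for s, t in zip(spacing_mm, target_mm))


def to_8bit(img):
    """Min-max rescale an ROI to integer grey levels 0..255 (an assumed casting rule)."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)


def glcm(img8, dy=0, dx=1, levels=256):
    """Symmetric, normalized grey-level co-occurrence matrix for one 2D offset (dy, dx)."""
    h, w = img8.shape
    # Reference pixels and their neighbours at (dy, dx), restricted to in-image pairs.
    ref = img8[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    nbr = img8[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts = counts + counts.T  # count each pair in both directions
    total = counts.sum()
    return counts / total if total > 0 else counts


def glcm_features(p):
    """A few common GLCM features from a normalized co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    homogeneity = float(np.sum(p / (1.0 + (i - j) ** 2)))
    energy = float(np.sqrt(np.sum(p ** 2)))  # square root of angular second moment
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i > 0 and sd_j > 0:
        correlation = float(np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j))
    else:
        correlation = 1.0  # undefined for a constant image; set to 1 by convention here
    return {"contrast": contrast, "homogeneity": homogeneity,
            "energy": energy, "correlation": correlation}


# Usage on a synthetic 2D ROI (random values, for illustration only).
rng = np.random.default_rng(0)
roi = rng.normal(loc=100.0, scale=20.0, size=(32, 32))
features = glcm_features(glcm(to_8bit(roi), dy=0, dx=1))
print(features)
```

With 256 levels, the co-occurrence matrix for a small ROI is very sparse, which is one reason the choice of bin number discussed by the reviewer can change feature values.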

Reviewer #2: In this retrospective paper, the authors were able to predict pCR after neoadjuvant chemotherapy using XGBoost with MRI and non-imaging data at multiple treatment timepoints.

- All comments below were revised as suggested (changes not marked), except where details are noted:

1. In Abstract Purpose and throughout the paper, I suggest pathological complete response (pCR) and extreme gradient boosting (XGBoost).

2. In Abstract Results, 1st sentence, was 0.903 an AUC value?

3. In Introduction, 1st paragraph, last sentence, more details are needed.

- It is revised as:

- Thus, accurate prediction of pCR could aid treatment planning by identifying patients who are likely to respond to specific NAC drugs and by allowing treatments to be altered mid-course if needed, maximizing successful outcomes while minimizing the adverse effects of unnecessary chemotherapy.