Author manuscript; available in PMC: 2023 Aug 1.
Published in final edited form as: Abdom Radiol (NY). 2022 Jun 16;47(8):2770–2782. doi: 10.1007/s00261-022-03572-8

Combined Artificial Intelligence and Radiologist Model for Predicting Rectal Cancer Treatment Response from Magnetic Resonance Imaging: An External Validation Study

Natally Horvat a,b, Harini Veeraraghavan c, Caio SR Nahas d, David DB Bates a, Felipe R Ferreira b, Junting Zheng e, Marinela Capanu e, James L Fuqua III a, Maria Clara Fernandes a, Ramon E Sosa a, Vetri Sudar Jayaprakasam a, Giovanni G Cerri b, Sergio C Nahas d, Iva Petkovska a
PMCID: PMC10150388  NIHMSID: NIHMS1889612  PMID: 35710951

Abstract

Purpose:

To evaluate an MRI-based radiomic texture classifier alone and combined with radiologist qualitative assessment in predicting pathological complete response (pCR) using restaging MRI with internal training and external validation.

Methods:

Consecutive patients with locally advanced rectal cancer (LARC) who underwent neoadjuvant therapy followed by total mesorectal excision from March 2012–February 2016 (Memorial Sloan Kettering Cancer Center/internal dataset, n=114, 41% female, median age=55) and July 2014–October 2015 (Instituto do Câncer do Estado de São Paulo/external dataset, n=50, 52% female, median age=64.5) were retrospectively included. Two radiologists (R1, senior; R2, junior) independently evaluated restaging MRI, classifying patients (radiological complete response vs radiological partial response). Model A (n=33 texture features), model B (n=91 features including texture, shape, and edge features), and two combination models (model A+B+R1, model A+B+R2) were constructed. Pathology served as the reference standard for neoadjuvant treatment response. Comparison of the classifiers’ AUCs on the external set was done using DeLong’s test.

Results:

Models A and B had similar discriminative ability (P=0.3; Model B AUC=83%, 95% CI: 70%–97%). Combined models increased inter-reader agreement compared with radiologist-only interpretation (κ=0.82, 95% CI: 0.70 to 0.89 vs κ=0.25, 95% CI: −0.11 to 0.61). The combined model slightly increased junior radiologist specificity, positive predictive value (PPV), and negative predictive value (NPV) (93% vs 90%, 57% vs 50%, and 91% vs 90%, respectively).

Conclusion:

We developed and externally validated a combined model using radiomics and radiologist qualitative assessment, which improved inter-reader agreement and slightly increased the diagnostic performance of the junior radiologist in predicting pCR after neoadjuvant treatment in patients with LARC.

Keywords: Rectal Cancer, Neoadjuvant Therapy, Magnetic Resonance Imaging, Artificial Intelligence, Watchful Waiting

INTRODUCTION

There is a multidisciplinary effort to improve the diagnostic accuracy of treatment response in patients with locally advanced rectal cancer (LARC) after neoadjuvant treatment, particularly as they may be eligible for a non-surgical watch-and-wait approach (1). Rectal magnetic resonance imaging (MRI), digital rectal examination, and endoscopy are the main modalities used to diagnose complete response but have variable predictive performance against reference standard pathologic complete response (pCR) as well as variable inter-reader agreement (2–5).

MRI-based radiomics has shown promise for predicting pCR to neoadjuvant therapy (area under the curve (AUC) from 0.72–0.91) (6–17). However, before MRI-based radiomics can be applied in the clinical routine, limitations including scanner-dependent differences and the lack of external validation of developed algorithms need to be overcome (18). To date, MRI-based radiomic models have also shown inconsistent results when their performance has been compared against radiologist performance (19–21).

To our knowledge, no radiomic model has been externally validated using post-treatment rectal MRI radiomic features combined with qualitative radiologist assessment to predict pCR. Therefore, in this study, our aim was to evaluate the feasibility of applying an MRI radiomic classifier constructed on a single institution (internal discovery) cohort in a different single-institution, external (validation) cohort. We also studied whether combining the radiomic classifier with radiologist interpretation would improve the reliability of pCR prediction in patients with LARC after neoadjuvant treatment, using restaging MRI with internal and external validation. Furthermore, we evaluated whether MRI harmonization could improve the robustness of the texture features across different scanners and institutions, namely by reducing scanner-dependent intensity variation.

MATERIALS AND METHODS

Study sample

This multi-center study was approved by the institutional review board at Memorial Sloan Kettering Cancer Center (MSK)/internal dataset and at Instituto do Câncer do Estado de São Paulo (ICESP)/external dataset; the need for informed consent was waived. Institutional databases were searched to identify the consecutive patients who met the following inclusion criteria: diagnosed with LARC, and who underwent neoadjuvant treatment followed by total mesorectal excision, from March 2012–February 2016 at MSK, and from July 2014–October 2015 at ICESP. The exclusion criteria were as follows: interval between rectal MRI and surgery > 3 months; visible mucinous tumor due to high intrinsic T2 signal (as we aimed for uniformity of the dataset to develop the algorithm); tumor bed not completely included on MRI to avoid relevant parts of tumor being excluded; and technical issues that could compromise analyses. See Figure 1 for the patient inclusion flowchart.

Fig. 1.

Flowchart demonstrating patient inclusion. Abbreviations: CRT, chemoradiotherapy; ICESP, Instituto do Câncer do Estado de São Paulo; MSK, Memorial Sloan Kettering Cancer Center; MRI, magnetic resonance imaging; NAT, neoadjuvant treatment; TME, total mesorectal excision

All patients from MSK were included in a previous study (10) to evaluate the added value of radiomic features on restaging rectal MRI performed after neoadjuvant treatment to predict complete response. In this study, our focus was to validate radiomic features in the same group of patients (MSK) and in an external dataset from ICESP as well as to combine radiomic analysis with qualitative radiologist assessment. In order to assess inter-reader agreement, scans from 38 patients were independently segmented by the two radiologists from MSK.

MRI protocol

All rectal MRIs were acquired on GE Healthcare platforms (MSK: 1.5T or 3.0T GE Discovery MR750, GE Optima MR450w, GE Signa EXCITE, and GE Signa HDxt; ICESP: Signa HDx 1.5T) with phase-array coils. The main sequences evaluated for this study included oblique T2-weighted imaging (T2WI) without fat suppression perpendicular to the long axis of the rectum and diffusion-weighted imaging (DWI). MRI parameters are summarized in Table S1.

Qualitative rectal MRI evaluation

Two abdominal radiologists from each institution (R1, senior; R2, junior) qualitatively assessed the tumor bed in all restaging rectal MRIs from their institution. The radiologists read the cases separately, with discrepancies re-assessed to reach consensus; thereafter, patients were classified as having either radiological complete response (rCR) or rPR (radiological partial response) which included patients with poor or no response. Radiologists were aware that patients had completed neoadjuvant therapy and had primary staging MRI available during qualitative assessment to guide the delimitation of the tumor bed, but were blinded to final pathological results.

The definition of rCR on T2WI was no residual intermediate signal intensity in the T2 dark scar or normalized rectal wall. On DWI, rCR was defined as no foci of restricted diffusion in the tumor bed defined as bright signal on DWI and dark on apparent diffusion coefficient (ADC) maps. Radiologists performed a combined assessment of T2WI and DWI to reach a final diagnosis. If rCR was not seen on both T2WI and DWI, the patient was classified as rPR.
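The combined reading rule above can be written as a simple conjunction. This is a minimal sketch under one reading of the text, namely that rCR requires both sequences to be free of residual disease; the function and argument names are hypothetical and are not taken from the study.

```python
def classify_restaging_mri(t2_residual_signal: bool, dwi_restricted_diffusion: bool) -> str:
    """Return 'rCR' only if neither T2WI nor DWI shows residual disease, else 'rPR'.

    Assumption: a reader records, per sequence, whether residual intermediate
    T2 signal or a focus of restricted diffusion is present in the tumor bed.
    """
    t2_complete = not t2_residual_signal
    dwi_complete = not dwi_restricted_diffusion
    return "rCR" if (t2_complete and dwi_complete) else "rPR"


# A dark scar on T2WI with a focus of restricted diffusion is still rPR.
print(classify_restaging_mri(t2_residual_signal=False, dwi_restricted_diffusion=True))
```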

Quantitative texture analysis

Image segmentation

Manual segmentation of the entire tumor bed within the rectal wall on oblique T2WI was performed separately at each institution, whereby one radiologist from each institution manually segmented images, excluding equivocal normal rectal wall and mucosal edema. Free open-source software (ITK-SNAP, version 3.4.0; http://itksnap.org) was used to delineate all volumes of interest (VOI) for computer-based image analysis. DWI sequences were not segmented and, consequently, not included in the radiomic model since the sequences are prone to artifacts and high variability among different institutions. Additionally, 38 patients were manually segmented by two independent abdominal radiologists from MSK.

Radiomics analysis method

Separate random forest (RF) classifiers to differentiate pCR from pathological partial response (pPR) were constructed using (a) 33 texture features (model A) resulting from first order statistics of T2WI (n = 4), Haralick textures (n = 5), Gabor edges (n = 4), and Haralick textures computed on Gabor edge images (n = 20), and (b) an integrated radiomic classifier (model B) using textures, shape, and commonly used radiomic measures (gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), neighborhood gray tone difference matrix (NGTDM), neighborhood gray level difference matrix (NGLDM), peak, and valley) (n = 91). Both models were trained with five-fold repeated (10 times) cross-validation on the internal dataset. External validation was performed on the dataset from ICESP.

See the Electronic Supplementary Material for more details on the radiomic features.

Impact of MRI harmonization on radiomic classifier performance

We evaluated whether applying MRI harmonization using histogram standardization would improve the robustness of radiomic features to scanner magnet (1.5T vs. 3.0T) differences. All images were acquired on GE Healthcare platforms (MSK: 1.5T or 3.0T GE Discovery MR750, GE Optima MR450w, GE Signa EXCITE, and GE Signa HDxt; ICESP: Signa HDx 1.5T). Histogram standardization matches the input MRI histogram with a reference (a randomly selected case from the training set) to reduce scanner-dependent intensity variations. We used an in-house C++ wrapper code (22, 23) for histogram standardization, using the open-source Insight Toolkit (24).
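To illustrate the idea of matching an input histogram to a reference, the following is a minimal pure-Python sketch of quantile-based histogram matching. It is not the in-house C++/Insight Toolkit implementation used in the study; the function name and the flat lists of intensities are assumptions made for illustration only.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each source intensity to the reference intensity at the same quantile.

    Assumption: `source` and `reference` are flat lists of voxel intensities.
    """
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Number of source voxels <= value, i.e. the empirical CDF times n_src.
        rank = bisect_right(src_sorted, value)
        # Inverse empirical CDF of the reference at the same quantile,
        # computed with integer arithmetic: ceil(rank * n_ref / n_src) - 1.
        ref_index = (rank * n_ref + n_src - 1) // n_src - 1
        matched.append(ref_sorted[ref_index])
    return matched


# Hypothetical intensities: the source scan is remapped onto the reference range.
print(match_histogram([30, 10, 40, 20], [100, 200, 300, 400]))
```

The mapping is monotonic, so the rank order of voxels is preserved while the overall intensity distribution is moved to that of the reference scan.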

In both cases, with and without MRI harmonization, texture features were extracted under identical parameters (32 bins for discretizing the MRI signal intensities) and under default settings for radiomic feature extraction in the CERR software environment. Radiomic features extracted using CERR were Image Biomarker Standardisation Initiative (IBSI)-compliant (25). The same feature settings as used in prior work (10) were applied. The features identified to be robust to magnet strength (P > 0.05 for the difference between 1.5T and 3.0T) were used for the construction of random forest classifiers. Twenty-six out of 33 texture features were found to be robust and were used to construct model A. Seventy-nine out of 91 features were found to be robust and used to construct model B.
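As an illustration of discretizing signal intensities into 32 bins, the sketch below applies a fixed-bin-number scheme of the kind described by the IBSI. Whether CERR's default settings match this scheme exactly is an assumption, and the function name is hypothetical.

```python
def discretize_fixed_bin_number(intensities, n_bins=32):
    """Map intensities inside a region of interest to gray levels 1..n_bins.

    Assumption: fixed-bin-number discretization over the ROI's own min and max.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A constant region collapses to a single gray level.
        return [1 for _ in intensities]
    levels = []
    for x in intensities:
        level = int(n_bins * (x - lo) / (hi - lo)) + 1
        # The maximum intensity would land in bin n_bins + 1, so clamp it.
        levels.append(min(level, n_bins))
    return levels


# Hypothetical ROI intensities mapped onto 32 gray levels.
roi = [120, 135, 150, 400, 410, 760]
print(discretize_fixed_bin_number(roi))
```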

Furthermore, the impact of the target scan used for MRI harmonization was evaluated using three randomly selected MRIs serving as reference. All images from the training set were harmonized to these scans using histogram standardization and the number of robust features in the individual harmonization setting was computed.

Random forest classifier training details

A random forest (RF) classifier (26) (see Electronic Supplementary Material for further details) with 1000 trees was trained with 5-fold cross-validation and 10 repetitions on the internal discovery dataset, separately for models A and B. Only the features found to be robust between magnet strengths (n = 79) were used for analysis. To handle class imbalance (pCR vs. pPR), the synthetic minority oversampling technique (SMOTE) (27) was applied, as in prior works (28–31), within each cross-validation fold (31). SMOTE is an approach to overcome the issue of overfitting when using unbalanced datasets. Performing it within each fold avoids issues of data leakage and over-optimistic classification. SMOTE used the five nearest neighbors of the same class to produce augmented datasets, with 300% oversampling of the minority class and 100% undersampling of the majority class. The RF classifiers were constructed with 1000 trees, and the hyper-parameter, namely the number of predictors used in each tree (two to N, where N is the number of features), was optimized through nested cross-validation using linear search to reduce overfitting. Furthermore, RF itself constructs a large number of trees (n = 1000 in this case), wherein each tree is constructed with out-of-bag estimates using different sets of samples selected randomly with replacement and random samples of predictors selected before and after each node split in a tree. This approach ensures robustness to a large number of chosen examples (26). The best model obtained from the training set was applied to the external validation set. Caret, DwR, pROC, and epiR packages in the R software (version 3.1; R Foundation for Statistical Computing) were used in the analysis. All R code used in the analysis will be made available through the author’s GitHub repository.
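The following is a minimal pure-Python sketch of SMOTE-style oversampling applied only to the training portion of a fold, which is what keeps synthetic samples out of the held-out portion. It is not the R implementation used in the study: the function names are hypothetical, "300% oversampling" is interpreted here as three synthetic samples per minority case, and the majority-class undersampling step is omitted.

```python
import random


def smote_oversample(minority, n_per_sample=3, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbors = others[:k]
        if not neighbors:
            continue
        for _ in range(n_per_sample):
            nb = rng.choice(neighbors)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def oversample_training_fold(train_x, train_y, minority_label=1):
    """Augment only the training part of a fold; held-out data are never touched."""
    minority = [x for x, y in zip(train_x, train_y) if y == minority_label]
    new_x = smote_oversample(minority)
    return train_x + new_x, train_y + [minority_label] * len(new_x)


# Hypothetical two-feature training fold: label 1 = pCR (minority), 0 = pPR.
fold_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
fold_y = [1, 1, 1, 0, 0, 0, 0]
aug_x, aug_y = oversample_training_fold(fold_x, fold_y)
print(len(aug_x), aug_y.count(1))
```

Calling the augmentation inside the cross-validation loop, after the split, is the design choice that matters: oversampling before the split would let interpolated copies of a held-out case leak into training.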

Combining radiomics with radiologist assessment for pCR prediction (model A + model B + Radiologist)

We constructed a late or decision fusion strategy (32) using majority voting of the classification labels produced by model A, model B, and radiologist assessment (model A + B + R1; model A + B + R2), whereby the label with two out of three votes was selected as the prediction for each test sample. No additional training was necessary for late fusion and models A and B’s binary classifications (probability of classification > 0.5 used typically in machine learning) were used. The late fusion strategy was used to assess whether combining radiomics with radiologist assessment would increase the reliability of classification. Another advantage is that, once constructed, the MR radiomic classifiers can be combined directly with radiologist interpretation.
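The two-out-of-three vote described above can be sketched in a few lines. The function and argument names are hypothetical; the 0.5 probability threshold follows the text, and the radiologist's rCR call is treated as a vote for complete response.

```python
def late_fusion(prob_a, prob_b, radiologist_says_complete, threshold=0.5):
    """Majority vote over model A, model B, and the radiologist's reading."""
    votes = [
        prob_a > threshold,               # model A votes for complete response
        prob_b > threshold,               # model B votes for complete response
        bool(radiologist_says_complete),  # radiologist classified the case as rCR
    ]
    return "complete response" if sum(votes) >= 2 else "partial response"


# Hypothetical case: both models disagree with a radiologist's rCR reading.
print(late_fusion(prob_a=0.31, prob_b=0.42, radiologist_says_complete=True))
```

Because the vote operates on labels only, no retraining is required when a different reader's assessment is substituted.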

Reproducibility of radiomic features to segmentation differences

Reproducibility of radiomic features to tumor segmentation differences was evaluated in a subset of scans from 38 patients which were segmented by two radiologists. The radiologists segmented the tumors independently of each other.
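Agreement between feature values from the two segmentations was quantified in this study with Lin's concordance correlation coefficient (CCC), with CCC ≥ 0.75 taken as reproducible. The sketch below computes the CCC for one feature using population (1/n) moments; the study used the epiR package in R, so small numerical differences from this illustration are possible, and the function names and feature values are hypothetical.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and systematic shift between readers.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def is_reproducible(x, y, cutoff=0.75):
    """Apply the CCC >= 0.75 reproducibility criterion to one feature."""
    return lins_ccc(x, y) >= cutoff


# Hypothetical values of one feature from two readers' segmentations.
reader_1 = [12.1, 15.4, 9.8, 20.3, 17.6]
reader_2 = [12.4, 15.0, 10.1, 19.8, 18.0]
print(round(lins_ccc(reader_1, reader_2), 3), is_reproducible(reader_1, reader_2))
```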

Reference standard

Pathological reports of specimens from total mesorectal excision served as the reference standard for response to neoadjuvant treatment. Pathologists with 5–20 years of experience evaluated the specimens as part of the clinical routine.

Statistical analysis

The goal of statistical analysis was to determine (i) whether model B (with a higher number of features) has different discriminative ability compared with model A, and (ii) whether combining radiomics with radiologist assessment (model A + B + R1; model A + B + R2) using majority voting improves the reliability of classification. Secondary analysis consisted of assessing the added value of MRI harmonization (i.e., with vs without harmonization) and the added value of shape and standard radiomic features for improving predictive performance (i.e., the integrated radiomic classifier, model B).

Continuous variables were summarized as medians (interquartile ranges [IQRs]) and means (ranges). Classifier performances were measured using the area under the receiver operating characteristic curve (AUC) for probabilistic scores. Additionally, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were computed for binary classifications using the pathological reports as the reference standard. Comparison of the classifiers’ AUCs on the external testing set was done using DeLong’s test. Inter-reader agreement between the two readers and inter-model reliability were assessed using the unweighted Cohen’s Kappa statistic (the psych and iRR packages available in R version 4.0.2). The reproducibility of radiomic features with respect to radiologists’ segmentation was computed using Lin’s concordance correlation coefficient (CCC) (33) available in the epiR package in R version 4.0.2. Features with a CCC ≥ 0.75, a threshold used previously (34, 35) for analyzing the reproducibility of radiomic measures, were considered reproducible.
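As a worked illustration of the binary performance measures and the unweighted Cohen's kappa named above, the sketch below uses the counts reported for Reader 1 in Table 2 (3/8 pCR cases detected, 41/42 pPR cases correctly called). The function names are hypothetical, the paired label lists in the kappa example are invented for illustration, and the study itself used R packages rather than this code.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def cohens_kappa(labels_1, labels_2):
    """Unweighted Cohen's kappa for two raters' paired categorical labels."""
    n = len(labels_1)
    categories = set(labels_1) | set(labels_2)
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(
        (labels_1.count(c) / n) * (labels_2.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Reader 1 on the external set: TP = 3, FN = 5, TN = 41, FP = 1.
reader_1_metrics = diagnostic_metrics(tp=3, fn=5, tn=41, fp=1)
print({name: f"{value:.1%}" for name, value in reader_1_metrics.items()})

# Hypothetical paired readings (1 = rCR, 0 = rPR) from two readers.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```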

RESULTS

Patient characteristics

The dataset at MSK consisted of 114 patients, 67 (58.8%) men and 47 (41.2%) women, with a median age of 55 years (interquartile range, IQR: 48–67). The median intervals between the end of neoadjuvant therapy and restaging MRI and between MRI and surgery at MSK were 37 (IQR: 30–67) and 29 (IQR: 20–46) days, respectively. The dataset at ICESP consisted of 50 patients, 24 (48.0%) men and 26 (52.0%) women, with a median age of 64.5 years (IQR: 52–73). The median intervals between the end of neoadjuvant therapy and restaging MRI and between MRI and surgery at ICESP were 60 (IQR: 56–66) and 36 (IQR: 29–46) days, respectively.

The most common tumor location was the middle rectum: 48.2% (55/114) at MSK and 68% (34/50) at ICESP. Regarding the pathological staging, most of the tumors were ypT3 (51/114, 44.7% at MSK and 26/50, 52% at ICESP) and ypN0 (75/114, 65.8% at MSK and 30/50, 60% at ICESP). Pathological complete response was demonstrated in 21/114 (18%) patients at MSK and 8/50 (16%) at ICESP.

Patient characteristics are summarized in Table 1. Overall, the patients from ICESP were older (p=0.013) and the intervals between the end of neoadjuvant therapy and restaging MRI and from restaging MRI and surgery were longer (p<0.001 and p=0.004, respectively).

Table 1.

Patient characteristics.

| Characteristics | MSK/internal dataset, n = 114, n (%) | ICESP/external dataset, n = 50, n (%) | p-value |
| --- | --- | --- | --- |
| Gender | | | 0.2 |
| Male | 67 (58.8%) | 24 (48.0%) | |
| Female | 47 (41.2%) | 26 (52.0%) | |
| Age in years (median, IQR) | 55 (48, 67) | 64 (52, 73) | 0.013 |
| Interval between the end of neoadjuvant treatment and restaging MRI, days (median, IQR) | 37 (30, 67) | 60 (56, 66) | <0.001 |
| Interval between restaging MRI and surgery, days (median, IQR) | 29 (20, 46) | 36 (29, 46) | 0.004 |
| Tumor location | | | <0.001 |
| Upper | 25 (21.9%) | 0 (0.0%) | |
| Lower | 34 (29.8%) | 16 (32.0%) | |
| Middle | 55 (48.2%) | 34 (68.0%) | |
| Pathological T-stage | | | 0.7 |
| 0 | 23 (20.2%) | 8 (16.0%) | |
| 1 | 5 (4.4%) | 4 (8.0%) | |
| 2 | 32 (28.1%) | 11 (22.0%) | |
| 3 | 51 (44.7%) | 26 (52.0%) | |
| 4 | 3 (2.6%) | 1 (2.0%) | |
| Pathological N-stage | | | 0.2 |
| 0 | 75 (65.8%) | 30 (60.0%) | |
| 1 | 27 (23.7%) | 18 (36.0%) | |
| 2 | 12 (10.5%) | 2 (4.0%) | |
| Pathological Response | | | 0.8 |
| pCR | 21 (18.0%) | 8 (16.0%) | |
| pPR | 93 (82.0%) | 42 (84.0%) | |

Abbreviations: ICESP, Instituto do Câncer do Estado de São Paulo; MSK, Memorial Sloan Kettering Cancer Center; pCR, pathological complete response; pPR, pathological partial response.

Qualitative assessment

Table 2 demonstrates the predictive performance of Readers 1 and 2, respectively. Both readers had low sensitivity and high specificity in predicting a pCR diagnosis. The inter-reader agreement was fair (κ = 0.25, 95% CI: −0.11 to 0.61).

Table 2.

Predictive performance of qualitative assessment of each reader (Reader 1, senior; Reader 2, junior), and when qualitative assessment is combined with the integrated radiomics classifier (n = 91). The results are from the external validation set.

| Classifier | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
| --- | --- | --- | --- | --- |
| Reader 1 | 38% (9%–76%), 3/8 | 98% (87%–100%), 41/42 | 75% (19%–99%), 3/4 | 89% (76%–96%), 41/46 |
| Reader 2 | 50% (16%–84%), 4/8 | 90% (77%–97%), 38/42 | 50% (16%–84%), 4/8 | 90% (77%–97%), 38/42 |
| Model A + B + R1 | 38% (9%–76%), 3/8 | 95% (84%–99%), 40/42 | 60% (15%–95%), 3/5 | 89% (76%–96%), 40/45 |
| Model A + B + R2 | 50% (16%–84%), 4/8 | 93% (81%–99%), 39/42 | 57% (18%–90%), 4/7 | 91% (78%–97%), 39/43 |

Abbreviations: NPV, negative predictive value; PPV, positive predictive value; pCR, pathological complete response; pPR, pathological partial response.

Quantitative assessment

Classification performance on external institution dataset

Table 3 summarizes the predictive performance of models A and B on external validation. AUC was not significantly different between models A and B (P = 0.3). In order to ensure robust generalization, the testing results shown in Table 3 are for a held-out external dataset that was not used for model training or selection. The testing dataset was not made available until after the construction of the random forest classifier using the institutional dataset.

Table 3.

Predictive performance of the various random forest radiomics classifiers (model A and model B) on the external validation set.

| Feature set | AUROC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) |
| --- | --- | --- | --- | --- | --- |
| Model A (n = 33) | 79% (62%–96%) | 50% (16%–84%), 4/8 | 93% (81%–99%), 39/42 | 57% (18%–90%), 4/7 | 91% (78%–97%), 39/43 |
| Model B (n = 91) | 83% (70%–97%) | 38% (9%–76%), 3/8 | 90% (77%–97%), 38/42 | 43% (10%–82%), 3/7 | 88% (75%–96%), 38/43 |

Abbreviations: AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; pCR, pathological complete response; pPR, pathological partial response

Relevant radiomic features in model A and model B

Fourteen texture and 43 radiomic features had variable importance > 25 for models A and B, respectively (see Figure 2 for all the features with variable importance > 25 for the two models). In model A, 3 features, i.e., energy, homogeneity of Gabor (135°, 1.414), and homogeneity of Gabor (180°, 1.414), had variable importance > 75 and were significantly different between pCR and pPR (Table S2). The maximum importance possible for any feature is 100 and a cut-off of 75 represents features with high variable importance or weights while those below 25 have very low relevance for classification. In model B, 10 features had variable importance > 75, and out of those ten features, 3/10 features, i.e., low gray level zone emphasis (LZLGLE), homogeneity of Gabor (135°, 1.414), and homogeneity of Gabor (180°, 1.414), were significantly different between pPR and pCR after adjustment for multiple comparisons (see Table S3). Note that for homogeneity of Gabor, 1.414 refers to the kernel bandwidth and the angle corresponds to the directional bandwidth for computing the Gabor features as used in previous publications (10, 36–38). Homogeneity of Gabor (135°, 1.414) and homogeneity of Gabor (180°, 1.414) were found to be relevant in both models A and B in differentiating patients with pCR from patients with pPR (P = 0.004 and 0.015, respectively, for both models); specifically, homogeneity of Gabor values were higher in patients with pCR (homogeneity of Gabor (135°, 1.414): median = 27.9, IQR = 21.2–43.9; homogeneity of Gabor (180°, 1.414): median = 25.7, IQR = 16.2 to 34.5) than patients with pPR (homogeneity of Gabor (135°, 1.414): median = 14.9, IQR=9.3–23.1; homogeneity of Gabor (180°, 1.414): median = 15.7, IQR = 10.7 to 19.9). Figure 3 shows the homogeneity of Gabor (135°, 1.414) feature for representative patients with pCR and pPR.

Fig. 2.

Most relevant features, with variable importance > 25, for (a) model A and (b) model B

Fig. 3.

(a, b) A 76-year-old man underwent restaging rectal MRI 56 days after the end of chemoradiotherapy. (a) Axial oblique T2WI: reader 1 classified this as showing radiological complete response, while reader 2 classified this as showing radiological partial response. (b) Homogeneity of Gabor (135°) overlaid on high-resolution T2WI maps shows lower values of this feature (blue colors), which in our study suggested pathological partial response. The patient underwent laparoscopic low anterior resection 30 days after MRI scan and the pathology demonstrated residual tumor (i.e., pathological partial response). (c, d) A 52-year-old woman underwent restaging rectal MRI 60 days after the end of chemoradiotherapy. (c) Axial oblique T2WI: reader 1 classified this as showing radiological complete response, while reader 2 classified this as showing radiological partial response. (d) Homogeneity of Gabor (135°) overlaid on high-resolution T2WI maps demonstrates higher values of this feature (red colors), which in our study suggested pathological complete response. The patient underwent an open low anterior resection 23 days after MRI scan and pathology demonstrated no residual tumor (i.e., pathological complete response)

Combining radiomics with radiologist assessment

As shown in Table 2, combining radiomic models (models A and B) with radiologist interpretation reduced the inter-reader variability with respect to specificity, as well as the PPV and NPV. The combination also yielded higher inter-model reliability with κ = 0.82 (95% CI: 0.70–0.89) and improved concordant cases to 96%. Additionally, this increased the junior radiologist’s diagnostic performance with respect to specificity, as well as the positive and negative predictive values.

Influence of MRI harmonization on feature robustness

The number of features that were robust to scanner differences (n = 79) remained the same both before and after MRI harmonization to a randomly chosen target MRI scan (Target I). Non-robust features included the MRI mean signal intensity; the mean values of all the Gabor edge features; the Haralick textures contrast and entropy; the GLRLM features long run high gray level emphasis (LRHGLE), complexity, valley; and skewness and kurtosis of Sobel edge features. Median values of the various features before and after MRI harmonization are shown in Tables S4 and S5. Histogram standardization aims to shift the overall MRI signal intensity distribution toward a target intensity distribution. After histogram standardization, the median value increased slightly (median of −1.73E−18) for 46 features, including Haralick textures (energy and contrast) and Gabor edge features, with energy of Gabor (135°, 1.414) having the greatest difference of 2.11. After histogram standardization, 12 features including homogeneity, GLN, HGLRE, SRHGLE, LRHGLE, LZHGLE, complexity, HGCE, DCV, mean Sobel edge, entropy of Gabor (90°, 1.414), and energy of Gabor (135°, 1.414) had median values differing by > 1.0 between the 1.5T magnet and 3T magnet.

Using a second target scan (Target II) for MRI harmonization resulted in a slight increase in the number of robust features (n = 82). The CCC between the harmonized features computed with the two different target MRI scans was high, with 84/91 features having a CCC ≥ 0.75, indicating that there was high reproducibility of radiomic features regardless of scanner differences. Unreliable features included the mean MR signal intensity (CCC = 0.041 [0.031 to 0.052]), standard deviation of MR signal intensity (CCC = 0.071 [0.054 to 0.089]), peak (CCC = 0.046 [0.034 to 0.058]), valley (CCC = 0.034 [0.025 to 0.043]), compactness (CCC = 0.537 [0.462 to 0.603]), mean of Sobel edge (CCC = 0.058 [0.044 to 0.073]), and standard deviation of Sobel edge features (CCC = 0.062 [0.0465 to 0.0767]).

Standardization to Target II resulted in similar non-robust features with respect to magnet strength as those found with standardization to Target I and consisted of MR mean signal intensity; mean values of all Gabor features; the Haralick textures contrast and entropy; valley; and skewness and kurtosis of Sobel edges. Supplemental Tables S5 and S6 show the CCC values of radiomic features computed with respect to features extracted following standardization with Target I and Target II, as well as the differences in radiomic features extracted using Target II with respect to magnet strength.

Reproducibility of radiomics features

Seventy-six out of 91 radiomic features (83.5%) were reproducible. CCC values of all the evaluated radiomic features with lower and upper bounds are shown in Supplemental Table S7. One GLCM feature (energy CCC = 0.503 [0.228 to 0.704]), three NGTDM features (coarseness CCC = 0.494 [0.281 to 0.630], contrast CCC = 0.724 [0.536 to 0.844], and strength CCC = 0.579 [0.391 to 0.721]), one NGLDM feature (LDLGE CCC = 0.547 [0.290 to 0.730]), skewness and kurtosis of Sobel edge features, as well as energy, contrast, and entropy of Gabor features had low reproducibility. All first order features, run length-based GLRLM, size-zone matrix based GLSZM, and shape features were reproducible. In both models A and B, all features with a high variable importance exceeding 75 were reproducible. Figure S1 shows the CCC of all the 91 radiomics features.

DISCUSSION

In this study, combining MRI-based radiomic classification with radiologist qualitative assessment improved the reliability of the prediction of which patients with LARC will be diagnosed with pCR after neoadjuvant treatment. The radiomic texture classifiers, which were developed based on post-treatment rectal MRI, showed good predictive performance on the single-institution external validation set. Importantly, we found that these radiomic classifiers also produced similar accuracy (sensitivity and specificity) to radiologists in the prediction of pCR vs. pPR. Furthermore, the combination of radiologist assessment and radiomic models increased the reliability of the classifier, with concordance observed in 96% of the cases. Finally, in a subset of cases that were independently segmented by two radiologists, most features (76/91, 83.5%) showed high reproducibility.

Previous studies comparing radiomics with subjective assessment have been performed (10, 39), with radiomics frequently showing better predictive performance than subjective assessment alone. Our study emphasizes the synergy between radiologists (providing qualitative evaluation of imaging) and radiomics (providing quantitative data of imaging) as complementary tools in predicting pCR instead of competing with one another, since we believe that each presents a unique value in imaging assessment. Additionally, although qualitative assessment of MRI post-neoadjuvant treatment is challenging, it improves with moderate experience, which will further improve pCR prediction (40). Indeed, our results showed that the senior reader’s specificity and PPV decreased when the radiomics model was incorporated, suggesting that, with experience, radiologist performance increases; thus, radiomics may be more useful for readers with less experience whereas it is less useful in aiding experienced readers. It is known that the use of rectal MRI is increasing in both primary and restaging settings, and with the increased acceptance of non-surgical management, the importance of an accurate clinical complete response (cCR) diagnosis will become even greater. However, not all radiologists are exposed to a high volume of rectal MRIs. We believe that our results show a potential clinical impact particularly among radiologists with less experience in rectal MRI, giving them a similar level of experience as radiologists from high-volume centers. Consequently, by providing a more reliable MRI interpretation, the multidisciplinary team involved in the management of patients with rectal cancer will be more confident regarding patient selection for non-surgical management (watch-and-wait approach), increasing sphincter preservation, avoiding, for example, a permanent stoma and sexual dysfunction, and ultimately improving patient quality of life.

In our study, the MRI radiomic classifier using a larger number of texture features produced similar accuracy to the MRI radiomic classifier using only Haralick texture and Gabor edge features, indicating that Haralick textures and Gabor features are potentially sufficient for discriminating between pPR and pCR. Using a smaller number of features will also eliminate potential issues with overfitting when using only cross-validation training without an independent testing set. Importantly, we found that homogeneity of Gabor edge features were associated with pCR in both the texture-based model (n = 33 features) as well as the model trained with a larger number of radiomic features (n = 91 features), indicating that these features are reliable measures of tumor heterogeneity. In the literature, entropy and kurtosis on T2WI restaging MRI have been most reported to be significant texture features to predict pCR; however, there are conflicting results. Homogeneity of Gabor 135° has not yet been described as a relevant feature. In our study, homogeneity of Gabor (135°, 1.414) was one of the most relevant features to predict pCR; patients with pCR had significantly higher values of this feature than patients with pPR (27.9 vs 14.9, P = 0.004, for both models A and B). The biological indication of this result is that patients with pCR have lower intratumoral heterogeneity compared to those with pPR. Our results are in line with previous studies which demonstrated worse outcomes in patients with higher intratumoral heterogeneity, including poorer treatment response (41, 42).

Few studies have performed external validation of radiomics models to predict pCR. In the study by Bulens et al (19), two radiomic models showed an AUC of 83% and 86%, respectively, in the validation set; however, these models did not outperform a previous four-feature semantic model that included percentage change in tumor volume, sphere diameter post-neoadjuvant treatment, average ADC value post-neoadjuvant treatment, and ratio of average ADC pre- and post-neoadjuvant treatment. Our MRI radiomic classifiers (models A and B) were built using features computed from post-neoadjuvant treatment MRI. Both of these models showed good performance and improved the ability of both radiologists in predicting pCR.

The criteria for rCR used in our study were straightforward and subjective, which may facilitate adoption of the model in daily clinical practice. Dinapoli et al (20) developed a model based on pre-treatment rectal MRI texture features, achieving lower diagnostic performance than in our study, i.e., AUCs of 0.73 (internal) and 0.75 (external). van Griethuysen et al (21) also evaluated pre-neoadjuvant treatment MRI features, which had comparable performance to radiologists. In our study, we evaluated post-treatment MRI, which shows the treatment response within the tumor bed; this could explain our higher accuracies in predicting pCR when compared with pre-treatment rectal MRI.

Our study also showed that MRI harmonization based on histogram standardization did not reduce the variability of the features between the two institutions. Histogram standardization increased the variability of some features such as the Gabor edges and GLSZM, consistent with a prior study evaluating the impact of pre-processing using MRI histogram standardization on radiomic features, albeit for brain gliomas (18). This is not surprising because histogram standardization only tries to align the overall MR signal intensities to a target MRI distribution, which can affect the extracted features in arbitrary ways. We also found that the target MRI image used for histogram standardization had minimal impact on the variability of extracted radiomic features, as demonstrated by the high CCC values of features extracted using different target MRI images. These results suggest that radiomic models are potentially applicable without requiring significant intensity homogenization. In addition, these results may also be because rectal MRI scans tend to be more standardized worldwide (43); we note that imaging protocols were similar between the two institutions in our study, and more multi-institutional studies are warranted to assess the applicability of the model on other datasets. Furthermore, we found that radiomic features extracted using independent segmentations produced by two different radiologists resulted in a reasonably high CCC of 83.5%, suggesting that the extracted radiomic features are reliable and have high potential for future multi-institutional studies.
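As a reminder of what the CCC measures, the following is a minimal sketch of Lin's concordance correlation coefficient (33) for one feature measured twice, for example from two radiologists' segmentations or with two different target images for standardization. The function name and the toy feature values are hypothetical, and the sketch is not the authors' R implementation.

```python
from statistics import fmean


def concordance_correlation(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (1/n) moments, as in Lin's original definition.
    Raises ZeroDivisionError if both inputs are constant with equal means.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic offsets between raters.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical feature values from two segmentations of the same four tumors.
reader_1 = [1.0, 2.0, 3.0, 4.0]
reader_2_identical = [1.0, 2.0, 3.0, 4.0]
reader_2_shifted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but offset by 1

print(concordance_correlation(reader_1, reader_2_identical))
print(concordance_correlation(reader_1, reader_2_shifted))
```

Unlike the Pearson correlation, which would equal 1 for both toy pairs, the CCC falls below 1 for the shifted pair because it penalizes departures from the identity line, which is why it is suited to reproducibility assessment.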

A strength of this study is that we were able to build an externally validated radiomic model with a few texture features extracted from post-treatment MRI that improved radiologist qualitative assessment. Using only restaging post-treatment MRI features is especially helpful for patients without good quality MRI or available baseline MRI. Moreover, it decreases the operational difficulty of texture feature extraction, since features need to be extracted from only one scan timepoint.

Our study has several limitations. First, it was retrospective. Second, the internal and external samples were both relatively small; however, they were adequate for us to perform internal and external validation, with a comparable number of outcomes (pCR) and similar imaging characteristics between both samples. Third, segmentations were performed manually, which was time-consuming and less user-friendly. Developing automatic segmentation tools is necessary to make radiomic analyses widely available in the future and is one of our future goals. Fourth, DWI quantitative data were not included in the radiomic model, considering that the sequence is liable to artifacts that can impact the generalizability of the results. Fifth, it is possible to take advantage of developments in deep learning to utilize a larger number of features, extracting low-level edges and mid-level textural characteristics, to potentially improve the performance of radiomics classification, but this was not studied as it would further increase the number of features relative to the number of examples. Sixth, lymph node response to treatment was beyond the scope of this paper. Assessment of lymph node response after neoadjuvant treatment is important in the management of patients with LARC and demands further interest. Lastly, although encouraging, our results on the external validation set showed a lower PPV and sensitivity than on the internal validation set. Further studies with a larger number of patients from multiple institutions, and studies of a prospective nature, are needed to overcome several of the aforementioned limitations and to improve the generalizability of our results.

CONCLUSIONS

We developed and externally validated a combined model using radiomics and radiologist qualitative assessment; the combined model improved inter-reader agreement and slightly increased the specificity, PPV and NPV of the junior radiologist in predicting pCR after neoadjuvant treatment in patients with LARC. Further, the features included in the radiomic model had high inter-reader agreement. However, although the results are promising, further multicenter and prospective studies are needed to improve the generalizability of our results.

Supplementary Material

Supplementary material

Acknowledgments:

The authors would like to express their deepest gratitude to Joanne Chin, MFA, ELS, for her editorial support on this manuscript and to Natalie Gangai, MPH, Ye Choi, BS, and Lee Rodriguez, MPH, for their data support.

Funding:

This study was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748.

Footnotes

Competing interests: The authors have no competing interests to declare that are relevant to the content of this article.

Ethics approval: This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Institutional Review Board at Memorial Sloan Kettering Cancer Center, United States, and at the University of São Paulo, Brazil.

Consent: Written informed consent was waived by the Institutional Review Board at Memorial Sloan Kettering Cancer Center, United States, and at the University of São Paulo, Brazil.

Data/code availability:

The datasets used and analyzed in this study are not publicly available due to patient privacy requirements but are available upon reasonable request from the corresponding author. All R code used in the analysis will be made available through the author’s GitHub repository.

REFERENCES

  • 1. Habr-Gama A, Perez RO, Nadalin W, Sabbaga J, Ribeiro U Jr., Silva e Sousa AH Jr., Campos FG, Kiss DR, Gama-Rodrigues J. Operative versus nonoperative treatment for stage 0 distal rectal cancer following chemoradiation therapy: long-term results. Ann Surg 2004;240(4):711–717; discussion 717–718. doi: 10.1097/01.sla.0000141194.27992.32
  • 2. Patel UB, Brown G, Rutten H, West N, Sebag-Montefiore D, Glynne-Jones R, Rullier E, Peeters M, Van Cutsem E, Ricci S, Van de Velde C, Kjell P, Quirke P. Comparison of magnetic resonance imaging and histopathological response to chemoradiotherapy in locally advanced rectal cancer. Ann Surg Oncol 2012;19(9):2842–2852. doi: 10.1245/s10434-012-2309-3
  • 3. Sclafani F, Brown G, Cunningham D, Wotherspoon A, Mendes LST, Balyasnikova S, Evans J, Peckitt C, Begum R, Tait D, Tabernero J, Glimelius B, Rosello S, Thomas J, Oates J, Chau I. Comparison between MRI and pathology in the assessment of tumour regression grade in rectal cancer. Br J Cancer 2017;117(10):1478–1485. doi: 10.1038/bjc.2017.320
  • 4. Siddiqui MR, Gormly KL, Bhoday J, Balyansikova S, Battersby NJ, Chand M, Rao S, Tekkis P, Abulafi AM, Brown G. Interobserver agreement of radiologists assessing the response of rectal cancers to preoperative chemoradiation using the MRI tumour regression grading (mrTRG). Clin Radiol 2016;71(9):854–862. doi: 10.1016/j.crad.2016.05.005
  • 5. Nahas SC, Rizkallah Nahas CS, Sparapan Marques CF, Ribeiro U Jr., Cotti GC, Imperiale AR, Capareli FC, Chih Chen AT, Hoff PM, Cecconello I. Pathologic Complete Response in Rectal Cancer: Can We Detect It? Lessons Learned From a Proposed Randomized Trial of Watch-and-Wait Treatment of Rectal Cancer. Dis Colon Rectum 2016;59(4):255–263. doi: 10.1097/DCR.0000000000000558
  • 6. De Cecco CN, Ganeshan B, Ciolina M, Rengo M, Meinel FG, Musio D, De Felice F, Raffetto N, Tombolini V, Laghi A. Texture analysis as imaging biomarker of tumoral response to neoadjuvant chemoradiotherapy in rectal cancer patients studied with 3-T magnetic resonance. Invest Radiol 2015;50(4):239–245. doi: 10.1097/RLI.0000000000000116
  • 7. De Cecco CN, Ciolina M, Caruso D, Rengo M, Ganeshan B, Meinel FG, Musio D, De Felice F, Tombolini V, Laghi A. Performance of diffusion-weighted imaging, perfusion imaging, and texture analysis in predicting tumoral response to neoadjuvant chemoradiotherapy in rectal cancer patients studied with 3T MR: initial experience. Abdom Radiol (NY) 2016;41(9):1728–1735. doi: 10.1007/s00261-016-0733-8
  • 8. Meng Y, Zhang C, Zou S, Zhao X, Xu K, Zhang H, Zhou C. MRI texture analysis in predicting treatment response to neoadjuvant chemoradiotherapy in rectal cancer. Oncotarget 2018;9(15):11999–12008. doi: 10.18632/oncotarget.23813
  • 9. Aker M, Ganeshan B, Afaq A, Wan S, Groves AM, Arulampalam T. Magnetic Resonance Texture Analysis in Identifying Complete Pathological Response to Neoadjuvant Treatment in Locally Advanced Rectal Cancer. Dis Colon Rectum 2019;62(2):163–170. doi: 10.1097/DCR.0000000000001224
  • 10. Horvat N, Veeraraghavan H, Khan M, Blazic I, Zheng J, Capanu M, Sala E, Garcia-Aguilar J, Gollub MJ, Petkovska I. MR Imaging of Rectal Cancer: Radiomics Analysis to Assess Treatment Response after Neoadjuvant Therapy. Radiology 2018;287(3):833–843. doi: 10.1148/radiol.2018172300
  • 11. Cusumano D, Dinapoli N, Boldrini L, Chiloiro G, Gatta R, Masciocchi C, Lenkowicz J, Casa C, Damiani A, Azario L, Van Soest J, Dekker A, Lambin P, De Spirito M, Valentini V. Fractal-based radiomic approach to predict complete pathological response after chemo-radiotherapy in rectal cancer. Radiol Med 2018;123(4):286–295. doi: 10.1007/s11547-017-0838-3
  • 12. Cui Y, Yang X, Shi Z, Yang Z, Du X, Zhao Z, Cheng X. Radiomics analysis of multiparametric MRI for prediction of pathological complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer. Eur Radiol 2019;29(3):1211–1220. doi: 10.1007/s00330-018-5683-9
  • 13. Bibault JE, Giraud P, Housset M, Durdux C, Taieb J, Berger A, Coriat R, Chaussade S, Dousset B, Nordlinger B, Burgun A. Deep Learning and Radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep 2018;8(1):12611. doi: 10.1038/s41598-018-30657-6
  • 14. Liu Z, Zhang XY, Shi YJ, Wang L, Zhu HT, Tang Z, Wang S, Li XT, Tian J, Sun YS. Radiomics Analysis for Evaluation of Pathological Complete Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer. Clin Cancer Res 2017;23(23):7253–7262. doi: 10.1158/1078-0432.CCR-17-1038
  • 15. Nie K, Shi L, Chen Q, Hu X, Jabbour SK, Yue N, Niu T, Sun X. Rectal Cancer: Assessment of Neoadjuvant Chemoradiation Outcome based on Radiomics of Multiparametric MRI. Clin Cancer Res 2016;22(21):5256–5264. doi: 10.1158/1078-0432.CCR-15-2997
  • 16. Petkovska I, Tixier F, Ortiz EJ, Golia Pernicka JS, Paroder V, Bates DD, Horvat N, Fuqua J, Schilsky J, Gollub MJ, Garcia-Aguilar J, Veeraraghavan H. Clinical utility of radiomics at baseline rectal MRI to predict complete response of rectal cancer after chemoradiation therapy. Abdom Radiol (NY) 2020;45(11):3608–3617. doi: 10.1007/s00261-020-02502-w
  • 17. Di Re AM, Sun Y, Sundaresan P, Hau E, Toh JWT, Gee H, Or M, Haworth A. MRI radiomics in the prediction of therapeutic response to neoadjuvant therapy for locoregionally advanced rectal cancer: a systematic review. Expert Rev Anticancer Ther 2021:1–25. doi: 10.1080/14737140.2021.1860762
  • 18. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H. Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys Med Biol 2019;64(16):165011. doi: 10.1088/1361-6560/ab2f44
  • 19. Bulens P, Couwenberg A, Intven M, Debucquoy A, Vandecaveye V, Van Cutsem E, D’Hoore A, Wolthuis A, Mukherjee P, Gevaert O, Haustermans K. Predicting the tumor response to chemoradiotherapy for rectal cancer: Model development and external validation using MRI radiomics. Radiother Oncol 2020;142:246–252. doi: 10.1016/j.radonc.2019.07.033
  • 20. Dinapoli N, Barbaro B, Gatta R, Chiloiro G, Casa C, Masciocchi C, Damiani A, Boldrini L, Gambacorta MA, Dezio M, Mattiucci GC, Balducci M, van Soest J, Dekker A, Lambin P, Fiorino C, Sini C, De Cobelli F, Di Muzio N, Gumina C, Passoni P, Manfredi R, Valentini V. Magnetic Resonance, Vendor-independent, Intensity Histogram Analysis Predicting Pathologic Complete Response After Radiochemotherapy of Rectal Cancer. Int J Radiat Oncol Biol Phys 2018;102(4):765–774. doi: 10.1016/j.ijrobp.2018.04.065
  • 21. van Griethuysen JJM, Lambregts DMJ, Trebeschi S, Lahaye MJ, Bakers FCH, Vliegen RFA, Beets GL, Aerts H, Beets-Tan RGH. Radiomics performs comparable to morphologic assessment by expert radiologists for prediction of response to neoadjuvant chemoradiotherapy on baseline staging MRI in rectal cancer. Abdom Radiol (NY) 2020;45(3):632–643. doi: 10.1007/s00261-019-02321-8
  • 22. Nyul LG, Udupa JK, Zhang X. New variants of a method of MRI scale standardization. IEEE Trans Med Imaging 2000;19(2):143–150. doi: 10.1109/42.836373
  • 23. Robitaille N, Mouiha A, Crepeault B, Valdivia F, Duchesne S, The Alzheimer’s Disease Neuroimaging I. Tissue-based MRI intensity standardization: application to multicentric datasets. Int J Biomed Imaging 2012;2012:347120. doi: 10.1155/2012/347120
  • 24. Yoo TS, Ackerman MJ, Lorensen WE, Schroeder W, Chalana V, Aylward S, Metaxas D, Whitaker R. Engineering and algorithm design for an image processing API: a technical report on ITK--the Insight Toolkit. Stud Health Technol Inform 2002;85:586–592.
  • 25. Zwanenburg A, Vallieres M, Abdalah MA, Aerts H, Andrearczyk V, Apte A, Ashrafinia S, Bakas S, Beukinga RJ, Boellaard R, Bogowicz M, Boldrini L, Buvat I, Cook GJR, Davatzikos C, Depeursinge A, Desseroit MC, Dinapoli N, Dinh CV, Echegaray S, El Naqa I, Fedorov AY, Gatta R, Gillies RJ, Goh V, Gotz M, Guckenberger M, Ha SM, Hatt M, Isensee F, Lambin P, Leger S, Leijenaar RTH, Lenkowicz J, Lippert F, Losnegard A, Maier-Hein KH, Morin O, Muller H, Napel S, Nioche C, Orlhac F, Pati S, Pfaehler EAG, Rahmim A, Rao AUK, Scherer J, Siddique MM, Sijtsema NM, Socarras Fernandez J, Spezi E, Steenbakkers R, Tanadini-Lang S, Thorwarth D, Troost EGC, Upadhaya T, Valentini V, van Dijk LV, van Griethuysen J, van Velden FHP, Whybra P, Richter C, Lock S. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 2020;295(2):328–338. doi: 10.1148/radiol.2020191145
  • 26. Breiman L. Random forests. Mach Learn 2001;45(1):5–32. doi: 10.1023/A:1010933404324
  • 27. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–357. doi: 10.1613/jair.953
  • 28. Gitto S, Cuocolo R, Annovazzi A, Anelli V, Acquasanta M, Cincotta A, Albano D, Chianca V, Ferraresi V, Messina C, Zoccali C, Armiraglio E, Parafioriti A, Sciuto R, Luzzati A, Biagini R, Imbriaco M, Sconfienza LM. CT radiomics-based machine learning classification of atypical cartilaginous tumours and appendicular chondrosarcomas. EBioMedicine 2021;68:103407. doi: 10.1016/j.ebiom.2021.103407
  • 29. Hu J, Zhao Y, Li M, Liu J, Wang F, Weng Q, Wang X, Cao D. Machine learning-based radiomics analysis in predicting the meningioma grade using multiparametric MRI. Eur J Radiol 2020;131:109251. doi: 10.1016/j.ejrad.2020.109251
  • 30. Min X, Li M, Dong D, Feng Z, Zhang P, Ke Z, You H, Han F, Ma H, Tian J, Wang L. Multi-parametric MRI-based radiomics signature for discriminating between clinically significant and insignificant prostate cancer: Cross-validation of a machine learning method. Eur J Radiol 2019;115:16–21. doi: 10.1016/j.ejrad.2019.03.010
  • 31. Fehr D, Veeraraghavan H, Wibmer A, Gondo T, Matsumoto K, Vargas HA, Sala E, Hricak H, Deasy JO. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images. Proc Natl Acad Sci U S A 2015;112(46):E6265–6273. doi: 10.1073/pnas.1505935112
  • 32. Kittler J, Hatef M, Duin RPW, Matas J. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998;20(3):226–239. doi: 10.1109/34.667881
  • 33. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989;45(1):255–268.
  • 34. Kalpathy-Cramer J, Mamomov A, Zhao B, Lu L, Cherezov D, Napel S, Echegaray S, Rubin D, McNitt-Gray M, Lo P, Sieren JC, Uthoff J, Dilger SK, Driscoll B, Yeung I, Hadjiiski L, Cha K, Balagurunathan Y, Gillies R, Goldgof D. Radiomics of Lung Nodules: A Multi-Institutional Study of Robustness and Agreement of Quantitative Imaging Features. Tomography 2016;2(4):430–437. doi: 10.18383/j.tom.2016.00235
  • 35. Zhao B, Tan Y, Tsai WY, Qi J, Xie C, Lu L, Schwartz LH. Reproducibility of radiomics for deciphering tumor phenotype with imaging. Sci Rep 2016;6:23428. doi: 10.1038/srep23428
  • 36. Lakhman Y, Veeraraghavan H, Chaim J, Feier D, Goldman DA, Moskowitz CS, Nougaret S, Sosa RE, Vargas HA, Soslow RA, Abu-Rustum NR, Hricak H, Sala E. Differentiation of Uterine Leiomyosarcoma from Atypical Leiomyoma: Diagnostic Accuracy of Qualitative MR Imaging Features and Feasibility of Texture Analysis. Eur Radiol 2017;27(7):2903–2915. doi: 10.1007/s00330-016-4623-9
  • 37. Tixier F, Um H, Bermudez D, Iyer A, Apte A, Graham MS, Nevel KS, Deasy JO, Young RJ, Veeraraghavan H. Preoperative MRI-radiomics features improve prediction of survival in glioblastoma patients over MGMT methylation status alone. Oncotarget 2019;10(6):660–672. doi: 10.18632/oncotarget.26578
  • 38. Bhatia A, Birger M, Veeraraghavan H, Um H, Tixier F, McKenney AS, Cugliari M, Caviasco A, Bialczak A, Malani R, Flynn J, Zhang Z, Yang TJ, Santomasso BD, Shoushtari AN, Young RJ. MRI radiomic features are associated with survival in melanoma brain metastases treated with immune checkpoint inhibitors. Neuro Oncol 2019;21(12):1578–1586. doi: 10.1093/neuonc/noz141
  • 39. Horvat N, Bates DDB, Petkovska I. Novel imaging techniques of rectal cancer: what do radiomics and radiogenomics have to offer? A literature review. Abdom Radiol (NY) 2019;44(11):3764–3774. doi: 10.1007/s00261-019-02042-y
  • 40. Maas M, Lambregts DM, Nelemans PJ, Heijnen LA, Martens MH, Leijtens JW, Sosef M, Hulsewe KW, Hoff C, Breukink SO, Stassen L, Beets-Tan RG, Beets GL. Assessment of Clinical Complete Response After Chemoradiation for Rectal Cancer with Digital Rectal Examination, Endoscopy, and MRI: Selection for Organ-Saving Treatment. Ann Surg Oncol 2015;22(12):3873–3880. doi: 10.1245/s10434-015-4687-9
  • 41. Greenbaum A, Martin DR, Bocklage T, Lee JH, Ness SA, Rajput A. Tumor Heterogeneity as a Predictor of Response to Neoadjuvant Chemotherapy in Locally Advanced Rectal Cancer. Clin Colorectal Cancer 2019;18(2):102–109. doi: 10.1016/j.clcc.2019.02.003
  • 42. Molinari C, Marisi G, Passardi A, Matteucci L, De Maio G, Ulivi P. Heterogeneity in Colorectal Cancer: A Challenge for Personalized Medicine? Int J Mol Sci 2018;19(12). doi: 10.3390/ijms19123733
  • 43. Horvat N, Carlos Tavares Rocha C, Clemente Oliveira B, Petkovska I, Gollub MJ. MRI of Rectal Cancer: Tumor Staging, Imaging Techniques, and Management. Radiographics 2019;39(2):367–387. doi: 10.1148/rg.2019180114
