2023 Jul 14;33(12):8889–8898. doi: 10.1007/s00330-023-09920-6

Development and multicenter validation of a multiparametric imaging model to predict treatment response in rectal cancer

Niels W Schurink 1,2, Simon R van Kranen 3, Joost J M van Griethuysen 1,2, Sander Roberti 4, Petur Snaebjornsson 5, Frans C H Bakers 6, Shira H de Bie 7, Gerlof P T Bosma 8, Vincent C Cappendijk 9, Remy W F Geenen 10, Peter A Neijenhuis 11, Gerald M Peterson 12, Cornelis J Veeken 13, Roy F A Vliegen 14, Femke P Peters 3, Nino Bogveradze 1,2,15, Najim el Khababi 1,2, Max J Lahaye 1,2, Monique Maas 1,2, Geerard L Beets 2,16, Regina G H Beets-Tan 1,2,17, Doenja M J Lambregts 1,2
PMCID: PMC10667134  PMID: 37452176

Abstract

Objectives

To develop and validate a multiparametric model to predict neoadjuvant treatment response in rectal cancer at baseline using a heterogeneous multicenter MRI dataset.

Methods

Baseline staging MRIs (T2-weighted (T2W) MRI, diffusion-weighted imaging (DWI) / apparent diffusion coefficient (ADC)) of 509 patients (9 centers) treated with neoadjuvant chemoradiotherapy (CRT) were collected. Response was defined as (1) complete versus incomplete response, or (2) good (Mandard tumor regression grade (TRG) 1–2) versus poor response (TRG 3–5). Prediction models were developed using combinations of the following variable groups:

(1) Non-imaging: age/sex/tumor-location/tumor-morphology/CRT-surgery interval

(2) Basic staging: cT-stage/cN-stage/mesorectal fascia involvement, derived from (2a) original staging reports, or (2b) expert re-evaluation

(3) Advanced staging: variables from 2b combined with cTN-substaging/invasion depth/extramural vascular invasion/tumor length

(4) Quantitative imaging: tumour volume + first-order histogram features (from T2W-MRI and DWI/ADC)

Models were developed with data from 6 centers (n = 412) using logistic regression with Least Absolute Shrinkage and Selection Operator (LASSO) feature selection, internally validated using repeated (n = 100) random hold-out validation, and externally validated using data from 3 centers (n = 97).

Results

After external validation, the best model (including non-imaging and advanced staging variables) achieved an area under the curve of 0.60 (95%CI=0.48–0.72) to predict complete response and 0.65 (95%CI=0.53–0.76) to predict a good response. Quantitative variables did not improve model performance. Basic staging variables consistently achieved lower performance compared to advanced staging variables.

Conclusions

Overall model performance was moderate. Best results were obtained using advanced staging variables, highlighting the importance of good-quality staging according to current guidelines. Quantitative imaging features had no added value (in this heterogeneous dataset).

Clinical relevance statement

Predicting tumour response at baseline could aid in tailoring neoadjuvant therapies for rectal cancer. This study shows that image-based prediction models are promising but are negatively affected by variations in staging quality and MRI acquisition, underlining the need for harmonization.

Key Points

  • This multicenter study combining clinical information and features derived from MRI rendered disappointing performance to predict response to neoadjuvant treatment in rectal cancer.

  • Best results were obtained with the combination of clinical baseline information and state-of-the-art image-based staging variables, highlighting the importance of good quality staging according to current guidelines and staging templates.

  • No added value was found for quantitative imaging features in this multicenter retrospective study. This is likely related to acquisition variations, which is a major problem for feature reproducibility and thus model generalizability.

Supplementary information

The online version contains supplementary material available at 10.1007/s00330-023-09920-6.

Keywords: Rectal neoplasms, Chemoradiotherapy, Magnetic resonance imaging

Introduction

Locally advanced rectal cancer (LARC) is typically treated with neoadjuvant chemoradiotherapy (CRT) followed by surgery [1]. In up to 15–27% of the cases a complete tumor remission is achieved as a result of CRT [2]. This has contributed to the recent paradigm shift in rectal cancer treatment towards organ preservation (e.g., “watch-and-wait” or local treatment of small tumor remnants) for selected patients with clinical evidence of a very good or complete tumour response after CRT. For these organ-preservation approaches, the morbidity and mortality risks associated with major surgery are avoided, with good reported clinical outcomes regarding quality of life and overall survival [3, 4]. Predicting the response to CRT and thus the chance of achieving organ preservation before the start of treatment, i.e., at baseline, may open up new possibilities to further personalize neoadjuvant treatment strategies depending on the anticipated treatment benefit, particularly for smaller tumors that do not necessarily require CRT for oncological reasons.

Recent studies have suggested a possible role for imaging in this setting [5–9]. Promising results have been reported for clinical staging variables (MRI-based TN-stage) [6, 7], tumor volume [10–12], and functional parameters derived from diffusion-weighted imaging (DWI) [8, 9] or dynamic contrast-enhanced MRI (DCE) [13] to predict rectal tumor response on baseline MRI, and more recently also for more advanced quantitative variables derived using modern post-processing tools such as radiomics [5]. However, the available evidence mainly comes from single-center studies and comprehensive multicenter studies incorporating clinical, functional as well as advanced quantitative imaging data are scarce [14, 15]. Moreover, the effects of multicenter data variations and diagnostic staging differences between observers so far remain largely uninvestigated. Prediction studies on larger multicenter patient cohorts with imaging data acquired and analyzed as part of everyday clinical routine are therefore urgently needed to develop a more realistic view of the potential role of image-based treatment prediction models in general clinical practice.

In this retrospective multicenter study, we therefore set out to develop and validate a model to predict response to neoadjuvant treatment in rectal cancer using rectal MRIs acquired for baseline staging in 9 different centers in the Netherlands, intended to be a representative sample of rectal imaging performed in everyday clinical practice.

Materials and methods

Patients

As part of an institutional review board-approved multicenter study project, the clinical and imaging data of 670 LARC patients undergoing standard-of-care neoadjuvant chemoradiotherapy between February 2008 and March 2018 were retrospectively collected from 9 study centers (1 university hospital, 7 large teaching hospitals, and 1 comprehensive cancer center). Patients were identified based on the following inclusion criteria: (a) biopsy-proven rectal adenocarcinoma, (b) non-metastasized disease, (c) availability of a pre-treatment MRI (including at least T2-weighted (T2W) sequences in multiple planes and an axial DWI sequence) with corresponding radiological staging report, (d) long-course neoadjuvant treatment consisting of radiotherapy (total dose 50.0–50.4 Gray) with concurrent capecitabine-based chemotherapy, and (e) final treatment consisting of surgery or watch-and-wait with >2 years clinical follow-up to establish a reliable final response to CRT. From this initial cohort, 161 patients were excluded for reasons detailed in Fig. 1, leaving a total study population of n=509. Due to the retrospective nature of this study, informed consent was waived.

Fig. 1.


In- and exclusion flowchart. Note, mucinous tumors were excluded because these are known to exhibit distinctly different signal characteristics on both T2W-MRI and DWI

Imaging and image pre-processing

MRIs were acquired according to routine practice in the participating centers with substantial variations in scan protocols and corresponding image quality between and within centers (Fig. 2); images were acquired using 25 different scanners (19 at 1.5T; 6 at 3.0T) and a total of 112 unique T2W and 94 unique DWI protocols. Further parameters are summarized in Supplementary Materials A. From the source DW images we calculated the apparent diffusion coefficient (ADC) maps with a mono-exponential fit using all available b-values (2–7 b-values per sequence, ranging between b0 and b2000). ADC values <0 or >3 standard deviations from the tumor mean were marked as invalid. Since T2W pixel values are represented on an arbitrary scale, these images were normalized to mean=0 and standard deviation=100 [16]. All images were resampled to a common isotropic voxel spacing of 2 × 2 × 2 mm using a linear interpolator for the DWI and ADC maps (linear was chosen to prevent out-of-range intensities, which may occur due to overshoot with higher-order interpolations) and a B-spline interpolator for T2W images.
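
The mono-exponential model behind the ADC maps is S(b) = S0 · exp(−b · ADC). As a minimal sketch for a single voxel, the fit can be done as a straight-line least-squares fit of ln S against b. The text does not state whether a log-linear or a non-linear fit was used, so the log-linear choice, the function name `fit_adc`, and the synthetic signal values below are all illustrative assumptions.

```python
import math


def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC) for one voxel.

    b_values are in s/mm^2 and signals must be positive.
    Returns (adc, s0) with ADC in mm^2/s.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching b-values and signals")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    slope = sxy / sxx                 # slope of ln S versus b equals -ADC
    adc = -slope
    s0 = math.exp(mean_log - slope * mean_b)
    return adc, s0


# Synthetic, noise-free voxel (hypothetical values, not study data).
true_adc, true_s0 = 1.0e-3, 500.0
b_values = [0, 200, 800, 1000]
signals = [true_s0 * math.exp(-b * true_adc) for b in b_values]
adc, s0 = fit_adc(b_values, signals)
```

With noisy data the log transform weights low-signal (high b-value) measurements more heavily than a non-linear fit would, which is one reason the two approaches can give slightly different ADC values.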

Fig. 2.


Examples illustrating differences in image quality and acquisition for T2W-MRI (a–d) and DWI (e–h) between centers, related to for example field-of-view, tissue contrast (e.g., TR/TE settings), image resolution, and noise. For the DWI scans, the highest acquired b-values shown in these examples were b1000 (e), b600 (f), b800 (g), and b1000 (h)

Image evaluation

Baseline staging variables (cT-stage (cT1-2, cT3, cT4), cN-stage (cN0, cN1, cN2), and involvement of the mesorectal fascia (MRF)) were derived from the original staging reports that were performed by a multitude of readers. In addition, all MRIs were retrospectively re-evaluated for the purpose of this study by a dedicated radiologist (DMJL, with >10 years’ experience in reading rectal MRI) who staged all cases in line with the latest staging guidelines and reporting template from the European Society of Gastrointestinal and Abdominal Radiology [17]. For quantitative analysis, tumors were segmented using 3D Slicer (version 4.10.2). Segmentations were acquired semi-automatically using a level-tracing algorithm applied to the high b-value DWI, and were then manually adjusted by an expert radiologist (DMJL, the same reader who also staged the cases) taking into account the anatomical information from the corresponding T2W-MRI. Care was taken to include only tumor tissue, excluding the rectal lumen and any non-tumoural perirectal tissues. Segmentations were then copied to the ADC map and T2W-MRI, after which tumor volume and other quantitative features were extracted with PyRadiomics (version 3.0) using a bin width of 5 (T2W-MRI) and 5 × 10⁻⁵ (ADC). These bin widths were chosen such that the number of histogram bins was between 30 and 130 [16]. Quantitative features were limited to simple volume and first-order features, as these have previously been reported to be most reproducible [19–23] and least dependent on acquisition differences between centers [18].
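
To illustrate how a fixed bin width translates into a number of histogram bins, the sketch below discretizes intensities with bin edges at multiples of the bin width, following the fixed-bin-width scheme described in the PyRadiomics documentation, and derives two histogram features from the result. It is plain Python rather than the PyRadiomics implementation, and the function names and intensity values are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bin_width):
    """Map intensities to 1-based bin indices; bin edges sit at multiples of bin_width."""
    lowest = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest + 1 for v in values]


def histogram_features(values, bin_width):
    """Bin count spanned by the data, plus entropy and uniformity of the histogram."""
    bins = discretize(values, bin_width)
    n = len(bins)
    probs = [count / n for count in Counter(bins).values()]
    return {
        "n_bins": max(bins),  # includes empty bins between minimum and maximum
        "entropy": -sum(p * math.log2(p) for p in probs),
        "uniformity": sum(p * p for p in probs),
    }


# Hypothetical T2W intensities on the normalized scale (mean 0, SD 100).
t2w_values = [-150.0, -40.0, -12.5, 0.0, 3.0, 47.0, 120.0, 260.0]
features = histogram_features(t2w_values, bin_width=5)
in_target_range = 30 <= features["n_bins"] <= 130
```

For these example values the intensity range of 410 at a bin width of 5 spans 83 bins, which falls inside the 30 to 130 target mentioned above.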

Variable definitions

Five distinct variable categories were defined:

  1. Non-imaging variables; including age, sex, basic tumor descriptors from clinical examination and endoscopy (tumor location and basic tumor morphology, e.g., polyp/circular), and the time interval between neoadjuvant CRT and surgery.

  2. Basic image-based staging variables:
    • (2a) derived from the original reports: cT-stage (cT1-2, cT3, cT4), cN-stage (cN0, cN1, cN2), and MRF involvement, as routinely available from the original staging reports.
    • (2b) derived from expert re-evaluation: the same descriptors as in 2a, but now taken from the expert re-evaluations.
  3. Advanced image-based staging variables; including advanced staging descriptors (tumor length, cT-substage (cT1-2; cT3a,b,c,d; cT4a,b), depth of extramural invasion, and extramural vascular invasion (EMVI)) that were not routinely available from the original staging reports but derived from the expert re-evaluations.

  4. Quantitative imaging features; including tumor volume (extracted directly from the whole-tumor segmentations), and the following first-order features (derived from the pixel values within the tumor on both T2W-MRI and ADC): mean, median, minimum, maximum, variance, mean absolute deviation, range, robust mean absolute deviation, root mean squared, 10th percentile, 90th percentile, energy, entropy, interquartile range, kurtosis, skewness, total energy, and uniformity.

These five variable categories were combined into eight combinations of variable sets for the statistical analysis as detailed in Table 1.

Table 1.

Variable category definition and variable sets

Variable categories | Features
--- | ---
1. Non-imaging | Age, sex, time between CRT and surgery, tumor morphology (polyp, semicircular, or circular) and tumor height (distal-mid versus proximal-rectosigmoid)
2a. Basic imaging staging (original reports) | cT-stage (cT1-2, cT3, cT4), cN-stage (cN0, cN1, cN2), involvement of the mesorectal fascia (MRF-, MRF+)
2b. Basic imaging staging (expert re-evaluation) | cT-stage (cT1-2, cT3, cT4), cN-stage (cN0, cN1, cN2), involvement of the mesorectal fascia (MRF-, MRF+)
3. Advanced imaging staging (expert re-evaluation) | All variables included in 2b (basic imaging staging, expert re-evaluation) + cT-substage (cT1-2; cT3a,b,c,d; cT4a,b), extramural invasion depth, EMVI, tumor length
4. Quantitative imaging (derived from T2W-MRI and ADC) | Tumor volume*, mean, median, minimum, maximum, variance, mean absolute deviation, range, robust mean absolute deviation, root mean squared, 10th percentile, 90th percentile, energy, entropy, interquartile range, kurtosis, skewness, total energy, uniformity

Variable sets

1. Non-imaging only

2. Non-imaging + basic imaging staging (original reports)

3. Non-imaging + basic imaging staging (expert re-evaluation)

4. Non-imaging + advanced imaging staging (expert re-evaluation)

5. Non-imaging + quantitative imaging

6. Non-imaging + basic imaging staging (original reports) + quantitative imaging

7. Non-imaging + basic imaging staging (expert re-evaluation) + quantitative imaging

8. Non-imaging + advanced imaging staging (expert re-evaluation) + quantitative imaging

* Tumor volume was derived directly from the whole-tumor segmentations

Response outcome

The final treatment response was defined in two ways [8, 24, 25]:

  • Complete response (CR) versus incomplete response: CR was defined as a pathological complete response after surgery (pCR; ypT0N0) or a sustained clinical complete response (cCR) without evidence of recurrence on repeated follow-up MRI and endoscopy for >2 years in patients undergoing watch-and-wait. Patients with ypT1-4 disease after surgery were classified as incomplete responses.

  • Good response (GR) versus poor response: GR included all patients with Mandard’s tumor regression grade (TRG) of 1–2 (total and subtotal regression); patients with TRG of 3–5 (moderate, limited and no regression) were classified as poor responders. Patients with a sustained cCR for >2 years were considered TRG1. If the pathology report did not explicitly mention a TRG score, the complete pathology reports were reviewed with a dedicated gastrointestinal pathologist (P.S. with >8 years of experience) to assign a TRG score retrospectively.
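
The two outcome definitions above can be written down as a small labeling function. This is only a sketch: the function name and argument names are hypothetical, and labeling ypT0 with positive nodes as an incomplete response is an assumption, since the text defines pCR as ypT0N0 but does not address that case separately.

```python
def classify_response(yp_t=None, yp_n=None, trg=None, sustained_ccr=False):
    """Return (complete_response, good_response) for one patient.

    yp_t, yp_n: pathological T and N stage as integers (surgery patients).
    trg: Mandard tumor regression grade, 1 to 5.
    sustained_ccr: True for watch-and-wait patients with a clinical complete
    response sustained for more than 2 years.
    """
    if sustained_ccr:
        # Sustained cCR counts as a complete response and is considered TRG1.
        return True, True
    if yp_t is None or yp_n is None or trg is None:
        raise ValueError("surgery patients need yp_t, yp_n and trg")
    complete = yp_t == 0 and yp_n == 0   # pCR = ypT0N0 (assumption: ypT0N+ is incomplete)
    good = trg in (1, 2)                 # TRG 1-2 good, TRG 3-5 poor
    return complete, good


examples = [
    classify_response(sustained_ccr=True),
    classify_response(yp_t=0, yp_n=0, trg=1),
    classify_response(yp_t=2, yp_n=0, trg=2),
    classify_response(yp_t=3, yp_n=1, trg=4),
]
```

The third example shows why the two outcomes differ: a ypT2N0 tumor with TRG2 is an incomplete response but still a good response.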

Statistical analysis

The 9 centers were divided into a development set of 6 centers (n=412) and an external validation set of 3 centers (n=97). Differences between the development and validation sets were assessed using Chi-squared tests for categorical variables (sex and response) and Kruskal-Wallis tests for continuous/ordinal variables (age, cTN-stage). The model development and validation process is summarized in Fig. 3. For the eight variable sets (see Table 1), the ability to predict the two respective response outcomes (complete vs incomplete response; good vs poor response) was assessed in the development cohort by calculating the average area under the receiver operating characteristic curve (AUC) after repeated (n=100) random hold-out validation. During each iteration, the development cohort was randomly split into a 70% training / 30% test dataset. All training variables were then scaled (mean=0, standard deviation=1), with the same scaling (i.e., using the mean and standard deviation derived from the training set) applied to the test set. When two or more features in a variable set were correlated (with Pearson’s ρ>0.8 in the training data), only the feature with the lowest mean absolute correlation was retained for further analysis. The remaining variables were used to train a logistic regression model with Least Absolute Shrinkage and Selection Operator (LASSO) regularization [26]. The LASSO regularization parameter (λ) was tuned to select only the most relevant variables by minimizing the negative binomial log-likelihood loss using internal repeated (n=100) 10-fold cross-validation. Each model’s performance was measured on the test dataset, and the model achieving the best average test AUC was trained on the whole development cohort. As a final step, the performance of this best-performing model (N.B. one model for CR and one for GR) was tested on the external validation cohort. 95% confidence intervals for averaged AUCs in the development data were estimated through bootstrapping (200 samples). Confidence intervals for the validation cohort were obtained using DeLong’s method [27].
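
The preprocessing inside one hold-out iteration (70/30 split, scaling with training-set statistics, correlation filtering) and the AUC metric can be sketched in plain Python as below. The LASSO fit itself is deliberately left out. The rule used here, dropping the member of a highly correlated pair that has the higher mean absolute correlation with the remaining features, is one interpretation of the description above, and all function names, feature names, and data are synthetic assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(train, threshold=0.8):
    """Return the feature names kept after pairwise correlation filtering."""
    kept = list(train)
    corr = {a: {b: abs(pearson(train[a], train[b])) for b in kept if b != a}
            for a in kept}

    def mean_abs(f):
        return sum(corr[f][g] for g in kept if g != f) / (len(kept) - 1)

    while len(kept) > 1:
        pairs = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
                 if corr[a][b] > threshold]
        if not pairs:
            break
        a, b = max(pairs, key=lambda p: corr[p[0]][p[1]])
        kept.remove(a if mean_abs(a) >= mean_abs(b) else b)
    return kept


def standardize(train_col, test_col):
    """Scale both columns with the mean and SD of the training column only."""
    mean = sum(train_col) / len(train_col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in train_col) / (len(train_col) - 1))
    return ([(v - mean) / sd for v in train_col],
            [(v - mean) / sd for v in test_col])


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data: 'length_cm' is a noisy rescaled copy of 'length_mm'.
rng = random.Random(0)
n = 200
length_mm = [rng.gauss(50, 15) for _ in range(n)]
data = {
    "length_mm": length_mm,
    "length_cm": [v / 10 + rng.gauss(0, 0.1) for v in length_mm],
    "age": [rng.gauss(65, 10) for _ in range(n)],
}

# One hold-out iteration: random 70% / 30% split of the patient indices.
idx = list(range(n))
rng.shuffle(idx)
n_train = round(0.7 * n)
train_idx, test_idx = idx[:n_train], idx[n_train:]

# Scale first (training statistics only), then filter on the training part.
scaled = {k: standardize([v[i] for i in train_idx], [v[i] for i in test_idx])
          for k, v in data.items()}
kept = drop_correlated({k: tr for k, (tr, _) in scaled.items()})
```

In the study this iteration is repeated 100 times and the test-set AUCs are averaged; deriving the scaling and the correlation filter from the training part only keeps information from the test part out of model development.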

Fig. 3.


Schematic overview of the study workflow and statistical analysis. From a total cohort of 509 patients from 9 centers, 412 patients (from 6 centers) were used to develop a prediction model to predict two respective outcomes (complete response, good response) using repeated hold-out validation. For both outcomes, the best-performing model was tested on an external and independent validation cohort consisting of 97 patients (from 3 different centers)

Supplementary Materials B describes two additional analyses: (1) testing the effects of 3 different previously described methods for multicenter data normalization (using a reference organ [28], statistical correction of imaging features using the ComBat algorithm [29], and statistical correction using mixed-effects models [30]), and (2) comparing model performance in the multicenter dataset to a single-center data subset from the cohort acquired with a harmonized MRI acquisition protocol. The latter was done to mimic the comparison of our results with a single-center study design.

Results

Patients

Baseline patient information is presented in Table 2; 332 (65%) patients were male; the median age was 65 years. For the outcome complete (versus incomplete) response, 141 patients (28%) were classified as complete responders. For the outcome good (versus poor) response, 225 patients (44%) were classified as good responders. The development and validation cohort showed no significant differences in sex, age, cT-stage, cN-stage, and tumor response (p=0.37–0.98).

Table 2.

Baseline patient characteristics and variations between centers

Characteristic | Category | Total | Development cohort | Validation cohort | p value
--- | --- | --- | --- | --- | ---
Total, n (%) | | 509 (100%) | 412 (81%) | 97 (19%) |
Sex, n (%) | Female | 177 (35%) | 139 (34%) | 38 (39%) | 0.37*
 | Male | 332 (65%) | 273 (66%) | 59 (61%) |
Age, median (range) | | 65 (25–87) | 66 (25–87) | 65 (33–81) | 0.37**
cT, n (%) | 1–2 | 35 (7%) | 30 (7%) | 5 (5%) | 0.57**
 | 3 | 414 (81%) | 334 (81%) | 80 (83%) |
 | 4 | 60 (12%) | 48 (12%) | 12 (12%) |
cN, n (%) | 0 | 68 (13%) | 52 (13%) | 16 (17%) | 0.98**
 | 1 | 122 (24%) | 103 (25%) | 19 (20%) |
 | 2 | 319 (63%) | 257 (62%) | 62 (64%) |
Complete response, n (%) | CR | 141 (28%) | 111 (27%) | 30 (31%) | 0.51*
 | Not-CR | 368 (72%) | 301 (73%) | 67 (69%) |
Good response, n (%) | Good | 225 (44%) | 184 (45%) | 41 (42%) | 0.75*
 | Poor | 284 (56%) | 228 (55%) | 56 (58%) |

*Calculated using chi-squared test

** Calculated using the Kruskal-Wallis test
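
As a worked example of the chi-squared comparison, the sketch below tests sex against cohort using the counts from Table 2 (female 139 vs 38, male 273 vs 59). Whether a continuity correction was applied is not stated in the text, so the Yates correction used by default here is an assumption, as is the function name `chi2_2x2`.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared test for the 2x2 table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction
            stat += diff ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: female, male. Columns: development cohort, validation cohort.
stat_corrected, p_corrected = chi2_2x2(139, 38, 273, 59)
stat_plain, p_plain = chi2_2x2(139, 38, 273, 59, yates=False)
```

With the continuity correction the p value comes out close to the 0.37 reported for sex in Table 2; without it the value is somewhat lower, at roughly 0.31.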

Model performance and predictive variables

Results for model development and performance are detailed in Table 3. The best-performing model included non-imaging and advanced imaging staging variables and achieved an average AUC of 0.60 (95%CI 0.48–0.72) to predict a complete response and an AUC of 0.65 (95% CI 0.53–0.76) to predict a good response in the external validation cohort, results very similar to those obtained during testing in the development cohort. The addition of quantitative imaging features did not improve predictive performance in any of the model combinations. Basic staging variables consistently achieved lower predictive performance compared to the advanced staging variables, especially (though 95% confidence intervals showed some overlap) when the basic staging variables were derived from the original reports. Based on the model coefficients, a more proximal tumor location, shorter tumor length, longer waiting interval after CRT, lower cT-substage and cN-stage, negative MRF, lower extramural invasion depth, and negative EMVI status were associated with a favorable response outcome (full model coefficients are provided in Supplementary Materials C).

Table 3.

Model performance

Variable groups and combinations | CR, AUC (95% CI) | GR, AUC (95% CI)
--- | --- | ---
Average AUC on the development cohort | |
Non-imaging | 0.58 (0.49–0.66) | 0.53 (0.42–0.58)
Non-imaging + basic imaging staging (original reports) | 0.63 (0.55–0.70) | 0.52 (0.39–0.54)
Non-imaging + basic imaging staging (expert re-evaluation) | 0.66 (0.58–0.70) | 0.62 (0.56–0.68)
Non-imaging + advanced imaging staging (expert re-evaluation) | 0.69 (0.62–0.74) | 0.67 (0.62–0.73)
Non-imaging + quantitative imaging | 0.59 (0.46–0.61) | 0.58 (0.47–0.61)
Non-imaging + basic imaging staging (original reports) + quantitative imaging | 0.59 (0.44–0.60) | 0.57 (0.44–0.59)
Non-imaging + basic imaging staging (expert re-evaluation) + quantitative imaging | 0.63 (0.51–0.68) | 0.62 (0.53–0.68)
Non-imaging + advanced imaging staging (expert re-evaluation) + quantitative imaging | 0.68 (0.59–0.71) | 0.67 (0.61–0.72)
Performance of best-performing model on the external validation cohort | |
Non-imaging + advanced imaging staging (expert re-evaluation) | 0.60 (0.48–0.72) | 0.65 (0.53–0.76)


Features selected in CR model: [Intercept], tumor height, weeks to surgery, cTsub-stage, cN-stage, invasion depth (mm), tumor length (mm)
Features selected in GR model: Tumor height, weeks to surgery, cTsub-stage, MRF-status, invasion depth (mm), EMVI status

The best-performing models are depicted in bold

95% CI, 95% confidence interval; CR, complete response (pCR and cCR); GR, good response (TRG1-2); NB, confidence intervals on the development cohort AUC are based on the non-studentized pivotal bootstrap method [31] using 200 bootstrap samples. For the external validation cohort, DeLong’s method [27] was used
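
The non-studentized pivotal (basic) bootstrap interval mentioned in this footnote reflects the bootstrap quantiles around the point estimate: lower = 2θ − q(1 − α/2) and upper = 2θ − q(α/2). The sketch below shows the calculation; the function names, the quantile interpolation rule, and the example numbers are illustrative assumptions and do not reproduce the study's bootstrap procedure.

```python
def percentile(sorted_values, q):
    """Linear-interpolation quantile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def pivotal_ci(estimate, boot_estimates, alpha=0.05):
    """Basic (pivotal) bootstrap confidence interval around a point estimate."""
    ordered = sorted(boot_estimates)
    q_lo = percentile(ordered, alpha / 2)
    q_hi = percentile(ordered, 1 - alpha / 2)
    return 2 * estimate - q_hi, 2 * estimate - q_lo


# Hypothetical bootstrap replicates, evenly spread between 0 and 1.
boot = [i / 100 for i in range(101)]
low, high = pivotal_ci(0.6, boot)
```

Because the interval is built by reflection, it need not be symmetric around the point estimate and is not constrained to the 0 to 1 range of an AUC, as the example output (about 0.225 to 1.175) shows.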

The results of Supplementary Materials B show that none of the normalization methods applied to retrospectively harmonize the data led to improved predictive performance. When mimicking a single-center study design (i.e., when performing the same analysis on a single-center subset within our cohort with homogeneous imaging protocols), results were highly variable but showed a trend towards better single-center model performance for most variable subsets to predict a complete response. The best-performing single-center model (including non-imaging and advanced staging variables) achieved an AUC of 0.79, compared to an AUC of 0.69 in the total multicenter development cohort.

Discussion

This multicenter study shows that when combining clinical baseline variables with image-based staging and quantitative variables, overall model performance to predict neoadjuvant treatment response in rectal cancer is disappointing, with externally validated AUCs of 0.60 and 0.65 to predict a complete response (ypT0) and a good response (TRG1-2), respectively. Best model performance was achieved when combining clinical baseline information (e.g., time to surgery) and image-based staging variables (e.g., cT-stage). Quantitative imaging features had no added value. Notably, model performance was considerably better when including modern staging parameters such as cT-substage, extramural invasion depth, and EMVI, compared to more traditional staging including only simplified cTN-stage and MRF involvement. Moreover, model performance seemed to be affected by staging variations between observers, with better performance when staging was performed by a dedicated expert compared to the original staging reports acquired by a multitude of readers.

Previous studies typically included staging variables such as cTN-stage as part of the “baseline patient variables,” which implies that these are “objective” variables with little variation between readers [31, 32]. While measurement variations are commonly considered when analyzing quantitative imaging data, our results demonstrate that interobserver variation is also an important issue to take into account for the more basic staging variables. The improved model performance when also including modern staging variables such as cT-substage and EMVI in the expert re-evaluations further highlights the importance of high-quality diagnostic staging using up-to-date guidelines. The clinical impact of ‘state-of-the-art’ staging was also demonstrated by Bogveradze et al, who showed in a retrospective analysis of 712 patients that compared to “traditional” staging methods, advanced staging according to recent guideline updates would have led to a change in risk classification (and therefore potentially in treatment stratification) in up to 18% of patients [33]. The fact that our cohort dates back as far as 2008 and covers a 10-year inclusion period explains why many of these advanced staging variables could not be derived from the original reports. The use of older data will likely also have impacted the quality of the images and thus the quantitative imaging features derived from the data. Following developments in acquisition guidelines and software and hardware updates, the image quality will have evolved over time. This is also reflected by the large number of different imaging protocols including 112 unique T2W and 94 unique DWI protocols. The question, therefore, remains if and how model performance would have improved using only state-of-the-art and/or more harmonized (prospectively acquired) MRI data.

In our current dataset, quantitative imaging features showed no added benefit to predict response. This contradicts previous single-center and smaller bi- and tri-institutional studies that achieved more encouraging AUCs ranging from 0.63 to as high as 0.97 [5, 14, 15]. These previous results are likely at least in part an overestimation of how such models would perform in everyday practice, as especially earlier pilot studies are hampered by limitations in methodological design (e.g., small patient cohorts, re-using of training data for testing, and multiple testing) as also outlined in several review papers reporting on the quality and/or reproducibility of image biomarker studies [5, 19, 34–38]. The fact that most previous studies have been single-center reports will likely have also played an important role. Though reflective of data acquired in everyday practice, our results confirm the known difficulties of building generally applicable prediction models using heterogeneous retrospectively collected multicenter data. While some data variations are necessary to identify features that are robust to vendor and acquisition differences, too much variation will negatively impact model generalizability. Attempting to directly compare and investigate the effects of multicenter (heterogeneous) versus single-center (homogeneous) modelling using our own data, we mimicked a single-center comparison by repeating our study analyses on a homogeneous single-center subset within our cohort. Though results have to be interpreted with caution considering the wider confidence intervals and lack of external validation in the single-center arm, this comparison suggests that the best-performing model indeed appeared to be better for the homogeneous single-center subset (AUC 0.79) than for the multicenter (AUC 0.69) cohort. Though full data harmonization will likely never be achieved in daily clinical practice, these findings do support a need for further protocol guidelines and standardization to benefit future multicenter research.

There are some limitations to our study design. As mentioned above, data was acquired over the time span of a decade including scans acquired using outdated protocols dating back as far as 2008. A detailed analysis of the impact of these spectrum effects was outside the scope of this study, but a preliminary analysis (results not reported) showed that the impact of temporal changes was negligible. All segmentations were performed on high b-value DWI and then copied to T2W-MRI and ADC maps. Although care was taken to include anatomical information from T2W-MRI during segmentation, ideally a separate segmentation would have been performed. Finally, the comparison between the original basic staging reports and the advanced staging performed as part of this study was influenced by the fact that all re-evaluations were done by a single reader. In contrast, original staging reports were performed by a multitude of readers with varying levels of expertise. Due to the time-consuming nature of the expert re-evaluations (and segmentations), it was unfortunately not deemed feasible to include an independent extra reader.

In conclusion, this multicenter study combining clinical information and MRIs acquired as part of everyday clinical practice over the time span of a decade rendered disappointing performance to predict response to neoadjuvant treatment in rectal cancer. The best results were obtained when combining clinical baseline information with state-of-the-art image-based staging variables, highlighting the importance of good quality staging according to current guidelines and staging templates. No added value was found for quantitative imaging features in this multicenter retrospective study setting. This is likely at least in part the result of acquisition variations, which is a major problem for feature reproducibility and thus model generalizability. To benefit from quantitative imaging features—assuming a predictive potential—further optimization and harmonization of acquisition protocols will be essential to reduce feature variation across centers. For future research, it would also be interesting to see how model performance may improve when combining the information that can be derived from imaging with other clinical biomarkers such as molecular markers (e.g., DNA mutations, gene expression, microRNA) [39, 40], blood biomarkers (e.g., CEA, circulating tumor DNA) [39, 41], metabolomics (e.g., metabolites, hormones, and other signaling molecules) [42], organoids [43], and immune profiling [44].

Supplementary information

ESM 1 (DOCX, 44 KB)

Abbreviations

ADC: Apparent diffusion coefficient
AUC: Area under the receiver operating characteristic curve
cCR: Clinical complete response
CR: Complete response
CRT: Chemoradiotherapy
DCE: Dynamic contrast-enhanced
DWI: Diffusion-weighted imaging
EMVI: Extramural vascular invasion
GR: Good response
LARC: Locally advanced rectal cancer
LASSO: Least Absolute Shrinkage and Selection Operator
MRF: Mesorectal fascia
MRI: Magnetic resonance imaging
pCR: Pathological complete response
T2W: T2-weighted
TRG: Tumor regression grade

Funding

This study has received funding from the Dutch Cancer Society (project number 10138).

Declarations

Guarantor

The scientific guarantor of this publication is Doenja Lambregts, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

Conflict of interest

The authors of this manuscript declare no relationships with any companies, whose products or services may be related to the subject matter of the article.

Statistics and biometry

One of the authors (Sander Roberti) has significant statistical expertise.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

Approval was obtained from the Institutional Review Boards of all participating centres.

Study subjects or cohorts overlap

Some study subjects or cohorts have been previously reported in the following:

Schurink NW, van Kranen S, Roberti S, et al (2022) Sources of variation in multicenter rectal MRI data and their effect on radiomics feature reproducibility. Eur Radiol. 32:1506–1516.

Bogveradze N, el Khababi N, Schurink NW, et al (2022) Evolutions in rectal cancer MRI staging and risk stratification in The Netherlands. Abdom Radiol (NY). 47:38–47.

The aims of these publications differ from those of the current study.

The first study investigated the influence of acquisition variation on imaging features. The lessons from that study were applied in the current study to develop response prediction models.

The second study describes how radiology staging and reporting have changed over the course of 10 years, which is the span of this cohort.

Methodology

• retrospective

• observational

• multicentre study

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Aklilu M, Eng C. The current landscape of locally advanced rectal cancer. Nat Rev Clin Oncol. 2011;8:649–659. doi: 10.1038/nrclinonc.2011.118.
  • 2. Maas M, Nelemans PJ, Valentini V, et al. Long-term outcome in patients with a pathological complete response after chemoradiation for rectal cancer: a pooled analysis of individual patient data. Lancet Oncol. 2010;11:835–844. doi: 10.1016/S1470-2045(10)70172-8.
  • 3. López-Campos F, Martín-Martín M, Fornell-Pérez R, et al. Watch and wait approach in rectal cancer: current controversies and future directions. World J Gastroenterol. 2020;26:4218–4239. doi: 10.3748/wjg.v26.i29.4218.
  • 4. van der Valk MJM, Hilling DE, Bastiaannet E, et al. Long-term outcomes of clinical complete responders after neoadjuvant treatment for rectal cancer in the International Watch & Wait Database (IWWD): an international multicentre registry study. Lancet. 2018;391:2537–2545. doi: 10.1016/S0140-6736(18)31078-X.
  • 5. Staal FCR, van der Reijd DJ, Taghavi M, et al. Radiomics for the prediction of treatment outcome and survival in patients with colorectal cancer: a systematic review. Clin Colorectal Cancer. 2021;20:52–71. doi: 10.1016/j.clcc.2020.11.001.
  • 6. Huang Y, Lee D, Young C. Predictors for complete pathological response for stage II and III rectal cancer following neoadjuvant therapy - a systematic review and meta-analysis. Am J Surg. 2020;220:300–308. doi: 10.1016/j.amjsurg.2020.01.001.
  • 7. Fischer J, Eglinton TW, Richards SJG, Frizelle FA. Predicting pathological response to chemoradiotherapy for rectal cancer: a systematic review. Expert Rev Anticancer Ther. 2021;21:489–500. doi: 10.1080/14737140.2021.1868992.
  • 8. Schurink NW, Lambregts DMJ, Beets-Tan RGH. Diffusion-weighted imaging in rectal cancer: current applications and future perspectives. Br J Radiol. 2019;92:20180655. doi: 10.1259/bjr.20180655.
  • 9. Joye I, Deroose CM, Vandecaveye V, Haustermans K. The role of diffusion-weighted MRI and 18F-FDG PET/CT in the prediction of pathologic complete response after radiochemotherapy for rectal cancer: a systematic review. Radiother Oncol. 2014;113:158–165. doi: 10.1016/j.radonc.2014.11.026.
  • 10. Curvo-Semedo L, Lambregts DMJ, Maas M, et al. Rectal cancer: assessment of complete response to preoperative combined radiation therapy with chemotherapy - conventional MR volumetry versus diffusion-weighted MR imaging. Radiology. 2011;260:734–743. doi: 10.1148/radiol.11102467.
  • 11. Lambregts DMJ, Rao S-X, Sassen S, et al. MRI and diffusion-weighted MRI volumetry for identification of complete tumor responders after preoperative chemoradiotherapy in patients with rectal cancer. Ann Surg. 2015;262:1034–1039. doi: 10.1097/SLA.0000000000000909.
  • 12. Ha HI, Kim AY, Yu CS, et al. Locally advanced rectal cancer: diffusion-weighted MR tumour volumetry and the apparent diffusion coefficient for evaluating complete remission after preoperative chemoradiation therapy. Eur Radiol. 2013;23:3345–3353. doi: 10.1007/s00330-013-2936-5.
  • 13. Dijkhoff RAP, Beets-Tan RGH, Lambregts DMJ, et al. Value of DCE-MRI for staging and response evaluation in rectal cancer: a systematic review. Eur J Radiol. 2017;95:155–168. doi: 10.1016/j.ejrad.2017.08.009.
  • 14. van Griethuysen JJM, Lambregts DMJ, Trebeschi S, et al. Radiomics performs comparable to morphologic assessment by expert radiologists for prediction of response to neoadjuvant chemoradiotherapy on baseline staging MRI in rectal cancer. Abdom Radiol (NY). 2020;45:632–643. doi: 10.1007/s00261-019-02321-8.
  • 15. Antunes JT, Ofshteyn A, Bera K, et al. Radiomic features of primary rectal cancers on baseline T2-weighted MRI are associated with pathologic complete response to neoadjuvant chemoradiation: a multisite study. J Magn Reson Imaging. 2020;52:1531–1541. doi: 10.1002/jmri.27140.
  • 16. Van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77:e104–e107. doi: 10.1158/0008-5472.CAN-17-0339.
  • 17. Beets-Tan RGH, Lambregts DMJ, Maas M, et al. Magnetic resonance imaging for clinical management of rectal cancer: updated recommendations from the 2016 European Society of Gastrointestinal and Abdominal Radiology (ESGAR) consensus meeting. Eur Radiol. 2018;28:1465–1475. doi: 10.1007/s00330-017-5026-2.
  • 18. Schurink NW, van Kranen S, Roberti S, et al. Sources of variation in multicenter rectal MRI data and their effect on radiomics feature reproducibility. Eur Radiol. 2022;32:1506–1516. doi: 10.1007/s00330-021-08251-8.
  • 19. Traverso A, Wee L, Dekker A, Gillies R. Repeatability and reproducibility of radiomic features: a systematic review. Int J Radiat Oncol. 2018;102:1143–1158. doi: 10.1016/j.ijrobp.2018.05.053.
  • 20. Traverso A, Kazmierski M, Shi Z, et al. Stability of radiomic features of apparent diffusion coefficient (ADC) maps for locally advanced rectal cancer in response to image pre-processing. Phys Med. 2019;61:44–51. doi: 10.1016/j.ejmp.2019.04.009.
  • 21. Fiset S, Welch ML, Weiss J, et al. Repeatability and reproducibility of MRI-based radiomic features in cervical cancer. Radiother Oncol. 2019;135:107–114. doi: 10.1016/j.radonc.2019.03.001.
  • 22. Yuan J, Xue C, Lo G, et al. Quantitative assessment of acquisition imaging parameters on MRI radiomics features: a prospective anthropomorphic phantom study using a 3D-T2W-TSE sequence for MR-guided-radiotherapy. Quant Imaging Med Surg. 2021;11:1870–1887. doi: 10.21037/qims-20-865.
  • 23. Mi H, Yuan M, Suo S, et al. Impact of different scanners and acquisition parameters on robustness of MR radiomics features based on women’s cervix. Sci Rep. 2020;10:20407. doi: 10.1038/s41598-020-76989-0.
  • 24. Xie H, Sun T, Chen M, et al. Effectiveness of the apparent diffusion coefficient for predicting the response to chemoradiation therapy in locally advanced rectal cancer. Medicine (Baltimore). 2015;94:e517. doi: 10.1097/MD.0000000000000517.
  • 25. Maffione AM, Marzola MC, Capirci C, et al. Value of 18F-FDG PET for predicting response to neoadjuvant therapy in rectal cancer: systematic review and meta-analysis. AJR Am J Roentgenol. 2015;204:1261–1268. doi: 10.2214/AJR.14.13210.
  • 26. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B. 1996;58:267–288.
  • 27. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845. doi: 10.2307/2531595.
  • 28. Koc Z, Erbay G, Karadeli E. Internal comparison standard for abdominal diffusion-weighted imaging. Acta Radiol. 2017;58:1029–1036. doi: 10.1177/0284185116681040.
  • 29. Orlhac F, Lecler A, Savatovski J, et al. How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur Radiol. 2021;31:2272–2280. doi: 10.1007/s00330-020-07284-9.
  • 30. Kahan BC. Accounting for centre-effects in multicentre trials with a binary outcome – when, why, and how? BMC Med Res Methodol. 2014;14:20. doi: 10.1186/1471-2288-14-20.
  • 31. Yang C, Jiang Z-K, Liu L-H, Zeng M-S. Pre-treatment ADC image-based random forest classifier for identifying resistant rectal adenocarcinoma to neoadjuvant chemoradiotherapy. Int J Color Dis. 2020;35:101–107. doi: 10.1007/s00384-019-03455-3.
  • 32. Yi X, Pei Q, Zhang Y, et al. MRI-based radiomics predicts tumor response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer. Front Oncol. 2019;9:1–10. doi: 10.3389/fonc.2019.00552.
  • 33. Bogveradze N, el Khababi N, Schurink NW, et al. Evolutions in rectal cancer MRI staging and risk stratification in The Netherlands. Abdom Radiol (NY). 2022;47:38–47. doi: 10.1007/s00261-021-03281-8.
  • 34. Chalkidou A, O’Doherty MJ, Marsden PK. False discovery rates in PET and CT studies with texture features: a systematic review. PLoS One. 2015;10:1–18. doi: 10.1371/journal.pone.0124165.
  • 35. Zwanenburg A. Radiomics in nuclear medicine: robustness, reproducibility, standardization, and how to avoid data analysis traps and replication crisis. Eur J Nucl Med Mol Imaging. 2019;46:2638–2655. doi: 10.1007/s00259-019-04391-8.
  • 36. Sanduleanu S, Woodruff HC, de Jong EEC, et al. Tracking tumor biology with radiomics: a systematic review utilizing a radiomics quality score. Radiother Oncol. 2018;127:349–360. doi: 10.1016/j.radonc.2018.03.033.
  • 37. Keenan KE, Delfino JG, Jordanova KV, et al. Challenges in ensuring the generalizability of image quantitation methods for MRI. Med Phys. 2022;49:2820–2835. doi: 10.1002/mp.15195.
  • 38. Hagiwara A, Fujita S, Ohno Y, Aoki S. Variability and standardization of quantitative imaging. Invest Radiol. 2020;55:601–616. doi: 10.1097/RLI.0000000000000666.
  • 39. Dayde D, Tanaka I, Jain R, et al. Predictive and prognostic molecular biomarkers for response to neoadjuvant chemoradiation in rectal cancer. Int J Mol Sci. 2017;18:573. doi: 10.3390/ijms18030573.
  • 40. El Sissy C, Kirilovsky A, Van den Eynde M, et al. A diagnostic biopsy-adapted immunoscore predicts response to neoadjuvant treatment and selects patients with rectal cancer eligible for a watch-and-wait strategy. Clin Cancer Res. 2020;26:5198–5207. doi: 10.1158/1078-0432.CCR-20-0337.
  • 41. Massihnia D, Pizzutilo EG, Amatu A, et al. Liquid biopsy for rectal cancer: a systematic review. Cancer Treat Rev. 2019;79:101893. doi: 10.1016/j.ctrv.2019.101893.
  • 42. Jia H, Shen X, Guan Y, et al. Predicting the pathological response to neoadjuvant chemoradiation using untargeted metabolomics in locally advanced rectal cancer. Radiother Oncol. 2018;128:548–556. doi: 10.1016/j.radonc.2018.06.022.
  • 43. Flood M, Narasimhan V, Wilson K, et al. Organoids as a robust preclinical model for precision medicine in colorectal cancer: a systematic review. Ann Surg Oncol. 2021. doi: 10.1245/s10434-021-10829-x.
  • 44. Yuan Z, Frazer M, Ahmed KA, et al. Modeling precision genomic-based radiation dose response in rectal cancer. Future Oncol. 2020;16:2411–2420. doi: 10.2217/fon-2020-0060.



Articles from European Radiology are provided here courtesy of Springer
