Abstract
Objective:
We aimed to improve prediction of outcome for patients with colorectal liver metastases, via prognostic models incorporating PET-derived measures, including radiomic features that move beyond conventional standardized uptake value (SUV) measures.
Patients and Methods:
A range of parameters including volumetric and heterogeneity measures were derived from FDG PET images of 52 patients with colorectal intrahepatic-only metastases (29 males and 23 females; mean age 62.9 years [SD 9.8; range 32–82]). The patients underwent PET/CT imaging as part of the clinical workup prior to final decision on treatment. Univariate and multivariate models were implemented, which included statistical considerations (to discourage false discovery and overfitting), to predict overall survival (OS), progression-free survival (PFS) and event-free survival (EFS). Kaplan-Meier survival analyses were performed, where the subjects were divided into high-risk and low-risk groups, from which the hazard ratios (HR) were computed via Cox proportional hazards regression.
Results:
Commonly invoked SUV metrics performed relatively poorly for the different prediction tasks (SUVmax HR=1.48, 0.83 and 1.16; SUVpeak HR=2.05, 1.93 and 1.64, for OS, PFS and EFS, respectively). By contrast, the number of liver metastases and metabolic tumor volume (MTV) each performed well (with respective HR values of 2.71, 2.61 and 2.42, and 2.62, 1.96 and 2.29, for OS, PFS and EFS). Total lesion glycolysis (TLG) performed similarly to MTV. Multivariate prognostic modeling incorporating different features (including those quantifying intra-tumor heterogeneity) resulted in further enhanced prediction. Specifically, HR values of 4.29, 4.02 and 3.20 (p-values=0.00004, 0.0019 and 0.0002) were obtained for OS, PFS and EFS, respectively.
Conclusions:
PET-derived measures beyond commonly invoked SUV parameters hold significant potential towards improved prediction of clinical outcome in patients with liver metastases, especially when utilizing multivariate models.
Keywords: PET/CT, colorectal liver metastasis, prognosis, volumetric features, intra-tumoral heterogeneity, radiomics
INTRODUCTION
Colorectal cancer is a common cancer worldwide, often burdened by liver metastases [1]. About 15% of patients have liver metastases at the time of diagnosis and an additional 15% develop liver metastases over time [2]; 5-year survival has been reported to be as low as 5% in untreated patients with liver metastases [2]. However, recent studies report a 5-year survival rate of about 40% following surgical resection of colorectal liver metastases [3]. Treatment options for colorectal liver metastases have expanded with new therapeutic modalities such as radiofrequency ablation, which implies a clinical need for improved prognostication to assist choice of therapy.
The emerging area of precision (or personalized) cancer medicine involves efforts towards the discovery and validation of biomarkers that move beyond diagnosis, to domains such as prognostication, disease progression tracking, and therapy response prediction and assessment. To this end, PET imaging provides valuable capabilities for non-invasive assessment and quantification of disease burden, and towards the development of effective imaging biomarkers of disease [4]. Overall, PET images present a wide array of information related to disease. However, in common clinical practice, only intensity-based standardized uptake value (SUV) metrics are utilized, particularly SUVmax or SUVpeak. This is due to the simplicity of computing these metrics, which do not require accurate segmentation of the tumors. Specifically, SUVmax is computed as the maximum uptake in an area of interest, and SUVpeak is obtained by moving a 1-cm³ spherical region of interest over the area with increased tracer uptake (not necessarily conforming to the precise tumor outline) to maximize the enclosed average uptake [5, 6].
Quantitative volumetric tumor parameters, though less straightforward to compute, provide a notable frontier towards improved assessment of disease. In fact, there is increasing evidence that volumetric measures, particularly metabolic tumor volume (MTV) or total lesion glycolysis (TLG) can outperform their SUV counterparts, in a range of human solid tumors such as head & neck cancer, lung cancer, breast cancer, colorectal cancer and lymphoma [7–16]. Tumor volumetric parameters facilitate estimation of total tumor burden in a patient at the time of diagnosis or recurrence. Furthermore, segmentation of PET images enables generation of SUVmean, which is also sometimes reported in the literature.
In the present work, we have performed extensive comparisons, including univariate and multivariate analyses involving a range of quantitative measures of tumor uptake, to assess optimal methods for prediction of clinical outcome in patients with liver metastases from colorectal cancer. Our analyses include the use of volumetric parameters, as well as other advanced radiomic features which quantify heterogeneity [17–21], as increasingly studied in the emerging field of radiomics. The ultimate aim is that enhanced predictive models would result in significant improvements in management of patients, including non-invasive selection of patients with poor prognosis who could benefit from earlier and more intensive treatment strategies. These high-risk patients could also be identified for participation in clinical trials in order to better power discovery of effective therapies.
PATIENTS AND METHODS
Subjects
We analyzed data from 52 patients with colorectal intrahepatic-only metastases (29 males and 23 females; mean age 62.9 years [SD 9.8; range 32–82]). The patients had FDG PET/CT scans obtained before treatment, in years 2005 to 2010 (with patient outcome follow-ups up to 2017). The scans were performed as part of the clinical workup prior to final decision on treatment, most often in patients considered for liver surgery, as PET/CT was not part of primary standard workup for all patients with liver metastases from colorectal cancer. Treatment for liver metastases following FDG PET/CT included surgical resection, stereotactic radiotherapy, chemotherapy, radiofrequency ablation, or a combination of these therapies. The treatment modalities were modeled in our analyses.
We performed analyses of overall survival (OS), progression-free survival (PFS) and event-free survival (EFS) for imaging biomarker derivation. Progression was defined as local recurrence in the liver, or new metastases in the liver or outside the liver. This could include new tumors in the intestine detected with ordinary control examinations: mainly contrast-enhanced CT of the thorax, abdomen and pelvis, and in a few cases MRI and ultrasound. Of the 52 patients, the numbers of events for OS (death), PFS (progression) and EFS (progression or death) were 40, 25 and 44, respectively. The PET/CT scans were acquired on Siemens Biograph TruePoint scanners at the PET Centre of Aarhus University Hospital. Typical acquisitions started at 60 min post-injection, from top of head to mid-thigh, and spanned 3 min/bed. Reconstructions involved iterative 2D OSEM, which was chosen for consistency amongst patients including those scanned in earlier years (see discussion section).
Data analysis
Segmentation:
Tumors were segmented based on the PET images, though the fused PET/CT images were used initially to ensure that the tumors were intrahepatic (and not metastases in lung or peritoneum). The identified tumors were segmented using: (i) 40% background-corrected SUVmax, (ii) 50% background-corrected SUVmax, (iii) SUV>2.5, or (iv) SUV>3.0 thresholding, all in 3D using the Hermes Hybrid Viewer PDR software (Hermes Medical Solutions, Sweden). Background correction was performed using a liver background ROI (~14 mL) placed on liver tissue with good distance to tumors, followed by contouring based on t=40% or 50% lower threshold, calculated as [SUVmax(tumor) – SUVmean(background)] × t + SUVmean(background) [22]. Histograms of PET counts were generated from the segmented tumors (in increments of ~0.02 SUV units used for creating discretized gray levels). This allowed moving beyond conventional PET-derived measures and to generate radiomic features quantifying heterogeneity (as elaborated next). In patients with multiple liver metastases (average of 1.8 tumors/person; 21 patients with multiple metastases), the histograms were combined, and subsequently analyzed.
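The background-corrected threshold above can be illustrated with a minimal sketch. The function names and the numerical values are hypothetical (they are not patient data), a flat list stands in for the 3D volume, and treating the threshold as inclusive is an assumption; the actual contouring was done in the Hermes software.

```python
def background_corrected_threshold(suv_max_tumor, suv_mean_background, t):
    """Lower SUV threshold: [SUVmax(tumor) - SUVmean(background)] * t + SUVmean(background)."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must be a fraction between 0 and 1 (e.g. 0.4 or 0.5)")
    return (suv_max_tumor - suv_mean_background) * t + suv_mean_background


def segment(voxel_suvs, threshold):
    """Keep voxels at or above the threshold (inclusive comparison is an assumption)."""
    return [v for v in voxel_suvs if v >= threshold]


# Hypothetical values for illustration only.
threshold = background_corrected_threshold(suv_max_tumor=10.0, suv_mean_background=2.0, t=0.4)
kept = segment([1.8, 2.5, 5.0, 6.1, 7.9, 10.0], threshold)
print(round(threshold, 2), kept)
```

With a background SUVmean of 2.0 and a tumor SUVmax of 10.0, the 40% threshold sits at SUV 5.2, whereas a plain 40%-of-SUVmax rule would give 4.0; the correction matters most for tumors whose uptake is close to that of the surrounding liver.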
Data features:
A total of 51 features were extracted from each patient. This included 41 image-derived radiomic features (as described in the next paragraph), and 10 features as follows: (1) age, (2) sex, and (3) post-imaging treatment information (described earlier, and modeled as input features in our analyses). We also incorporated pre-imaging treatment information, such as whether any therapy was delivered to the liver: (4) prior to PET (liver-therapy-prior), or (5) <3 months prior to PET (liver-therapy-3mon-prior), or whether chemotherapy itself was specifically performed (6) prior to PET (chemotherapy-prior), or (7) <3 months prior to PET (chemotherapy-3mon-prior). We also included (8) number of liver metastases observed in PET scan. Furthermore, we categorized patients based on (9) whether metastases were detected by the time of diagnosis (synchronous) vs. up to 12 months after diagnosis (early metachronous) vs. more than 12 months after diagnosis (late metachronous) [3]. We also explored another categorization for condition of existing metastases: (10) whether metastases were absent by the time of diagnosis (metachronous) vs. present at diagnosis, this latter itself consisting of two subsets: whether the specific tumors visualized by existing PET scan were present vs. absent at time of primary diagnosis.
We extracted 41 quantitative imaging features (radiomic features), which are elaborated in supplement A. To summarize, we included SUVmax, SUVpeak, SUVmean, MTV and TLG (thus n=5). We also computed a range of radiomic features that quantified PET-uptake heterogeneity. This included the recently introduced class of generalized effective total uptake (gETU) measures [23], which place varying degrees of emphasis on volumetric vs. uptake information (n=10) and are further discussed in the discussion section. It also included intensity histogram (n=19) and intensity-volume histogram (IVH) (n=7) measures [24, 25]. All metrics used in this work were standardized according to the framework of the image biomarker standardization initiative (IBSI) [26] for wider applicability of our results to other users and centers. In the results section, we report on the performance of SUVmax, SUVpeak, SUVmean, MTV and TLG, as well as any other metrics that were found to be significant in univariate or multivariate analyses.
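As a minimal sketch of how four of the five conventional measures follow from a segmented tumor, the snippet below uses the standard definitions (MTV as segmented volume, TLG as SUVmean × MTV). The function name, the voxel volume and the SUV values are hypothetical; SUVpeak is omitted because it needs the spatial layout of the voxels.

```python
def basic_pet_metrics(voxel_suvs, voxel_volume_ml):
    """SUVmax, SUVmean, MTV and TLG from the SUVs of the segmented voxels."""
    suv_max = max(voxel_suvs)
    suv_mean = sum(voxel_suvs) / len(voxel_suvs)
    mtv_ml = len(voxel_suvs) * voxel_volume_ml  # metabolic tumor volume
    tlg = suv_mean * mtv_ml                     # total lesion glycolysis
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MTV": mtv_ml, "TLG": tlg}


# Hypothetical tumor of four voxels, 0.5 mL each.
metrics = basic_pet_metrics([2.0, 4.0, 6.0, 8.0], voxel_volume_ml=0.5)
print(metrics)
```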
Survival analysis:
Kaplan-Meier survival analysis was performed for OS, PFS and EFS, including both univariate and multivariate analyses. Prior to performing these analyses, feature selection was performed. Spearman correlations (r) amongst the 51 measures were computed, and those with r > 0.95 were considered relatively redundant with respect to one another (the results were nearly identical with the use of Pearson correlations). Subsequently, we reduced the original list to a narrow list. This was followed by application of (a) univariate and (b) multivariate survival analyses, which included statistical methods specific to each, as elaborated next, and as implemented in-house using MATLAB software.
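The correlation-based feature reduction can be sketched as follows. This is an illustration rather than the in-house MATLAB implementation: the greedy "keep the first, drop later near-duplicates" rule, the function names and the toy values are assumptions, and constant features (zero variance) are not handled.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, cutoff=0.95):
    """Keep a feature only if its Spearman r with every already-kept feature is <= cutoff."""
    kept = []
    for name in features:
        if all(spearman(features[name], features[k]) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Toy values for five hypothetical patients.
toy = {
    "SUVmax": [3.1, 5.0, 7.8, 9.9, 12.4],
    "SUVpeak": [2.5, 4.1, 6.8, 8.0, 10.2],
    "MTV": [40.0, 3.0, 9.3, 55.0, 1.2],
}
print(drop_redundant(toy))
```

In this toy example SUVpeak is dropped because it is perfectly rank-correlated with SUVmax, while MTV is kept.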
a) Univariate analysis: The subjects were divided into two groups using the median (50th percentile) of a given metric (e.g. MTV) as the threshold. The hazard ratio (HR) of the higher-percentile group relative to the lower-percentile group was then computed using Cox proportional hazards regression, together with its 95% confidence interval (CI). For each metric, we also computed the p-value for curve separation (i.e. for rejecting the null hypothesis that HR=1). Correction for multiple testing of different metrics was performed using the false discovery rate (FDR) Benjamini–Hochberg (BH) step-up procedure.
b) Multivariate analysis: Cox proportional hazards regression was again performed. A prognostic score was then generated for each multivariate Cox model by summing the products of each feature in the model and its corresponding regression coefficient (β). The median value of the prognostic score was then chosen as cut-off for the given model, and patients were thus dichotomized into low- and high-risk groups, for which the log-likelihood (LOGL) of Cox regression was measured. Stepwise forward selection of parameters was performed. Specifically, we tried two initializations: a model with a single metric that outperformed others in univariate analysis, or with a single conventional metric that outperformed others (see discussion). Subsequently we would test the inclusion of every metric that was not in the model, adding to the model the one that most significantly increased LOGL as quantified above. This process was repeated as long as addition of a new metric increased LOGL statistically significantly. Statistical significance between two models was assessed using the Akaike Information Criterion (AIC) for model selection, which would require an increase in LOGL by >1 by addition of a new metric. This constraint was imposed on model selection in order to discourage overfitting. In the discussion section, we discuss the use of more stringent criteria.
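Two of the steps described in (a) and (b) lend themselves to a short illustration: the BH step-up correction, and the prognostic score with its median split. The sketch below is not the in-house MATLAB code. The FDR level q=0.05, the strict "above the median" rule for the high-risk group, the feature names, the coefficients and the p-values are all assumptions made for the example.

```python
import statistics


def benjamini_hochberg(p_values, q=0.05):
    """BH step-up: True where the null hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_k = rank  # remember the largest rank that passes
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


def prognostic_score(features, betas):
    """Sum of each feature value times its Cox regression coefficient."""
    return sum(betas[name] * features[name] for name in betas)


def split_at_median(scores):
    """True marks the high-risk group (score strictly above the median)."""
    cut = statistics.median(scores)
    return [s > cut for s in scores]


# Hypothetical p-values: 0.035 fails its own rank threshold (0.03) but is
# still rejected because 0.038 passes at rank 4 (the step-up behaviour).
print(benjamini_hochberg([0.6, 0.004, 0.038, 0.015, 0.035]))

# Hypothetical coefficients and patients.
betas = {"num_mets": 0.9, "mtv": 0.02}
patients = [
    {"num_mets": 1, "mtv": 5.0},
    {"num_mets": 3, "mtv": 20.0},
    {"num_mets": 2, "mtv": 9.0},
    {"num_mets": 1, "mtv": 30.0},
]
scores = [prognostic_score(p, betas) for p in patients]
print(split_at_median(scores))
```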
RESULTS
An example of segmentation for a subject with liver metastasis is depicted in Figure 1. When using SUV metrics, the four segmentation methods performed relatively similarly, but when performing volumetric analysis, 40% and 50% background-corrected SUVmax thresholding resulted in relatively improved performance, especially in PFS (elaborated in the discussion section). The rest of the paper describes results for 40% background-corrected SUVmax thresholding.
Of the original 51 metrics, 26 were retained following correlation analysis (listed in supplement B). We note that SUVmax and SUVmean were found to be highly correlated with SUVpeak (r=0.98, p-value <0.0001 for both). However, they were retained for further analysis and reporting (to allow comparison with prior literature).
Subsequently, univariate and multivariate Cox regression analyses were performed, and the respective results are summarized in Table 1 and Table 2, which indicate HR values, their associated 95% CI and p-values (i.e. of rejecting the null hypothesis that HR=1). In addition, the performances are visually depicted using Kaplan-Meier plots in Figures 2, 3 and 4 for OS, PFS and EFS, respectively.
Table 1.
Parameters* | OS: HR (95% CI) | OS: p-value | PFS: HR (95% CI) | PFS: p-value | EFS: HR (95% CI) | EFS: p-value |
---|---|---|---|---|---|---|
Num. of liver mets | 2.71 (1.44–5.12) | 0.0021** | 2.61 (1.18–5.79) | 0.018 | 2.42 (1.32–4.42) | 0.0042 |
MTV | 2.62 (1.38–4.98) | 0.0034** | 1.96 (0.87–4.41) | 0.11 | 2.29 (1.23–4.24) | 0.0086 |
TLG | 2.62 (1.38–4.98) | 0.0034** | 1.96 (0.87–4.41) | 0.11 | 2.29 (1.23–4.24) | 0.0086 |
SUVpeak | 2.05 (1.09–3.86) | 0.027 | 1.93 (0.86–4.33) | 0.11 | 1.64 (0.90–2.99) | 0.10 |
SUVmean | 1.81 (0.96–3.41) | 0.068 | 0.82 (0.37–1.80) | 0.62 | 1.35 (0.74–2.44) | 0.33 |
SUVmax | 1.48 (0.79–2.77) | 0.22 | 0.83 (0.38–1.82) | 0.64 | 1.16 (0.64–2.09) | 0.63 |
*Median thresholds for MTV, TLG, SUVpeak, SUVmean and SUVmax were 9.3mL, 58.3mL, 6.8, 5.3 and 7.8, respectively, arriving at 26 patients in each of the lower and higher risk groups. Number of liver mets was set to =1 vs. >1, arriving at 31 vs. 21 patients in the lower and higher risk groups.
**p-values significant after correction for multiple testing (to control for FDR).
Table 2.
Outcome | Parameters in the model | HR (95% CI) | p-value |
---|---|---|---|
OS | Num. of liver mets; Liver-therapy-3mon-prior; AUC-IVH | 4.29 (2.15–8.57) | 0.00004 |
PFS | Num. of liver mets; SUVmax | 4.02 (1.67–9.70) | 0.0019 |
EFS | Num. of liver mets; MTV; Histogram uniformity | 3.20 (1.73–5.94) | 0.0002 |
The univariate results for number of liver mets, MTV, TLG, SUVpeak, SUVmean and SUVmax are specifically summarized in Table 1 (presented in order of significance), and plotted in Figures 2–4. We found that the number of liver mets, MTV and TLG outperformed the other metrics for OS, PFS and EFS. Amongst these variables, survival discrimination (p-value) was only significant for MTV and number of liver mets in both cases of OS and EFS after correction for multiple testing, as indicated in Table 1. It is also notable to see that amongst the metrics listed in Table 1, SUVmax (most commonly reported PET metric) performs the most poorly, and that volumetric measures perform better.
The radiomic feature V10–90 had p-values <0.05 in PFS and EFS analyses (0.014 and 0.025, respectively). It is elaborated in supplement A; in short, it is an IVH based metric, quantifying the difference between fraction of volume of the segmented tumor with intensities at least 10% (V10) and 90% (V90) of maximum gray level (i.e. V10–90 = V10-V90). Also, whether any therapy was delivered to the liver <3 months prior to PET (liver-therapy-3mon-prior) had p-values of 0.021 for OS. However, performance of these features was not significant after correction for multiple testing.
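To make the short description of V10–90 concrete, a minimal sketch follows. It implements only the wording given here (fraction of voxels at or above a percentage of the maximum gray level) and assumes equal voxel volumes; the authoritative definition is the one in supplement A, which may differ in detail. The function names and gray levels are hypothetical.

```python
def volume_fraction_at_least(gray_levels, percent):
    """Fraction of voxels whose gray level is at least `percent` % of the maximum gray level."""
    cutoff = max(gray_levels) * percent / 100.0
    return sum(1 for g in gray_levels if g >= cutoff) / len(gray_levels)


def v10_minus_v90(gray_levels):
    """V10-90 = V10 - V90 for one segmented tumor (or one combined histogram)."""
    v10 = volume_fraction_at_least(gray_levels, 10)
    v90 = volume_fraction_at_least(gray_levels, 90)
    return v10 - v90


# Hypothetical discretized gray levels of ten voxels.
levels = [5, 20, 40, 60, 80, 95, 100, 100, 30, 8]
print(volume_fraction_at_least(levels, 10), volume_fraction_at_least(levels, 90))
```

Here eight of the ten voxels reach 10% of the maximum and three reach 90%, so V10–90 is about 0.5; a tumor with uniformly high uptake would give a value near 0.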
Subsequently, we performed multivariate analysis, using stepwise Cox regression (forward selection) as elaborated in the methods section. The results are summarized in Table 2. In the case of OS, HR value of 4.29 was obtained (also depicted in Figure 2). The final multivariate model (arrived at according to the statistical methods described in the methods section) included three metrics, namely (i) number of liver mets, (ii) liver-therapy-3mon-prior, and (iii) AUC-IVH. The definitions of radiomic features are provided in supplement A; in short, AUC-IVH is the area under the IVH curve, also known as AUC-CSH [25], which quantifies tumoral heterogeneity. In the case of PFS, an HR value of 4.02 was obtained (also depicted in Figure 3) where the final model included (i) number of liver mets, and (ii) SUVmax. Finally, in the case of EFS, an HR value of 3.20 was obtained (also depicted in Figure 4), and the final multivariate model consisted of three metrics, namely (i) number of liver mets, (ii) MTV, and (iii) histogram uniformity (also known as histogram energy) which is computed as the sum of squares of occurrence probabilities of discretized histogram intensities (see supplement A). Note that it is possible for parameters not to be significant in univariate analysis but to become significant in multivariate analysis [27].
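Histogram uniformity, as described above, is the sum of squared occurrence probabilities of the discretized intensities. A minimal sketch, with a hypothetical function name and toy gray levels:

```python
from collections import Counter


def histogram_uniformity(gray_levels):
    """Sum of squared occurrence probabilities of the discretized gray levels."""
    n = len(gray_levels)
    counts = Counter(gray_levels)
    return sum((c / n) ** 2 for c in counts.values())


print(histogram_uniformity([3, 3, 3, 3]))  # a single gray level
print(histogram_uniformity([1, 2, 3, 4]))  # four equally likely gray levels
```

The value is 1 when every voxel falls in the same gray level and decreases towards 1/(number of levels) as the intensities spread evenly, so lower values indicate a more heterogeneous uptake histogram.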
When building prognostic models excluding the use of volumetric or heterogeneity features (i.e. only using features 1–12 in supplement B), HR for OS dropped from 4.29 to 3.77 (with features: number of liver mets and liver-therapy-3mon-prior), while decreasing from 3.20 to 2.42 for EFS (with feature: number of liver mets). HR for PFS remained the same. When further excluding any imaging features (i.e. making no use of images), models with only a single metric were obtained, with HR values of 2.34 for OS (chemotherapy-3mon-prior), 2.08 for PFS (sex) and 1.82 for EFS (chemotherapy-3mon-prior).
Overall, it was seen that a simple imaging feature, namely the number of liver mets, performed strongly in univariate prediction of outcome, in contrast to SUV measures especially SUVmax, which did not perform well. Volumetric measures of MTV and TLG also depicted significant performance. Moreover, multivariate prognostic models incorporating radiomic features further improved prediction of outcome. Consequently, it was seen that volumetric and/or heterogeneity features that move beyond conventional SUV measures have the potential for significant prediction of outcome in patients with colorectal liver metastases.
DISCUSSION
Conventional measures vs. volumetric and heterogeneity parameters
In our univariate survival analyses of OS, PFS and EFS, SUV measures (max/mean/peak) did not perform as well as volumetric measures MTV or TLG (Figures 2–4). Furthermore, in multivariate analyses, only in the case of PFS, SUV added value in combination with number of liver mets.
In a study by de Geus-Oei et al. [28] of 152 colorectal metastatic patients (majority with involvement of the liver), only SUVmean was evaluated for OS. The resulting HR, though statistically significant, was only 1.17, while it was 1.81 in our study (Table 1). By contrast, SUVmax was evaluated for both OS and PFS by Dimitrova et al. [29] in a study of 43 patients with colon cancer and unresectable liver metastases. SUVmax was not able to predict PFS, though it predicted OS with an HR value of 2.05 (while it was 1.48 in our study). In a study by De Bruyne et al. [30] of 19 metastatic colorectal cancer patients with potentially resectable liver lesions, only post-treatment SUVmax was reported, and an HR value of 1.20 was obtained for prediction of PFS, which was not statistically significant. The key finding in our analyses is that volumetric and heterogeneity metrics beyond SUV hold value for improved predictions of outcome.
Vriens et al. [31] evaluated 23 patients with colorectal liver metastases. The subjects underwent dynamic PET imaging, followed by measurement of glucose metabolic rates (MRglc) via Patlak graphical analysis. The authors demonstrated significant performances for OS (HR=3.61) and PFS (HR=3.11). It is unclear, and remains to be seen, how volumetric or heterogeneity features would perform in comparison if applied in the domain of dynamic PET imaging. The dynamic scans spanned a total of 50min from time of injection, and thus performance for routine static imaging (typically at 60min post-injection) was not reported in the study. Usage of dynamic scanning is expected to remain limited in the wide clinical setting which commonly employs whole-body imaging. An alternative paradigm worth exploring is to incorporate dynamic imaging within multi-bed/whole-body imaging [32, 33].
Gulec et al. [34] and Shady et al. [35] studied 20 and 49 patients, respectively, undergoing 90Y radioembolization of colorectal liver metastasis. Both studies reported ability of MTV and TLG measures to predict OS. In the former study, this was shown for pre- and post-treatment scans individually, while comparison with conventional SUV measures was not reported. In the latter study, by contrast, response measures (i.e. changes from pre- to post-treatment scans) were used, and it was additionally shown that response measures by SUVmax and SUVpeak were not predictive of OS, nor was CT-based Response Evaluation Criteria In Solid Tumors (RECIST) 1.0. By contrast, in a study by Lastoria et al. [36] of 33 colorectal cancer patients with resectable liver metastases, response assessments by both SUVmax and TLG were found to add value to RECIST and pathologic responses towards prediction of OS and PFS (in fact more so for SUVmax than TLG in the case of OS). There were, however, two key differences between the studies by Shady et al. and Lastoria et al.: (i) the former study involved radioembolization therapy while the latter involved chemo+anti-angiogenic therapy; (ii) the cut-off threshold of response in the former study was set to a 30% decrease, while for the more conservative latter study the cut-off was set to a 50% decrease in values of PET-based metrics.
In a study by Tam et al. [37] of 70 patients with colorectal liver metastases undergoing different therapies, SUVmean, SUVmax, TLG and MTV were all considered. The measures were not found to be predictive of OS (unlike other studies), but were significant for PFS (HR=2.46, 2.76, 2.94 and 3.01 for SUVmean, SUVmax, TLG and MTV, respectively). For each measure, threshold optimization was performed using receiver operating characteristic (ROC) analysis. Nonetheless, such ‘optimum cut-off approach’ has the associated problem [38, 39] that, even though it amplifies performance in the evaluation set, the probability of false discovery (erroneously obtaining a statistically significant result) increases. This is the reason we did not pursue this approach for further optimization, and instead used median thresholds (values summarized in Table 1 footnote). In addition, our analyses included a range of heterogeneity features (as elaborated in the supplement). We also performed multivariate analysis in which all metrics were comprehensively considered and only those adding value to prediction were selected. By contrast, in the above work, Cox analyses were performed separately for the above-mentioned imaging metrics, and thus their complementary value (if any) could not be deduced. Finally, the present work utilizes methods to account for multiple testing and to discourage overfitting.
In a very recent study by van Helden et al. [40], the authors performed radiomics analysis on pre-treatment PET images of 99 patients with metastatic colorectal cancer undergoing palliative systemic treatment. They found higher volumetric measures (MTV and TLG), asphericity and tumor heterogeneity to be predictive of impaired benefit and survival (OS, PFS) following treatment. Though the analysis (primary tumors and metastases) was different from our work (intrahepatic-only tumors), some similar overall trends were observed, in that volumetric and a few other radiomic features depicted greater predictive value than SUV measures.
Overall, in the present effort we have shown that the number of liver mets, MTV and TLG (as observed or quantified in PET images) were powerful predictors of outcome. Furthermore, multivariate prognostic modeling incorporating radiomic features resulted in improved predictions of outcome. We also evaluated whether classification of metastases into synchronous vs. early metachronous vs. late metachronous improved prediction of outcome, as suggested elsewhere [3]. Furthermore, we assessed in the synchronous cases, whether there was value associated with our knowledge of whether the specific tumors in the analyzed PET images were originally present vs. absent at diagnosis. We did not find these categorizations to be predictive of outcome in OS, PFS or EFS analyses.
Generalized effective total uptake (gETU)
The recently introduced gETU metric [23] (as defined in supplement A and used in our analyses) enables generation of measures (via a free parameter a) that place varying emphases on PET uptake intensity vs. volumetric information, depending on the a value. As a → 0, gETU increasingly emphasizes the volumetric information, becoming equivalent to MTV when a<<1. For a=1, gETU is TLG, equally emphasizing volumetric and intensity information. For a>1, intensity is emphasized, such that for a>>1, gETU becomes equivalent to SUVmax, neglecting volumetric information altogether.
In Figure 5, we depict OS, PFS and EFS performance by varying parameter a in the gETU metric. It is seen that survival HR performance is especially improved in PFS and EFS when utilizing background-corrected SUVmax thresholding (40% or 50% thresholding methods perform the same for these metrics), outperforming absolute thresholding methods (SUV>2.5 and SUV>3.0). Furthermore, an overall trend is seen for thresholding methods that as one shifts towards volumetric information (lower a values), better performance is obtained in the analyses (OS, PFS, and EFS). This indicates the importance of utilizing volumetric information for prediction of outcome from baseline PET images of liver metastases, relative to relying purely on intensity information (e.g. SUVmax which is obtained on the far right). Finally, we saw in our multivariate analyses (results section, Table 2) that MTV (which corresponds to gETU with a<<1) was retained in the EFS model, whereas an SUV measure (SUVmax) was retained only in the PFS model. Overall, the implication of this plot is that PET-based metabolic tumor volume information is more important than pure uptake information for prediction of outcome in these patients. This is consistent with increasing evidence (as mentioned in the introduction) that, in a range of cancers, volumetric measures can outperform their SUV counterparts for assessment of disease.
Impact of different statistical criteria and different combinations of metrics
We used the AIC as criterion for multivariate model selection in stepwise Cox regression, accepting a model with an additional parameter if LOGL increased by >1. More conservative criteria may be considered to further discourage overfitting. This includes use of Wilks’ theorem [41], which states that 2(LOGL(model2)-LOGL(model1)) approximately follows a chi-squared distribution with degrees of freedom df=df(model2)-df(model1); setting p-value=0.05 for accepting a new model2 with one additional parameter (degree of freedom), the new LOGL must be higher by >1.92. This turns out to be nearly on par here with the Bayesian Information Criterion (BIC), which requires an increase in LOGL (for n=52 patients) by >log(52)/2=1.98. It is also close to a required increase in LOGL by >2 as suggested for the effective parsimony information criterion (EPIC), corresponding to a likelihood ratio test at level 0.05 in the case of testing for the use of one additional parameter between two models [42]. When using the above-mentioned 1.92 threshold (instead of 1 based on AIC), only the first two metrics in Table 2 were retained in the case of OS (number of liver mets and liver-therapy-3mon-prior), with final multivariate HR=3.77. PFS prediction remained the same (HR=4.02). In the case of EFS, only the number of liver mets was retained, with HR=2.42, the same as in the univariate model.
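The three thresholds quoted above can be reproduced with a few lines, shown here as a check of the arithmetic rather than as part of the original analysis. The function names are hypothetical, and the chi-squared critical value for one degree of freedom is obtained from the fact that such a variable is the square of a standard normal variable.

```python
from math import log
from statistics import NormalDist


def aic_min_gain(extra_params=1):
    """AIC = 2k - 2*LOGL, so AIC improves when LOGL rises by more than k."""
    return float(extra_params)


def bic_min_gain(n, extra_params=1):
    """BIC = k*ln(n) - 2*LOGL, so BIC improves when LOGL rises by more than k*ln(n)/2."""
    return extra_params * log(n) / 2.0


def wilks_min_gain_one_param(alpha=0.05):
    """Half the chi-squared (df=1) critical value at level alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # chi2 with df=1 is a squared standard normal
    return z * z / 2.0


print(aic_min_gain(), round(wilks_min_gain_one_param(), 2), round(bic_min_gain(52), 2))
```

For one added parameter and n=52 this gives 1, 1.92 and 1.98 for AIC, the likelihood ratio test and BIC, respectively, matching the values in the text.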
In our multivariate approach, at each step we accepted into the model the metric that increased LOGL the most (and passed the statistical criterion). Nonetheless, it is possible to try different combinations of metrics, and to select the combination that ultimately produces the largest LOGL. This, however, entails a very large search space and is beyond the scope of our work. We did try one variation: we initialized the multivariate models twice, once with the best performing univariate model, and once with the conventional metric that performed the best (i.e. excluding volumetric or heterogeneity features at the first iteration, but then allowing them in subsequent model selection iterations). This was followed by addition, at every iteration, of the metric that most increased LOGL. Interestingly, the latter initialization resulted in improved performance in one instance (for PFS), which is the result we report in Table 2.
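The greedy procedure can be sketched generically as below. This is an illustration under stated assumptions, not the in-house implementation: `fit_logl` is a hypothetical callable standing in for the Cox model fit and its log-likelihood, and the toy log-likelihood table is invented for the example.

```python
def forward_select(candidates, fit_logl, start=(), min_gain=1.0):
    """Greedy forward selection.

    candidates: feature names that may enter the model.
    fit_logl:   hypothetical callable mapping a tuple of feature names to the
                log-likelihood of the fitted model.
    start:      features the model is initialized with.
    min_gain:   required increase in log-likelihood per added feature
                (1 for AIC; about 1.92 for a likelihood ratio test at 0.05).
    """
    selected = tuple(start)
    best = fit_logl(selected)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        # Try each remaining feature and keep the one with the largest gain.
        gain, winner = max((fit_logl(selected + (c,)) - best, c) for c in remaining)
        if gain <= min_gain:
            break
        selected += (winner,)
        best += gain
        remaining.remove(winner)
    return selected, best


# Invented log-likelihoods for three placeholder features.
toy_table = {
    (): -100.0,
    ("a",): -97.0, ("b",): -98.5, ("c",): -99.8,
    ("a", "b"): -95.5, ("a", "c"): -96.7,
    ("a", "b", "c"): -95.0,
}


def toy_logl(features):
    return toy_table[tuple(sorted(features))]


print(forward_select(["a", "b", "c"], toy_logl))                 # AIC-style criterion
print(forward_select(["a", "b", "c"], toy_logl, min_gain=1.92))  # stricter criterion
```

In the toy example the AIC-style rule admits two features while the stricter rule stops after one, mirroring how the stricter criteria discussed above lead to smaller models. Passing a non-empty `start` reproduces the alternative initialization with a chosen conventional metric.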
Considerations and limitations
Our analyzed PET studies were performed in years 2005–2010, all involving 2D-OSEM image reconstruction and 8-mm post-reconstruction Gaussian filtering. Even though more advanced reconstructions (3D-OSEM and PSF modeling [43]) became available in later years, for consistency we only included 2D-OSEM reconstructions which were the only options available for earlier studies. For comparison purposes, we note that images with improved spatial resolution could lead to distinct (probably smaller) VOI volumes than obtained in our work, and for such images, one might need to lower the thresholds to obtain similar VOIs as we do.
We utilized a derivation set for univariate and multivariate analysis. To conclusively establish the proposed models, a distinct validation set is also required. In fact, there is an important frontier, awaiting to be more thoroughly explored in radiomics research, of validating previously derived measures and models. At the same time, to address the issue with false discovery in the context of multiple testing [38], the Benjamini–Hochberg (BH) step-up procedure was utilized for statistical analysis. Furthermore, our multivariate analyses invoked statistical criteria for the acceptance of new metrics in order to discourage overfitting. Not using any such criteria resulted in a larger number of metrics accepted into the multivariate models, with the appearance of improved performance. Nonetheless, we reported the more moderate results that included statistical acceptance criteria.
Our studied group of patients is somewhat heterogeneous in terms of treatment, but we incorporated pre- and post-treatment information as individual features within our predictive modeling to account for this heterogeneity. One may argue that the use of a real-life clinical data set, and the significant findings obtained for it, renders the findings more applicable to a general population of patients with liver metastases than results from a very select group. A more select group, at the same time, may result in more significant findings.
Finally, we note that in the present work, the radiomic features utilized were those that could be computed from histograms of segmented PET regions, consistent with the existing, readily available capabilities of imaging vendor platforms to produce histograms. An exception was SUVpeak, which is nonetheless available in routine practice. Future efforts include more sophisticated recording and analysis of segmented tumors, utilizing the various spatial uptake patterns available in the original images for the computation and analysis of larger sets of radiomic features [26, 44, 45]. It remains to be seen how effective the above-mentioned histogram-based features are in comparison to a broader set of radiomic features.
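To illustrate what is meant by histogram-based features, the sketch below computes a few generic first-order quantities from the SUV values of a segmented region. The choice of features (entropy, skewness), the fixed number of bins `n_bins`, and the function name are assumptions made for this example; they are not necessarily the definitions used in this work.

```python
import math

def histogram_features(suv_values, n_bins=16):
    """Illustrative first-order features from the SUVs of one segmented region."""
    n = len(suv_values)
    lo, hi = min(suv_values), max(suv_values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in suv_values:
        # The maximum value is clamped into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    probs = [c / n for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)  # histogram entropy, in bits
    mean = sum(suv_values) / n
    var = sum((v - mean) ** 2 for v in suv_values) / n
    third = sum((v - mean) ** 3 for v in suv_values) / n
    skewness = third / var ** 1.5 if var > 0 else 0.0
    return {"mean": mean, "max": hi, "entropy": entropy, "skewness": skewness}

# Hypothetical voxel values, for illustration only.
features = histogram_features([1.0, 2.0, 3.0, 4.0], n_bins=4)
```

Because these quantities depend only on the distribution of voxel values and not on where the voxels sit, they cannot capture the spatial uptake patterns mentioned above.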
CONCLUSION
The present work shows that conventional, commonly employed SUV metrics (SUVmax, SUVpeak, SUVmean) perform relatively poorly in outcome prediction tasks (OS, PFS, EFS) when assessing colorectal liver metastases from FDG PET images. By contrast, the number of liver metastases was a significant predictor of outcome, as were the volumetric MTV and TLG measures. Furthermore, multivariate prognostic modeling that included radiomic features further improved outcome prediction. Our overall finding is that volumetric features outperform SUV-based metrics in the task of clinical outcome prediction, and that prediction can be further enhanced via multivariate models that include volumetric and/or heterogeneity measures. This improved prediction of clinical outcome has the potential to be used for non-invasive selection of patients for an individual treatment modality or for participation in clinical trials of different treatment regimens.
Supplementary Material
ACKNOWLEDGEMENTS
This work was supported by the Danish Council for Independent Research (Medical Sciences, 4004-00022), the MSK Cancer Center Support Grant/Core Grant (P30 CA008748), and the National Natural Science Foundation of China under grant 61628105. We also acknowledge very helpful discussions with Dr. Ciprian Crainiceanu.
Footnotes
IRB Statement:
Formal approval for access and analysis of patient data was obtained from the Danish Patient Safety Authority and the study was also approved by the Danish Data Protection Agency.
REFERENCES
- [1]. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, Parkin DM, Forman D, Bray F, Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012, International journal of cancer 136(5) (2015).
- [2]. Manfredi S, Lepage C, Hatem C, Coatmeur O, Faivre J, Bouvier A-M, Epidemiology and management of liver metastases from colorectal cancer, Annals of surgery 244(2) (2006) 254.
- [3]. Adam R, de Gramont A, Figueras J, Kokudo N, Kunstlinger F, Loyer E, Poston G, Rougier P, Rubbia-Brandt L, Sobrero A, Managing synchronous liver metastases from colorectal cancer: a multidisciplinary international consensus, Cancer treatment reviews 41(9) (2015) 729–741.
- [4]. Wahl RL, Principles and Practice of PET and PET/CT, 2nd ed., Lippincott Williams & Wilkins, Philadelphia, PA, 2008.
- [5]. Wahl RL, Jacene H, Kasamon Y, Lodge MA, From RECIST to PERCIST: Evolving Considerations for PET Response Criteria in Solid Tumors, J Nucl Med 50(Suppl_1) (2009) 122S–150S.
- [6]. Lodge MA, Chaudhry MA, Wahl RL, Noise considerations for PET quantification using maximum and peak standardized uptake value, J Nucl Med 53(7) (2012) 1041–7.
- [7]. Dibble EH, Alvarez ACL, Truong M-T, Mercier G, Cook EF, Subramaniam RM, 18F-FDG metabolic tumor volume and total glycolytic activity of oral cavity and oropharyngeal squamous cell cancer: adding value to clinical staging, J Nucl Med 53(5) (2012) 709–715.
- [8]. Davison J, Mercier G, Russo G, Subramaniam RM, PET-Based Primary Tumor Volumetric Parameters and Survival of Patients With Non-Small Cell Lung Carcinoma, Am J Roentgenol 200(3) (2013) 635–640.
- [9]. Pak K, Cheon GJ, Nam HY, Kim SJ, Kang KW, Chung JK, Kim EE, Lee DS, Prognostic value of metabolic tumor volume and total lesion glycolysis in head and neck cancer: a systematic review and meta-analysis, J Nucl Med 55(6) 884–90.
- [10]. Ryu IS, Kim JS, Roh JL, Cho KJ, Choi SH, Nam SY, Kim SY, Prognostic significance of preoperative metabolic tumour volume and total lesion glycolysis measured by (18)F-FDG PET/CT in squamous cell carcinoma of the oral cavity, Eur J Nucl Med Mol Imaging 41(3) 452–61.
- [11]. Park GC, Kim JS, Roh JL, Choi SH, Nam SY, Kim SY, Prognostic value of metabolic tumor volume measured by 18F-FDG PET/CT in advanced-stage squamous cell carcinoma of the larynx and hypopharynx, Ann Oncol 24(1) 208–14.
- [12]. Lee SJ, Choi JY, Lee HJ, Baek CH, Son YI, Hyun SH, Moon SH, Kim BT, Prognostic value of volume-based (18)F-fluorodeoxyglucose PET/CT parameters in patients with clinically node-negative oral tongue squamous cell carcinoma, Korean J Radiol 13(6) 752–9.
- [13]. Taghipour M, Wray R, Sheikhbahaei S, Wright JL, Subramaniam RM, FDG avidity and tumor burden: survival outcomes for patients with recurrent breast cancer, Am J Roentgenol 206(4) (2016) 846–855.
- [14]. Marcus C, Wray R, Taghipour M, Marashdeh W, Ahn SJ, Mena E, Subramaniam RM, JOURNAL CLUB: Value of Quantitative FDG PET/CT Volumetric Biomarkers in Recurrent Colorectal Cancer Patient Survival, Am J Roentgenol 207(2) (2016) 257–265.
- [15]. Rogasch JMM, Hundsdoerfer P, Hofheinz F, Wedel F, Schatka I, Amthauer H, Furth C, Pretherapeutic FDG-PET total metabolic tumor volume predicts response to induction therapy in pediatric Hodgkin’s lymphoma, BMC Cancer 18(1) (2018) 521.
- [16]. Albano D, Bertoli M, Battistotti M, Rodella C, Statuto M, Giubbini R, Bertagna F, Prognostic role of pretreatment 18F-FDG PET/CT in primary brain lymphoma, Annals of nuclear medicine 32(8) (2018) 532–541.
- [17]. Kumar V, Gu YH, Basu S, Berglund A, Eschrich SA, Schabath MB, Forster K, Aerts HJWL, Dekker A, Fenstermacher D, Goldgof DB, Hall LO, Lambin P, Balagurunathan Y, Gatenby RA, Gillies RJ, Radiomics: the process and the challenges, Magn Reson Imaging 30(9) (2012) 1234–1248.
- [18]. Asselin MC, O’Connor JPB, Boellaard R, Thacker NA, Jackson A, Quantifying heterogeneity in human tumours using MRI and PET, Eur J Cancer 48(4) (2012) 447–455.
- [19]. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RGPM, Granton P, Zegers CML, Gillies R, Boellaard R, Dekker A, Aerts HJWL, Consortium Q-C, Radiomics: Extracting more information from medical images using advanced feature analysis, Eur J Cancer 48(4) (2012) 441–446.
- [20]. Chicklore S, Goh V, Siddique M, Roy A, Marsden PK, Cook GJR, Quantifying tumour heterogeneity in F-18-FDG PET/CT imaging by texture analysis, Eur J Nucl Med Mol I 40(1) (2013) 133–140.
- [21]. Hatt M, Tixier F, Pierce L, Kinahan PE, Le Rest CC, Visvikis D, Characterization of PET/CT images using texture analysis: the past, the present… any future?, Eur J Nucl Med Mol I 44(1) (2017) 151–165.
- [22]. Boellaard R, O’Doherty MJ, Weber WA, Mottaghy FM, Lonsdale MN, Stroobants SG, Oyen WJ, Kotzerke J, Hoekstra OS, Pruim J, FDG PET and PET/CT: EANM procedure guidelines for tumour PET imaging: version 1.0, Eur J Nucl Med Mol I 37(1) (2010) 181.
- [23]. Rahmim A, Schmidtlein CR, Jackson A, Sheikhbahaei S, Marcus C, Ashrafinia S, Soltani M, Subramaniam RM, A novel metric for quantification of homogeneous and heterogeneous tumors in PET for enhanced clinical outcome prediction, Phys. Med. Biol 61 (2016) 227–242.
- [24]. El Naqa I, Grigsby PW, Apte A, Kidd E, Donnelly E, Khullar D, Chaudhari S, Yang D, Schmitt M, Laforest R, Thorstad WL, Deasy JO, Exploring feature-based approaches in PET images for predicting cancer treatment outcomes, Pattern Recognition 42(6) (2009) 1162–1171.
- [25]. van Velden FHP, Cheebsumon P, Yaqub M, Smit EF, Hoekstra OS, Lammertsma AA, Boellaard R, Evaluation of a cumulative SUV-volume histogram method for parameterizing heterogeneous intratumoural FDG uptake in non-small cell lung cancer PET studies, Eur J Nucl Med Mol I 38(9) (2011) 1636–1647.
- [26]. Zwanenburg A, Leger S, Vallières M, Löck S, Image biomarker standardisation initiative - feature definitions, CoRR abs/1612.07003 (2016).
- [27]. Lo S, Li I, Tsou T, See L, Non-significant in univariate but significant in multivariate analysis: a discussion with examples, Changgeng Yi Xue Za Zhi 18(2) (1995) 95–101.
- [28]. de Geus-Oei LF, Wiering B, Krabbe PF, Ruers TJ, Punt CJ, Oyen WJ, FDG-PET for prediction of survival of patients with metastatic colorectal carcinoma, Annals of oncology 17(11) (2006) 1650–1655.
- [29]. Dimitrova EG, Chaushev BG, Conev NV, Kashlov JK, Zlatarov AK, Petrov DP, Popov HB, Stefanova NT, Klisarova AD, Bratoeva KZ, Role of the pretreatment 18F-fluorodeoxyglucose positron emission tomography maximal standardized uptake value in predicting outcomes of colon liver metastases and that value’s association with Beclin-1 expression, Bioscience trends 11(2) (2017) 221–228.
- [30]. De Bruyne S, Van Damme N, Smeets P, Ferdinande L, Ceelen W, Mertens J, Van de Wiele C, Troisi R, Libbrecht L, Laurent S, Value of DCE-MRI and FDG-PET/CT in the prediction of response to preoperative chemotherapy with bevacizumab for colorectal liver metastases, British journal of cancer 106(12) (2012) 1926.
- [31]. Vriens D, Van Laarhoven HW, van Asten JJ, Krabbe PF, Visser EP, Heerschap A, Punt CJ, de Geus-Oei L-F, Oyen WJ, Chemotherapy Response Monitoring of Colorectal Liver Metastases by Dynamic Gd-DTPA–Enhanced MRI Perfusion Parameters and 18F-FDG PET Metabolic Rate, J Nucl Med 50(11) (2009) 1777–1784.
- [32]. Kotasidis FA, Tsoumpas C, Rahmim A, Advanced kinetic modelling strategies: towards adoption in clinical PET imaging, Clinical and Translational Imaging 2(3) (2014) 219–237.
- [33]. Karakatsanis NA, Lodge MA, Tahari AK, Zhou Y, Wahl RL, Rahmim A, Dynamic whole body PET parametric imaging: I. Concept, acquisition protocol optimization and clinical application, Phys. Med. Biol 58(20) (2013) 7391–7418.
- [34]. Gulec SA, Suthar RR, Barot TC, Pennington K, The prognostic value of functional tumor volume and total lesion glycolysis in patients with colorectal cancer liver metastases undergoing 90Y selective internal radiation therapy plus chemotherapy, Eur J Nucl Med Mol I 38(7) (2011) 1289–1295.
- [35]. Shady W, Kishore S, Gavane S, Do RK, Osborne JR, Ulaner GA, Gonen M, Ziv E, Boas FE, Sofocleous CT, Metabolic tumor volume and total lesion glycolysis on FDG-PET/CT can predict overall survival after 90Y radioembolization of colorectal liver metastases: A comparison with SUVmax, SUVpeak, and RECIST 1.0, European journal of radiology 85(6) (2016) 1224–1231.
- [36]. Lastoria S, Piccirillo MC, Caracò C, Nasti G, Aloj L, Arrichiello C, Di Castelguidone EDL, Tatangelo F, Ottaiano A, Iaffaioli RV, Early PET/CT scan is more effective than RECIST in predicting outcome of patients with liver metastases from colorectal cancer treated with preoperative chemotherapy plus bevacizumab, J Nucl Med 54(12) (2013) 2062–2069.
- [37]. Tam HH, Cook GJ, Chau I, Drake B, Zerizer I, Du Y, Cunningham D, Koh D-M, Chua SS, The role of routine clinical pretreatment 18F-FDG PET/CT in predicting outcome of colorectal liver metastasis, Clinical nuclear medicine 40(5) (2015) e259–e264.
- [38]. Chalkidou A, O’Doherty MJ, Marsden PK, False Discovery Rates in PET and CT Studies with Texture Features: A Systematic Review, Plos One 10(5) (2015).
- [39]. Hilsenbeck SG, Clark GM, McGuire WL, Why do so many prognostic factors fail to pan out?, Breast cancer research and treatment 22(3) (1992) 197–206.
- [40]. van Helden EJ, Vacher YJL, van Wieringen WN, van Velden FHP, Verheul HMW, Hoekstra OS, Boellaard R, Menke-van der Houven van Oordt CW, Radiomics analysis of pre-treatment [(18)F]FDG PET/CT for patients with metastatic colorectal cancer undergoing palliative systemic treatment, Eur J Nucl Med Mol Imaging 45(13) (2018) 2307–2317.
- [41]. Wilks SS, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann Math Stat 9 (1938) 60–62.
- [42]. Shinohara RT, Crainiceanu CM, Caffo BS, Reich DS, Longitudinal analysis of spatiotemporal processes: a case study of dynamic contrast-enhanced magnetic resonance imaging in multiple sclerosis, Johns Hopkins University, Dept. of Biostatistics Working Papers Working Paper 231 (2011).
- [43]. Rahmim A, Qi J, Sossi V, Resolution modeling in PET imaging: theory, practice, benefits, and pitfalls, Med Phys 40(6) (2013) 064301.
- [44]. Leijenaar RTH, Carvalho S, Velazquez ER, Van Elmpt WJC, Parmar C, Hoekstra OS, Hoekstra CJ, Boellaard R, Dekker ALAJ, Gillies RJ, Aerts HJWL, Lambin P, Stability of FDG-PET Radiomics features: An integrated analysis of test-retest and inter-observer variability, Acta Oncol 52(7) (2013) 1391–1397.
- [45]. Lv W, Yuan Q, Wang Q, Ma J, Jiang J, Yang W, Feng Q, Chen W, Rahmim A, Lu L, Robustness versus disease differentiation when varying parameter settings in radiomics features: application to nasopharyngeal PET/CT, European radiology 8 (2018) 3245–3254.