Author manuscript; available in PMC: 2020 Dec 29.
Published in final edited form as: Phys Med. 2018 Feb 21;46:180–188. doi: 10.1016/j.ejmp.2017.10.009

Investigating multi-radiomic models for enhancing prediction power of cervical cancer treatment outcomes

Baderaldeen A Altazi a,b,c,*, Daniel C Fernandez a,b, Geoffrey G Zhang a,b, Samuel Hawkins a,b, Syeda M Naqvi a,b, Youngchul Kim a, Dylan Hunt a,b, Kujtim Latifi a,b, Matthew Biagioli d, Puja Venkat a, Eduardo G Moros a,b
PMCID: PMC7771366  NIHMSID: NIHMS1653327  PMID: 29475772

Abstract

Quantitative image features, also known as radiomic features, have shown potential for predicting treatment outcomes in several body sites. We quantitatively analyzed 18Fluorine–fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) uptake heterogeneity in the Metabolic Tumor Volume (MTV) of eighty cervical cancer patients to investigate the predictive performance of radiomic features for two treatment outcomes: the development of distant metastases (DM) and loco-regional recurrent disease (LRR). We aimed to fit the most predictive features in multiple logistic regression models (MLRs). To generate such models, we applied a backward feature selection method as part of Leave-One-Out Cross Validation (LOOCV) within a training set consisting of 70% of the original patient cohort. The trained MLRs were tested on an independent set consisting of the remaining 30% of the cohort. We evaluated the performance of the final models using the Area under the Receiver Operator Characteristic Curve (AUC). Six models demonstrated superior predictive performance (four for DM and two for LRR) when compared to both univariate radiomic feature models and Standard Uptake Value (SUV) measurements. This approach suggests that the ability of pre-radiochemotherapy PET radiomics to stratify patient risk for DM and LRR could potentially guide management decisions such as adjuvant systemic therapy or radiation dose escalation.

Keywords: Positron emission tomography, Radiomics, Tumor uptake, Cervical cancer

1. Introduction

Based on the 2017 estimates of the American Cancer Society (ACS), cervical cancer is the third most commonly diagnosed gynecological malignancy in the United States, with an estimated 12,820 new cases and 4,210 deaths [1]. The widespread implementation of early detection screening with the Pap smear test and subsequent treatment of precancerous lesions played a vital role in halving the cervical cancer incidence rate between 1975 (14.8 per 100,000) and 2012 (6.7 per 100,000). The mortality rate also declined by half, from 5.6 per 100,000 in 1975 to 2.3 per 100,000 in 2012. However, in less developed countries where screening is less prevalent or nonexistent, the burden of cervical cancer is much greater. As a result, cervical cancer is the fourth most common cancer in women worldwide, with an estimated global incidence of 528,000 new cases and 266,000 deaths in 2012, the vast majority of them in less developed countries [2].

The ability to predict treatment outcome for patients, especially those at high risk of responding poorly to standard therapies, is of great interest. Such ability could help clinicians modify their treatment plan, or modality, to improve the patient’s response to treatment. Recent research in the expanding field of functional imaging has put a high emphasis on the investigation and development of quantitative noninvasive biomarkers, given the increasing need for robust treatment outcome predictors. 18Fluorine–fluorodeoxyglucose (18F-FDG) PET imaging has been widely used in oncology as a functional imaging technique to define the gross tumor volume, assess its response, and stage the cancer. One of the most common semi-quantitative metrics for FDG-PET is the standardized uptake value (SUV); the maximum SUV (SUVMAX) has been reported to predict overall survival [3], treatment response [4], and lymph node involvement [5].

On the other hand, several studies [6,7] questioned utilizing SUVMAX as an independent prognostic metric due to several measurement uncertainties that might be attributable to its sensitivity to variation in tumor volume, initial FDG uptake kinetics and distribution, inter-observer variability, and in-vivo metabolism. Therefore, one of the goals of this study was to examine the accuracy of the prediction of treatment outcomes based on observed differences in the SUVMAX within the primary tumor volume. In addition to SUVMAX, several studies proposed SUVPEAK (defined as the maximum of all the mean values computed from placing a spherical kernel of approximately 1.2 cm in diameter to yield a ~1 cm3 sphere centered at each voxel within the tumor volume) as a potential robust alternative to SUVMAX due to its minimum variability over time and relative insensitivity to image noise [8,9]. The predictive performance of SUVPEAK was also investigated in this study.

The extraction of underlying information from medical images based on quantitatively derived features (Radiomics) is presented as a developing process in the field of medical oncological imaging [10,11]. The extracted radiomic features based on image textural and shape patterns have been used in tumor staging, the prediction for treatment outcome as well as in the process of classification and segmentation of tumor versus normal tissue. El Naqa et al. [12] reported several logistic regression models of radiomic features with good predictive power for treatment outcomes in cervical cancer patients treated with radiochemotherapy. However, the study suggested that further testing and validation using larger patient datasets were required. Tixier et al. [13] demonstrated that analysis of intratumor FDG uptake heterogeneity of baseline PET scans using radiomics differentiated, with higher sensitivity than SUV measurements, between esophageal cancer patients who showed partial- and no-response to chemoradiotherapy. Wei et al. [14] found a strong association between radiomic features extracted from baseline FDG PET and tumor staging in cervical cancer. This study focused on primary tumor volumes because of the limited resolution of PET images, which did not reproduce significant heterogeneity in the smaller lymph nodes. The preceding studies concluded that radiomic features have superior performance in comparison to SUV measurements regarding clinical outcome assessment and tumor heterogeneity description.

Accordingly, our motivation was to facilitate a comparison between our experience and previous studies in this field to analyze cervical cancer tumor heterogeneity on baseline FDG-PET scans retrospectively, and to investigate the ability of multi-radiomic regression models to predict for two major treatment outcomes, distant metastases (DM) and loco-regional recurrence (LRR).

2. Methods

2.1. Patient demographics

This retrospective study consisted of a cohort of eighty patients (Table 1) with an age range at the time of diagnosis of 25–86 years (median: 50 years). All patients were diagnosed with cervical cancer and treated with definitive chemoradiotherapy between 2009 and 2015. Radiotherapy consisted of external beam radiation therapy (EBRT) to a dose between 43.2 and 50.4 Gy (median = 45 Gy) and MRI-planned brachytherapy to a dose of 20–30 Gy (median = 28 Gy). All patients received concurrent cisplatin chemotherapy. The patients’ disease was staged according to the classification of the International Federation of Gynecology and Obstetrics (FIGO). The numbers of patients with FIGO stages IB, IIB, IIA, IIIA, IIIB, and IVA were 18, 33, 10, 1, 17, and 1, respectively. Most of the patients (89%) had tumor histology consistent with squamous cell carcinoma. The mean follow-up time at the start of the study was nineteen months. We aimed to examine the correlation between radiomic features and two treatment outcomes: the development of distant metastasis (DM) and loco-regional recurrence (LRR). In this work, both treatment outcomes are binary variables assigned the conventional values 0 and 1. The event DM/LRR = 1 represents the occurrence of the complication after treatment, and DM/LRR = 0 represents its absence at the specific time point from the end of radiochemotherapy treatment. We set the start point of the follow-up time for each patient at the date of the initial pathological biopsy report, while the time to the clinical treatment outcome was reported based on the date of event occurrence. The institutional review board (IRB) at the University of South Florida, Tampa, FL, approved this study protocol.

Table 1.

Patient characteristics.

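| Characteristic | All patients (n = 80) |
|---|---|
| Age at diagnosis | Average: 50; Range: [25, 86] yrs. |
| Stage: IB | 18 |
| Stage: IIB | 33 |
| Stage: IIA | 10 |
| Stage: IIIA | 1 |
| Stage: IIIB | 17 |
| Stage: IVA | 1 |
| Histology: Squamous | 72 |
| Histology: Adenocarcinoma | 8 |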

2.2. PET imaging procedure and technique

All of the baseline PET/CT scans were performed using the same Discovery STE® hybrid PET/CT scanner (General Electric Medical Systems, Milwaukee, WI, USA) and the same institutional radiopharmaceutical administration protocol. All patients fasted for 6 hours before being injected with an average activity of 363 MBq of 18F-FDG. After the injection, a whole-body PET/CT scan in the supine position was acquired for cancer staging. Each patient’s weight (average weight = 87 kg) and blood glucose level were recorded.

The PET static emission images were acquired an average of 60 minutes after injection, with an image slice thickness of 3.27 mm, row spacing of 5.47 mm, and column spacing of 5.47 mm. The PET images were reconstructed using 3D maximum likelihood ordered subset expectation maximization (ML-OSEM) with two iterations and 28 subsets. All images were corrected for attenuation. We then converted the image intensity values to SUV units.
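The conversion to SUV units is not spelled out in the text. As a minimal sketch, the widely used body-weight normalization can be written as below, assuming a tissue density of 1 g/mL and the physical half-life of 18F. The function name, its parameters, and the 20 kBq/mL voxel value are illustrative assumptions, not taken from the study; the decay term should only be used when the image values are referenced to acquisition time rather than injection time.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes

def suv_bw(activity_bq_per_ml, weight_kg, injected_mbq, minutes_since_injection=0.0):
    """Body-weight SUV, assuming 1 g of tissue occupies 1 mL."""
    # Decay-correct the injected activity to the chosen reference time.
    remaining_bq = injected_mbq * 1e6 * 0.5 ** (minutes_since_injection / F18_HALF_LIFE_MIN)
    return activity_bq_per_ml * (weight_kg * 1000.0) / remaining_bq

# Hypothetical voxel of 20 kBq/mL combined with the cohort averages (87 kg, 363 MBq).
example_suv = suv_bw(20_000.0, 87.0, 363.0)
```

With these inputs the example voxel maps to an SUV of roughly 4.8.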

2.3. Method of metabolic tumor volume segmentation

Tumor volumes were segmented for this study (Fig. 1.a) using Mirada Medical DBx® software (Mirada Medical DBx®, Oxford, UK). A board-certified radiation oncologist manually delineated Metabolic Tumor Volumes (MTV), which contained both the cervical tumor and local direct extension in the uterus, parametrium, vagina, or other adjacent organs based on FDG uptake findings on PET and guided by CT, MRI, clinical examination findings, and patient-specific histopathological reports.

Fig. 1.

(a) Example of a delineated metabolic tumor volume (MTV) on 18F-FDG PET scans (axial view), and (b) an illustration of the heterogeneity pattern analysis of the MTV using: (I) short and long runs of voxel intensities (GLRLM), (II) consecutive voxel directions (GLCM), and (III) size of voxel zones with the same intensity (GLSZM).

2.4. Radiomics analysis

We developed in-house software to process and quantify PET scans and to implement commonly used feature extraction methods. In total, we extracted seventy-five radiomic features from each MTV (Fig. 1.a). We then divided the radiomic features into six sets based on their calculation method.

2.4.1. Intensity-Histogram Features (IHF) and Intensity-Volume Histogram (IVH) features

This set contains first-order features that describe global measurements of the tumor based on the distribution of voxel intensities within the tumor volume. They do not provide information about voxel-to-voxel relations or dependencies within the volume of interest [15]. Intensity histogram features (IHFs) describe the range of voxel intensity values within the segmented tumor volume. We extracted ten IHFs, including the conventional statistical measurements under this category (e.g., mean, maximum, minimum, standard deviation, skewness, kurtosis, intensity entropy, and uniformity of voxel intensity). We also extracted four features based on the cumulative intensity-volume histogram (IVH) as described by El Naqa et al. [12]. The four features are volume at intensity fraction (Vx: percentage volume having at least x% intensity value), intensity at volume fraction (Ix: minimum intensity of the x% highest intensity volume), volume at intensity fraction difference (e.g., V10–V90), and intensity at volume fraction difference (e.g., I10–I90).
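The two IVH definitions above can be illustrated with a short sketch. It assumes that "x% intensity" means x% of the maximum voxel intensity in the MTV and that all voxels have equal volume; both are assumptions for illustration, and the function names are hypothetical rather than taken from the in-house software.

```python
import math

def volume_at_intensity_fraction(voxels, x):
    """V_x: percentage of voxels whose intensity is at least x% of the maximum."""
    threshold = x * max(voxels) / 100.0
    return 100.0 * sum(v >= threshold for v in voxels) / len(voxels)

def intensity_at_volume_fraction(voxels, x):
    """I_x: minimum intensity among the x% highest-intensity voxels."""
    ranked = sorted(voxels, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * x / 100.0))
    return ranked[n_top - 1]

# Toy "tumor" with ten voxels of intensity 1..10.
voxels = list(range(1, 11))
v10_minus_v90 = volume_at_intensity_fraction(voxels, 10) - volume_at_intensity_fraction(voxels, 90)
i10_minus_i90 = intensity_at_volume_fraction(voxels, 10) - intensity_at_volume_fraction(voxels, 90)
```

The difference features follow directly by subtracting two evaluations of the same function.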

2.4.2. Gray level co-occurrence matrix (GLCM) based features

GLCM was used to extract second-order local features based on the spatial relationships among the voxels’ gray levels [16]. The matrix was formed using the twenty-six connected directions between neighboring voxels in 3D space. We extracted twenty-six GLCM radiomic features, such as contrast, local entropy, difference entropy, local homogeneity, and correlation.
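A minimal sketch of a 26-direction co-occurrence matrix is given below. It assumes the volume is a nested list of gray levels already discretized to integers 0..levels-1, counts only unit-distance neighbors, and ignores the tumor mask, so it is a simplification and not the in-house implementation. The function names are hypothetical.

```python
from itertools import product

def glcm_3d(volume, levels):
    """Normalized GLCM accumulated over all 26 neighbor offsets (symmetric by construction)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    counts = [[0] * levels for _ in range(levels)]
    for z, y, x in product(range(nz), range(ny), range(nx)):
        for dz, dy, dx in offsets:
            zz, yy, xx = z + dz, y + dy, x + dx
            if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                counts[volume[z][y][x]][volume[zz][yy][xx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def glcm_contrast(p):
    """Contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

# Two neighboring voxels with different gray levels versus a uniform 2x2x2 block.
p_edge = glcm_3d([[[0, 1]]], levels=2)
p_flat = glcm_3d([[[1, 1], [1, 1]], [[1, 1], [1, 1]]], levels=2)
```

A uniform block yields zero contrast, while the two-voxel edge concentrates all co-occurrences off the diagonal.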

2.4.3. Gray level run-length matrix (GLRLM) based features

GLRLM was used to extract higher-order regional features based on the information contained in the run-length of a particular gray level in a particular direction, where the run-length is the number of consecutive voxels of the same gray level along that direction in 3D space [17]. A coarse texture is therefore dominated by relatively long runs, whereas a fine texture is dominated by short runs. We extracted eleven features based on GLRLM. Examples of such features are Short-Run Emphasis (SRE), Long-Run Emphasis (LRE), Gray-Level Non-uniformity (GLNU), and Run Percentage (RPC).
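The contrast between coarse and fine textures can be seen in a small sketch that collects runs along a single direction (the rows of a 2D slice) and evaluates the standard SRE and LRE formulas. Restricting the example to one direction and to 2D is a simplification for illustration; the function names are hypothetical.

```python
from itertools import groupby

def run_lengths(rows):
    """Collect (gray_level, run_length) pairs along each row, i.e. one direction."""
    return [(level, len(list(run))) for row in rows for level, run in groupby(row)]

def short_run_emphasis(runs):
    """SRE: mean of 1 / length^2 over all runs."""
    return sum(1 / length ** 2 for _, length in runs) / len(runs)

def long_run_emphasis(runs):
    """LRE: mean of length^2 over all runs."""
    return sum(length ** 2 for _, length in runs) / len(runs)

coarse = run_lengths([[1, 1, 1, 1], [2, 2, 2, 2]])  # two runs of length 4
fine = run_lengths([[1, 2, 1, 2], [2, 1, 2, 1]])    # eight runs of length 1
```

For the coarse pattern LRE is large and SRE small; for the fine pattern both equal 1.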

2.4.4. Gray level size zone matrix (GLSZM) based features

GLSZM is another method to compute higher-order regional features. Its calculation shares many similarities with that of the GLRLM features. However, instead of being based on the run-length of consecutive voxels with the same intensity, each GLSZM element equals the number of zones of a specific size and gray level. Hence, the resulting matrix has a fixed number of rows equal to the number of gray levels and a dynamic number of columns, which is determined by the size of the largest zone as well as the size quantization [18]. Therefore, the more homogeneous the texture, the wider and flatter the matrix. We extracted eleven GLSZM features. Examples of GLSZM features are Size-Zone Variability (SZV), Zone Percentage (ZP), and Large-area emphasis (LAE).
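As a rough illustration of how zones are counted, the sketch below labels connected zones of equal gray level in a 2D slice with a flood fill and derives the zone percentage (number of zones divided by number of voxels). The 2D restriction and 8-connectivity are assumptions made to keep the example short; a 3D implementation would use 26-connectivity. The function names are hypothetical.

```python
from collections import Counter

def size_zones(image):
    """Count zones as a Counter keyed by (gray_level, zone_size), 8-connected in 2D."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    zones = Counter()
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            level, size, stack = image[r0][c0], 0, [(r0, c0)]
            seen[r0][c0] = True
            while stack:  # flood fill over neighbors sharing the same gray level
                r, c = stack.pop()
                size += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc] and image[rr][cc] == level):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            zones[(level, size)] += 1
    return zones

def zone_percentage(zones):
    """ZP: number of zones divided by the number of voxels they cover."""
    n_zones = sum(zones.values())
    n_voxels = sum(size * count for (_, size), count in zones.items())
    return n_zones / n_voxels

homogeneous = size_zones([[1, 1], [1, 1]])    # one zone of size 4
heterogeneous = size_zones([[1, 2], [3, 4]])  # four zones of size 1
```

The homogeneous slice produces a single large zone (a wide matrix with one entry), whereas the heterogeneous slice produces only size-1 zones.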

2.4.5. Neighborhood gray tone difference matrix (NGTDM) based features

These features were calculated according to the method initially proposed by Amadasun and King [19]. They are thought to correlate with human visual impressions. The neighborhood used for each pixel was 7 × 7 pixels for both CT and PET. The original NGTDM feature equations were defined only for square ROIs, so the calculations were modified slightly to apply them to irregularly shaped, multiple-slice ROIs in 3D space. The five higher-order features were coarseness, contrast, complexity, busyness, and texture strength.

2.4.6. Shape-based features

This set of features describes geometrical and morphological characteristics of tumor volumes. We extracted eleven shape features (SF) such as tumor volume, surface area, surface to volume ratio, asphericity, and compactness [20]. In addition to the features mentioned above, we measured SUVMAX and SUVPEAK from each tumor volume to compare the predictive performances between radiomics and SUV measurements.
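The text does not give formulas for the shape features. The sketch below uses definitions that are common in the radiomics literature for sphericity, spherical disproportion, and asphericity, computed from a volume and a surface area; these are assumptions, and the in-house software may define the features differently. All three reduce to their reference values (1, 1, and 0) for a perfect sphere.

```python
import math

def sphericity(volume, area):
    """pi^(1/3) * (6V)^(2/3) / A; equals 1 for a sphere."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / area

def spherical_disproportion(volume, area):
    """A / (4 * pi * R^2), with R the radius of a sphere of equal volume."""
    radius = (3 * volume / (4 * math.pi)) ** (1 / 3)
    return area / (4 * math.pi * radius ** 2)

def asphericity(volume, area):
    """(A^3 / (36 * pi * V^2))^(1/3) - 1; equals 0 for a sphere."""
    return (area ** 3 / (36 * math.pi * volume ** 2)) ** (1 / 3) - 1

# Reference shapes: a sphere of radius 2 and a unit cube.
r = 2.0
sphere_v, sphere_a = 4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2
cube_sphericity = sphericity(1.0, 6.0)
```

A unit cube, for comparison, has a sphericity of about 0.806 under this definition.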

2.5. Statistical analysis

The following subsections describe the method we followed to analyze the radiomic features and to train and test the resulting predictive models. The steps of this process are summarized in Fig. 2.

Fig. 2.

Flowchart of the proposed approach to generate the final multivariate logistic regression models of radiomic features. We extracted seventy-five radiomic features from each MTV (eighty patients) based on six feature calculation methods: Gray Level Co-occurrence Matrix (GLCM), Gray Level Run-length matrix (GLRLM), Gray level Size Zone matrix (GLSZM), Neighborhood Gray Tone Difference Matrix (NGTDM), Shape features (SFs) and intensity volume histogram (IVH) features. The preprocessing steps included rescaling the features to a common range [0, 1], investigating trends using Spearman’s Rho test, and univariate correlation with treatment outcomes using univariate logistic regression modeling. We applied a backward feature selection method as part of Leave-One-Out Cross Validation (LOOCV) within a training set consisting of 70% of the original patient cohort. The trained MLRs were tested on an independent set consisting of the remaining 30% of the cohort. Finally, we evaluated the performance of the final models using the Area under the Receiver Operator Characteristic Curve (AUC).

2.5.1. Preprocessing of radiomic features

Prior to the extraction of texture features, we resampled all the MTVs to an isotropic voxel size using cubic interpolation: the FDG-PET based volumes, with a voxel size of 5.47 × 5.47 × 3.27 mm3, were resampled to a voxel size of 5 × 5 × 5 mm3. We also tested the dependence of radiomic features on voxel size (volume) and gray level discretization (GL) according to the method described in Altazi et al. [21] and Shafiq et al. [22], and corrected features for each parameter where necessary. Finally, the GL that scored highest among four gray levels (32, 64, 128, and 256) on the inter-item correlation coefficient (IIC) (Supplementary Fig. 1) [23,24] was chosen as a reference. For all the studied radiomic features, GL-64 showed the highest correlation with the other GLs and was therefore selected as the reference GL. In addition, similar studies [13,25] reported the same value because it allowed for 0.25 SUV increments for a range of SUVs comparable to ours (~4–35).
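The discretization step can be sketched as follows. The binning scheme is not specified in the text, so this example assumes a fixed number of equal-width bins spanning the minimum and maximum intensity of the volume; the function name is hypothetical, and the resampling step is not shown.

```python
def discretize(values, n_levels=64):
    """Map intensities to integer gray levels 0..n_levels-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # a constant region collapses to a single level
    # The maximum value would land on n_levels, so it is clipped into the top bin.
    return [min(n_levels - 1, int(n_levels * (v - lo) / (hi - lo))) for v in values]

# Three SUVs spanning the approximate range quoted in the text (~4 to 35).
levels = discretize([4.0, 19.5, 35.0])
```

The lowest SUV maps to level 0, the midpoint to level 32, and the highest to level 63.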

2.5.2. Preliminary analysis

In this supervised study, we grouped patients according to treatment outcome (DM–patients and LRR–patients) where the association of radiomic features with each outcome was carried out separately.

We used the independent t–test to estimate the level of association between radiomic features and treatment outcomes, where the size of the effect was measured by Cohen’s d with the conventional cut-offs: small (0.2), medium (0.5) and large (0.8) [26]. We used the non-parametric Spearman’s Rho test to detect any trend of radiomic feature values in correspondence to the development of DM and LRR. We rescaled all radiomic features extracted from each patient’s MTV to a range [0, 1] by using Eq. (1):

$$r' = \frac{r - \min(r)}{\max(r) - \min(r)} \tag{1}$$

where r is the original value of any radiomic feature and r’ is the rescaled value.
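The min-max rescaling of Eq. (1) amounts to the following short sketch, applied to one feature across all patients. The function name and the three example values are illustrative only.

```python
def rescale(values):
    """Min-max rescaling of one feature across patients to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("feature is constant; min-max rescaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical tumor volumes in cm3 for three patients.
rescaled = rescale([65.0, 105.0, 127.0])
```

The smallest value maps to 0, the largest to 1, and intermediate values fall proportionally in between.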

We tested the unconditional association between each treatment outcome and each radiomic feature, without considering any confounding associations among features, using a univariate logistic regression model; an association was considered significant at the 0.05 level.

2.5.3. Feature selection

Before proceeding to the modeling step explained below, we randomly split the positive and negative cases from our patient cohort (eighty patients) into a training set consisting of fifty-six patients (70%) and a test set consisting of twenty-four patients (30%).
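One way to split positive and negative cases separately, so that both sets keep a similar event rate, is sketched below. The rounding rule and the random seed are assumptions; the text states only the 70%/30% proportions. With the DM counts reported in the Results (18 events, 62 non-events), this rule yields 56 training and 24 test patients.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train/test sets separately within each outcome class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))  # assumed rounding rule
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)

# Outcome labels matching the DM group sizes: 18 events and 62 non-events.
labels = [1] * 18 + [0] * 62
train_idx, test_idx = stratified_split(labels)
```

The same function applies unchanged to the LRR labels.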

To generate models consisting of highly predictive features, we applied sequential backward feature selection (SBS), a straightforward sensitivity analysis method, as part of Leave-One-Out Cross Validation (LOOCV) within the training set. Because models combining all features suffered from complete separation, the selection was made within each feature calculation method category. For a feature to remain in a model, we set the stepwise entry and removal alpha to 0.05; at each iteration, the model was re-run with the remaining variables.
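The general shape of backward selection inside leave-one-out cross validation can be sketched as below. Note that this sketch swaps in a generic, user-supplied score function (for example, a cross-validated performance measure) as the removal criterion, whereas the study used an alpha of 0.05 in SPSS/WEKA; the function names and the toy score are hypothetical.

```python
def loocv_splits(n):
    """Yield (training_indices, held_out_index) pairs for leave-one-out cross validation."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i

def backward_select(features, score, min_features=1):
    """Drop one feature at a time for as long as the score does not get worse."""
    selected = list(features)
    best = score(selected)
    while len(selected) > min_features:
        # Score every candidate subset obtained by removing a single feature.
        candidates = [(score([f for f in selected if f != drop]), drop) for drop in selected]
        cand_score, drop = max(candidates)
        if cand_score < best:
            break  # every removal hurts, so keep the current subset
        best = cand_score
        selected.remove(drop)
    return selected

# Toy score: rewards the two informative features and penalizes model size.
def toy_score(subset):
    return len({"a", "b"} & set(subset)) - 0.1 * len(subset)

kept = backward_select(["a", "b", "c", "d"], toy_score)
```

In practice the score function would itself loop over `loocv_splits` to fit and evaluate a model on each fold.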

2.5.4. Logistic regression modeling

To improve the radiomic features’ predictive power for both outcomes (DM and LRR), we combined the selected features into multivariate models using multiple logistic regression (MLR).

In this approach, we first consider a collection of p independent variables for the ith patient, denoted by the vector $x_i = \{x_{ij} : j = 1, 2, \ldots, p\}$, for N patients. In our case, the variables are the selected PET features. Our interest is to generate an equation consisting of a linear combination of the p variables that takes the form:

$$g(x_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \quad \text{for } i = 1, 2, \ldots, N, \tag{2}$$

where β is the set of regression coefficients of the model to be determined via maximum likelihood estimation (MLE) based on the given treatment outcomes. In which case, we use a logit transformation to generate the logistic regression model of the form:

$$f(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} = \frac{1}{1 + e^{-g(x)}} \tag{3}$$
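Eqs. (2) and (3) translate directly into a few lines of code. The coefficients and feature values below are hypothetical and chosen only to make the arithmetic easy to follow; in the study the coefficients were estimated by maximum likelihood.

```python
import math

def linear_predictor(beta0, betas, x):
    """Eq. (2): g = beta_0 + sum_j beta_j * x_j for one patient's feature vector."""
    return beta0 + sum(b * xj for b, xj in zip(betas, x))

def logistic(g):
    """Eq. (3): f = 1 / (1 + exp(-g)), the predicted probability of the event."""
    return 1.0 / (1.0 + math.exp(-g))

# Hypothetical intercept, coefficients, and rescaled features (illustration only).
g = linear_predictor(-1.0, [2.0, -0.5], [0.8, 0.4])
probability = logistic(g)
```

Here g = -1.0 + 1.6 - 0.2 = 0.4, which corresponds to a predicted probability of about 0.60.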

2.5.5. Model evaluation and selection

To evaluate the performance of our models, we plotted receiver operator characteristic (ROC) curves and computed the area under the curve (AUC) to represent the association between the extracted features and each of the treatment outcomes. The p-values for both the MLRs and their associated AUCs are two-sided and considered statistically significant at the 0.05 level (p < .05). Accordingly, we removed any model that was not statistically significant, scored an AUC < 0.65, or had an AUC with a 95% confidence interval lower limit < 0.6. Finally, we evaluated the trained models on the independent test set and followed the same approach mentioned above to select the final models.
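The AUC has a simple rank interpretation: it is the probability that a randomly chosen patient with the event receives a higher model score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it that way in pure Python; it is an illustration of the quantity, not the SPSS procedure used in the study, and the scores are made up.

```python
def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for two events and two non-events.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0])
```

A model that ranks every event above every non-event scores 1.0, and a model that cannot separate the groups scores 0.5.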

We performed the statistical analysis using SPSS (Version 22; IBM Corporation; Armonk, New York, U.S.), and WEKA 3.6 (The University of Waikato, Machine Learning Group, Hamilton, New Zealand).

3. Results

The group of patients who developed distant metastasis (DM; N = 18) had an average tumor volume of 105 cm3 (SD = ± 57.1 cm3). By comparison, the patients with no distant metastasis (N = 62) had a numerically smaller average tumor volume of 65 cm3 (SD = ± 50.9 cm3). To test the hypothesis that the two groups had statistically significantly different average tumor volumes, we performed an independent samples t–test (Tables 2 and 3), comparing the mean feature values of patients who developed either outcome with those of patients who did not.

Table 2.

Results of independent t-test and descriptive statistics for some of the highly predictive radiomic features in association with the development of distant metastasis (DM). Standardized maximum and peak uptake values (SUVMAX and SUVPEAK) are reported for comparison.

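| Feature | Levene’s F* | Levene’s p (> .05) | t | df | p (< .05) | Mean Diff. | 95% CI LB | 95% CI UB |
|---|---|---|---|---|---|---|---|---|
| 2nd-order Mean | 1.72 | .19 | 2.21 | 78 | .030 | 0.13 | 0.01 | 0.24 |
| SZV | 3.60 | .06 | 2.08 | 78 | .040 | 0.08 | 0.04 | 0.17 |
| SAE | 2.45 | .12 | 1.99 | 78 | .049 | 0.15 | 0.00 | 0.30 |
| Texture Strength | 3.47 | .07 | −2.97 | 78 | .004 | 0.17 | 0.07 | 0.25 |
| Tumor Vol. | 1.86 | .18 | 2.12 | 78 | .037 | 0.11 | 0.01 | 0.22 |
| Surface Area | 3.86 | .05 | 3.61 | 78 | .001 | 0.18 | 0.08 | 0.29 |
| SUVMAX | 0.38 | .54 | 2.05 | 78 | .047 | 0.23 | 0.00 | 0.45 |
| SUVPEAK | 0.20 | .90 | 2.31 | 78 | .028 | 0.13 | 0.015 | 0.25 |

Columns t through 95% CI UB report the t-test for equality of means.

* Equal variances assumed.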

Table 3.

Results of independent t-test and descriptive statistics for some of the highly predictive radiomic features in association with the locoregional recurrence (LRR) of the disease. Standardized maximum and peak uptake values (SUVMAX and SUVPEAK) are reported for comparison.

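| Feature | Levene’s F* | Levene’s p (> .05) | t | df | p (< .05) | Mean Difference | 95% CI LB | 95% CI UB |
|---|---|---|---|---|---|---|---|---|
| 2nd-order Mean | 0.18 | .67 | 1.90 | 78 | .046 | 0.14 | 0.07 | 0.29 |
| Diff. Entropy | 2.91 | .09 | 2.21 | 78 | .030 | 0.12 | 0.01 | 0.22 |
| SAE | 0.19 | .67 | −0.47 | 78 | .640 | −0.05 | −0.25 | 0.15 |
| SZV | 6.43 | .01 | 2.56 | 78 | .012 | 0.22 | 0.05 | 0.39 |
| Tumor Vol. | 2.47 | .12 | 3.23 | 78 | .002 | 0.25 | 0.10 | 0.41 |
| SUVMAX | 0.04 | .08 | 2.09 | 78 | .047 | 0.30 | 0.01 | 0.60 |
| SUVPEAK | 0.06 | .08 | 2.08 | 78 | .040 | 0.31 | 0.21 | 0.97 |

Columns t through 95% CI UB report the t-test for equality of means.

* Equal variances assumed.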

The distributions for both the DM and No-DM patient groups were sufficiently normal for the purpose of conducting a Student’s t–test (i.e., |skewness| < 2.0 and |kurtosis| < 0.9). Also, the assumption of homogeneity of variances was tested and satisfied via Levene’s F–test, F (78)=0.194, p=.66. The independent samples t–test was associated with a statistically significant effect, t (78)=2.12, p=.037. Thus, the group of patients with DM was statistically significantly associated with a larger average feature value (e.g., larger average tumor volume) than the group of patients with no DM. Moreover, we found the effect size for this analysis (d=0.74) to exceed Cohen’s convention (Table 4) for a medium effect (d=0.5).
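As a worked check of the effect size, the sketch below computes Cohen's d from the group means and standard deviations of tumor volume reported above (105 ± 57.1 cm3 for DM, 65 ± 50.9 cm3 for no DM). It assumes the two standard deviations are combined as an unweighted root mean square; the text does not state which pooling was used, but this variant reproduces the reported d = 0.74. The function name is hypothetical.

```python
import math

def cohens_d_rms(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean square of the two group SDs (assumed pooling)."""
    rms_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / rms_sd

# Tumor volume in cm3: DM group versus no-DM group, values from the text.
d_dm = cohens_d_rms(105.0, 57.1, 65.0, 50.9)
```

The result rounds to 0.74, between the medium (0.5) and large (0.8) cut-offs.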

Table 4.

Descriptive statistics and effect size (Cohen’s d) of Radiomic features and standardized uptake value (SUV) measurements in association with distant metastasis (DM). Cohen’s d range cut-offs: small (0.2), medium (0.5) and large (0.8).

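| Feature | Distant Metastasis (DM) | N | Mean | SD | Cohen’s d |
|---|---|---|---|---|---|
| 2nd-order Mean | DM | 18 | 0.65 | 0.16 | 0.68 |
| | No DM | 62 | 0.78 | 0.22 | |
| SZV | DM | 18 | 0.54 | 0.24 | 0.83 |
| | No DM | 62 | 0.73 | 0.22 | |
| SAE | DM | 18 | 0.74 | 0.25 | 0.56 |
| | No DM | 62 | 0.59 | 0.29 | |
| Tumor Vol. | DM | 18 | 0.57 | 0.24 | 0.74 |
| | No DM | 62 | 0.80 | 0.21 | |
| Texture Strength | DM | 18 | 0.53 | 0.19 | 0.81 |
| | No DM | 62 | 0.72 | 0.22 | |
| SUVMAX | DM | 18 | 0.52 | 0.43 | 0.55 |
| | No DM | 62 | 0.75 | 0.41 | |
| SUVPEAK | DM | 18 | 0.53 | 0.42 | 0.60 |
| | No DM | 62 | 0.78 | 0.42 | |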

Along the same lines, we found that the group of patients with LRR (N=9) was statistically significantly associated with larger average radiomic feature values (e.g., an average tumor volume of 127 cm3, SD= ±70.9 cm3, versus 68 cm3, SD= ±48.9 cm3, for patients with no LRR). Following the same steps performed for the first group, the independent samples t–test was associated with a statistically significant effect, t (78)=3.23, p=.002. We found the effect size for this analysis (d=0.98) to exceed Cohen’s convention (Table 5) for a large effect (d=0.8). Spearman’s Rho correlation revealed a trend of higher radiomic feature values for patients with event occurrence in comparison to lower radiomic feature values for patients with no event occurrence (Table 6; Fig. 3.a and 3.b). These preliminary results narrowed the radiomic features down to twenty-four (32% of the original number of features), namely five IHFs, four GLCMs, two GLRLMs, four GLSZMs, four NGTDMs, and five SFs (Supplementary Table 1).

Table 5.

Descriptive statistics and effect size (Cohen’s d) of Radiomic features and standardized uptake value (SUV) measurements in association with locoregional recurrence (LRR). Cohen’s d range cut-offs: small (0.2), medium (0.5) and large (0.8).

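| Feature | Loco-Regional Recurrence (LRR) | N | Mean | SD | Cohen’s d |
|---|---|---|---|---|---|
| Difference Entropy | LRR | 9 | 0.11 | 0.078 | 0.99 |
| | No LRR | 71 | 0.23 | 0.15 | |
| RPC | LRR | 9 | 0.05 | 0.06 | 0.88 |
| | No LRR | 71 | 0.15 | 0.16 | |
| SZV | LRR | 9 | 0.54 | 0.36 | 0.75 |
| | No LRR | 71 | 0.76 | 0.22 | |
| Tumor Vol. | LRR | 9 | 0.48 | 0.30 | 0.98 |
| | No LRR | 71 | 0.73 | 0.21 | |
| SUVMAX | LRR | 9 | 0.54 | 0.42 | 0.66 |
| | No LRR | 71 | 0.84 | 0.48 | |
| SUVPEAK | LRR | 9 | 0.55 | 0.41 | 0.70 |
| | No LRR | 71 | 0.86 | 0.47 | |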

Table 6.

Association between different radiomic features and standardized uptake value (SUV) measurements with distant metastasis (DM) and locoregional recurrence (LRR) in a cohort of eighty cervical cancer patients measured by Spearman’s rank correlation (rs). Correlation is significant at the alpha level of 0.05 (2-sided).

| Outcome | 2nd Mean | Diff. Entropy | RLNU | SZV | Surf Area | Tumor Vol. | SUVMAX | SUVPEAK |
|---|---|---|---|---|---|---|---|---|
| DM | 0.33 | 0.20 | 0.32 | 0.27 | 0.38 | 0.31 | 0.25 | 0.27 |
| LRR | 0.25 | 0.26 | 0.28 | 0.23 | 0.34 | 0.28 | 0.22 | 0.23 |

Fig. 3.

Box plots illustrating the trend of some radiomic features, such as (a) tumor volume (left) for patients with and without distant metastasis (DM) and (b) difference entropy (right) for patients with and without locoregional recurrence (LRR). The higher median and greater length of the right (red) boxes suggest that patients with DM or LRR are characterized by higher feature values in comparison with the other group (left box). Values of radiomic features were rescaled to the range [−3, 3]. (For interpretation of the references to colours in this figure legend, the reader is referred to the web version of this paper.)

We fitted each of these features in a univariate logistic regression model based on the data of the training set to assess the magnitude of their association with treatment outcomes. Subsequently, we combined the radiomic features that showed statistically significant univariate associations (Supplementary Table 2) with either of the two outcomes into MLR models according to their method of calculation. The radiomic features were selected by the backward sequential selection method based on Leave-One-Out Cross Validation within the training set, as explained in the methods section. Six models were highly predictive of the treatment outcomes (Eqs. (4)–(9), Figs. 4 and 5) according to the following linear combinations:

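$$G_1 = 1.223 \times \text{Difference Entropy} + 0.223 \times \text{2nd-order Mean} - 13.568 \tag{4}$$
$$G_2 = 3.609 \times \text{SAE} - 3.335 \times \text{SZV} - 1.075 \tag{5}$$
$$G_3 = 4.609 \times \text{IV} + 3.104 \times \text{LIE} - 2.047 \tag{6}$$
$$G_4 = 2.081 \times \text{Texture Strength} - 3.054 \times \text{Coarseness} - 1.075 \tag{7}$$
$$G_5 = 0.058 \times \text{Tumor Vol.} + 0.046 \times \text{Surface Area} - 3.630 \tag{8}$$
$$G_6 = 0.072 \times \text{Asphericity} + 0.127 \times \text{Spherical Disproportion} - 2.640 \tag{9}$$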

Fig. 4.

Area under the receiver operator characteristic curve (AUC) representing comparison between the trained multivariate logistic regression models versus standardized uptake value (SUV) measurements in terms of prediction for distant metastasis (DM).

Fig. 5.

Area under the receiver operator characteristic curve (AUC) representing comparison between the trained multivariate logistic regression models versus standardized uptake value (SUV) measurements in terms of prediction for locoregional recurrence (LRR).

Before generating the models mentioned above, we corrected coarseness for volume (voxel size) dependence, and 2nd-order mean and difference entropy for GL discretization and texture strength for both volume and GL discretization using a method presented in a recent study [22].

Models G1 to G3 showed statistically significant association with LRR (Supplementary Table 2), and models G2 to G6 showed statistically significant association with DM (Supplementary Table 3). Regarding SUV measurements, both SUVMAX and SUVPEAK showed statistically significant association with DM and LRR through all preliminary tests except for univariate logistic regression modeling (Tables 2–6). Also, a multivariate logistic regression model of both SUVMAX and SUVPEAK was found to be statistically insignificant due to the removal of SUVMAX in the selection step at an alpha level of 0.05. We removed any model that scored an AUC < 0.60. Although model G3 was statistically significantly associated with LRR (AUC = 0.79, p < .05), it was removed because the lower limit of its 95% CI (0.59–0.99) fell below 0.6.

By evaluating the trained models using the independent test set (Tables 7 and 8; Figs. 6 and 7), we found four models associated with DM (G3, G4, G5, and G6) and two models associated with LRR (G1 and G2).

Table 7.

Area under the receiver operator characteristic curve (AUC) of the fully trained and validated models for prediction of LRR over the test set (N = 24).

| Variables | AUC | SE | p-value | 95% CI LB | 95% CI UB |
|---|---|---|---|---|---|
| G1: Diff. Entropy, 2nd-order Mean | 0.92 | 0.07 | .004 | 0.78 | 1.00 |
| G2: SAE, SZV | 0.88 | 0.10 | .009 | 0.67 | 1.00 |
| SUVmax | 0.74 | 0.09 | .095 | 0.55 | 0.94 |
| SUVpeak | 0.76 | 0.09 | .070 | 0.58 | 0.95 |

Table 8.

Area under the receiver operator characteristic curve (AUC) of the fully trained and validated models for prediction of DM over the test set (N = 24).

| Variables | AUC | SE | p-value | 95% CI LB | 95% CI UB |
|---|---|---|---|---|---|
| G3: IV, LIE | 0.86 | 0.07 | .007 | 0.71 | 1.00 |
| G4: Texture strength, Coarseness | 0.88 | 0.06 | .004 | 0.75 | 1.00 |
| G5: Volume, Surface Area | 0.92 | 0.05 | .001 | 0.82 | 1.00 |
| G6: Asphericity, Spherical Disp. | 0.89 | 0.06 | .003 | 0.76 | 1.00 |
| SUVmax | 0.56 | 0.11 | .611 | 0.34 | 0.79 |
| SUVpeak | 0.61 | 0.11 | .374 | 0.39 | 0.85 |

Fig. 6.

Area under the receiver operator characteristic curve (AUC) of the final distant metastasis (DM) predictive models based on the independent test set.

Fig. 7.

Area under the receiver operator characteristic curve (AUC) of final locoregional recurrence (LRR) predictive models based on the independent test set.

4. Discussion

The presented supervised analysis study consisted of generating statistical learning models that inferred a high association with dichotomous outcomes. Accordingly, we aimed to determine the most efficient radiomic feature predictors of cervical cancer treatment outcomes. By performing univariate and multivariate logistic regression analysis on radiomic features and SUV measurements, generated under the same conditions, conclusions were drawn regarding the predictive power of each metric. Characterizing and understanding the information extracted from the tumor FDG uptake poses unique challenges. Therefore, several preprocessing steps are required to maximize the benefit of the unique radiomic information derived from the heterogeneity patterns within the tumor image.

Rescaling radiomic feature values to a standardized range is an essential preprocessing step because it makes the feature values directly comparable both between and within patients. Otherwise, the extracted feature values vary widely, even within the same patient (Eq. (1)). Another important preprocessing step is to eliminate radiomic features that fail to show a statistically significant association with treatment outcome under univariate analysis. These features are unlikely to be associated with the outcome after adjusting for other features (cofactors) in multivariate models.
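
The rescaling step can be sketched in a few lines of Python. Eq. (1) is not reproduced in this section, so the sketch assumes a min-max rescaling of each feature to the range [0, 1]; the function name and the target range are hypothetical and may differ from the transformation actually applied.

```python
def rescale_min_max(values, new_min=0.0, new_max=1.0):
    """Rescale one feature's values to [new_min, new_max] (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]
```

Applied per feature across the cohort, this places features with very different native units on a common scale before they enter a regression model.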

As previously mentioned, some radiomic features displayed increasing trends in patients who developed either DM or LRR compared with patients without such events (Fig. 3). This result prompted us to investigate the potential of such features to serve as robust quantitative predictors of the development of DM and LRR. The results of this study revealed that the geometrical aspects of tumors, captured by shape-based features (SF), showed the highest correlation with treatment outcomes. Thus, future research should focus more on ascertaining the predictive power of these radiomic features. For example, the size of the primary cervical tumor volume (Tumor Vol.) was the strongest significant univariate predictor of DM. This finding is concordant with other studies in this field [27,28]. The results also showed a prominent role of GLSZM features (IV and LIE) in univariate correlation with DM, while GLCM features (difference entropy, 2nd-order mean) showed the highest correlation with LRR. It is worth mentioning that GLSZM features extracted from FDG-PET have been associated with physiological processes such as colorectal tumor vascularization [25] and have been able to differentiate esophageal cancer patients based on their response [13]. To our knowledge, we are the first to report a correlation of texture strength, coarseness, IV, LIE, and difference entropy with treatment outcomes.

Meanwhile, IVH-based features, which characterize the distribution of voxel intensities without taking into account spatial relationships among voxels, were all statistically uncorrelated with outcomes. In contrast to a similar study [12], V90 (the percentage volume having at least 90% intensity value), V10–90 (the difference between V10 and V90), and I90 (the minimum intensity of the 90% highest-intensity volume) did not show a statistically significant association on any preliminary statistical test (p > .05). This finding might be due to the difference in the number of patients between the two studies. We also found that patient age did not have a statistically significant correlation with any of the outcomes. Racial disparity was outside the scope of this study [29].
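
The IVH definitions above can be made concrete with a short Python sketch. It assumes that "x% intensity value" is taken relative to the maximum voxel intensity in the tumor volume and that all voxels have equal volume; both points are assumptions, as are the function names.

```python
import math


def v_x(intensities, x):
    """Vx: percentage of the volume with intensity of at least x% of the maximum."""
    threshold = x * max(intensities) / 100.0
    count = sum(1 for v in intensities if v >= threshold)
    return 100.0 * count / len(intensities)


def i_x(intensities, x):
    """Ix: minimum intensity within the x% highest-intensity part of the volume."""
    ranked = sorted(intensities, reverse=True)
    k = max(1, math.ceil(x * len(ranked) / 100))
    return ranked[k - 1]


def v_range(intensities, low=10, high=90):
    """V10-90: difference between V10 and V90 (for the default arguments)."""
    return v_x(intensities, low) - v_x(intensities, high)
```

Because these quantities depend only on the intensity histogram, two tumors with identical histograms but different spatial arrangements of uptake yield identical values, which is the limitation noted above.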

Interestingly, although SUVpeak and SUVmax demonstrated a preliminary correlation with both treatment outcomes, they failed to display any statistically significant correlation (p > .05) in either the univariate or the multivariate logistic regression analysis. Therefore, within the scope of this study, radiomic features are considered superior to SUV measurements, especially SUVmax, for prediction of both DM and LRR (Supplementary Tables 2, 3 and Tables 7, 8, Figs. 4 and 5). This finding is thought to be attributable to the more advanced calculation of texture-based radiomic features, which better analyze the spatial and voxel-neighborhood information within the tumor volume. Such features might provide an enhanced understanding of tumor heterogeneity and behavior.

In agreement with related studies, multiple logistic regression modeling was shown to be a useful methodology that may enhance the predictive performance of radiomic features [30,31]. For LRR, logistic regression modeling significantly improved the predictive power, with an increase of ~8%, whereas for DM the increase was ~5%. Furthermore, using sequential backward selection followed by LOOCV served as a rigorous method to select the features with the best performance within the training set. Finally, applying the multivariate models to the test set identified G1 as the best model for predicting LRR (AUC: 0.89, 95% CI: 0.78–1.00) and G5 as the best model for predicting DM (AUC: 0.82, 95% CI: 0.69–0.94).
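
The combination of backward feature selection, LOOCV, and logistic regression described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the elimination criterion (pooled LOOCV AUC), the gradient-descent fit with a small L2 penalty, the stopping rule `min_features=2`, and all function names are hypothetical choices made for the example.

```python
import math


def predict_proba(w, x):
    """Logistic probability for one sample; w = [bias, w1, ..., wd]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=300, l2=1e-3):
    """Fit logistic regression by batch gradient descent (bias not penalised)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        for j in range(d + 1):
            penalty = l2 * w[j] if j > 0 else 0.0
            w[j] -= lr * (grad[j] / n + penalty)
    return w


def auc(scores, labels):
    """Rank-based AUC; ties between classes count as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv_auc(X, y, cols):
    """Hold out each patient once, refit on the rest, and pool the predictions."""
    preds = []
    for i in range(len(X)):
        X_train = [[row[c] for c in cols] for k, row in enumerate(X) if k != i]
        y_train = [label for k, label in enumerate(y) if k != i]
        w = fit_logistic(X_train, y_train)
        preds.append(predict_proba(w, [X[i][c] for c in cols]))
    return auc(preds, y)


def backward_select(X, y, min_features=2):
    """Drop one feature at a time while the LOOCV AUC does not get worse."""
    cols = list(range(len(X[0])))
    best = loocv_auc(X, y, cols)
    while len(cols) > min_features:
        trials = [(loocv_auc(X, y, [c for c in cols if c != d]), d) for d in cols]
        score, drop = max(trials)
        if score < best:
            break
        best = score
        cols.remove(drop)
    return cols, best
```

In this sketch, `X` would hold the rescaled training-set features (one row per patient) and `y` the binary outcome. The retained columns would then be refit on the full training set and scored once on the held-out test set, so that the test-set AUC is not used during selection.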

On the other hand, quantitative analysis of 18F-FDG PET images presents several challenges [32]. From a clinical perspective, the metabolic tumor volume has been shown to be highly dependent on both the scanning protocol and the source-to-background ratio. In addition, the anatomical constraints of the pelvic region require a more sophisticated segmentation method.

Several articles have reported that the use of fuzzy locally adaptive [14] or fuzzy c-means [33] algorithms led to satisfactory results. Therefore, in a previous study [21], we assessed the segmentation accuracy by computing the Dice coefficient of the overlap between two experts' manual segmentations and a graphical-based semi-automatic segmentation method. Moreover, the relatively small patient cohort might be a limitation of the current study. However, this cohort is about the same size as, or larger than, the samples in similar studies. From a technical perspective, the accurate determination of treatment response remains a challenging task owing to the limitations of current PET imaging technologies, specifically their limited spatial resolution and high noise characteristics resulting from lower sensitivity [34,35]. Hence, in this study, we adhered to the recommendations of the Society of Nuclear Medicine (SNM) and the European Association of Nuclear Medicine (EANM) by following one strict scanning and image acquisition protocol for our patient cohort.
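
The Dice coefficient mentioned above measures the overlap between two segmentations as twice the size of their intersection divided by the sum of their sizes. A minimal Python sketch follows; it assumes each segmentation is given as a flattened binary mask of equal length, and the function name is illustrative.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as equal-length sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = {i for i, v in enumerate(mask_a) if v}
    b = {i for i, v in enumerate(mask_b) if v}
    if not a and not b:
        return 1.0  # two empty segmentations agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))
```

A value of 1 indicates identical contours and 0 indicates no overlap, so the coefficient gives a single number for comparing a semi-automatic contour against each expert's manual contour.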

5. Conclusion

Radiomics is an emerging process for extracting more quantitative and predictive information from radiological images. We selected twelve FDG-PET-based radiomic features to fit into six distinct multivariable logistic regression models. These models had more predictive power than univariate radiomic feature models and SUV measurements. Shape features such as the metabolic tumor volume and surface area, in addition to NGTDM features such as texture strength and coarseness, were shown to be valuable features for the prediction of distant metastasis. Similarly, GLSZM features such as SZV and SAE and GLCM features such as 2nd-order mean and difference entropy were more predictive of loco-regional recurrence of the disease. The proposed multi-radiomic models might be useful for correlation with clinical treatment outcomes for cervical cancer treated with definitive radiochemotherapy, and warrant further investigation. We also encourage intra-center studies with larger patient cohorts for validation of our results and those of other investigators in radiomics. This line of research might lead to better tumor targeting, earlier identification of response, and a clearer understanding of the relation between radiomic features and tissue heterogeneity characteristics.

Acknowledgments

Mr. Baderaldeen Altazi is granted a scholarship from King Fahad Specialist Hospital, Dammam, Saudi Arabia. However, this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Footnotes

Conflict of interest statement

Authors have nothing to disclose.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ejmp.2017.10.009.

References

  • [1] Siegel RL, Miller KD, Fedewa SA, Ahnen DJ, Meester RG, Barzi A, et al. Colorectal cancer statistics, 2017. CA Cancer J Clin 2017;67(3):177–93.
  • [2] Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer 2015;136:E359–86.
  • [3] Kidd EA, El Naqa I, Siegel BA, Dehdashti F, Grigsby PW. FDG-PET-based prognostic nomograms for locally advanced cervical cancer. Gynecol Oncol 2012;127:136–40.
  • [4] Bokemeyer KC, Oechsle KC, et al. Early prediction of treatment response to high-dose salvage chemotherapy in patients with relapsed germ cell cancer using [18F] FDG PET. Br J Cancer 2002;86:506–11.
  • [5] Kidd EA, Siegel BA, Dehdashti F, Grigsby PW. Pelvic lymph node F-18 fluorodeoxyglucose uptake as a prognostic biomarker in newly diagnosed patients with locally advanced cervical cancer. Cancer 2010;116:1469–75.
  • [6] Brown C, Howes B, Jamieson GG, Bartholomeusz D, Zingg U, Sullivan TR, et al. Accuracy of PET-CT in predicting survival in patients with esophageal cancer. World J Surg 2012;36:1089–95.
  • [7] Visser EP, Boerman OC, Oyen WJ. SUV: from silly useless value to smart uptake value. J Nucl Med 2010;51:173–5.
  • [8] Wahl RL, Jacene H, Kasamon Y, Lodge MA. From RECIST to PERCIST: evolving considerations for PET response criteria in solid tumors. J Nucl Med 2009;50(Suppl 1):122S–50S.
  • [9] Sher A, Lacoeuille F, Fosse P, Vervueren L, Cahouet-Vannier A, Dabli D, et al. For avid glucose tumors, the SUV peak is the most reliable parameter for [(18)F]FDG-PET/CT quantification, regardless of acquisition time. EJNMMI Res 2016;6:21.
  • [10] Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441–6.
  • [11] Kumar V, Gu Y, Basu S, Berglund A, Eschrich SA, Schabath MB, et al. Radiomics: the process and the challenges. Magn Reson Imaging 2012;30:1234–48.
  • [12] El Naqa I, Grigsby P, Apte A, Kidd E, Donnelly E, Khullar D, et al. Exploring feature-based approaches in PET images for predicting cancer treatment outcomes. Pattern Recognit 2009;42:1162–71.
  • [13] Tixier F, Le Rest CC, Hatt M, Albarghach N, Pradier O, Metges JP, et al. Intratumor heterogeneity characterized by textural features on baseline 18F-FDG PET images predicts response to concomitant radiochemotherapy in esophageal cancer. J Nucl Med 2011;52:369–78.
  • [14] Wei M, Zhe C, Wei S, Feng Y, Ying L, Ruwei D, et al. A segmentation algorithm for quantitative analysis of heterogeneous tumors of the cervix with 18F-FDG PET/CT. Biomed Eng IEEE Trans 2015;62:2465–79.
  • [15] Parekh V, Jacobs MA. Radiomics: a new application from established techniques. Expert Rev Precis Med Drug Dev 2016;1:207–26.
  • [16] Shanmugam K, Haralick RM, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern 1973;SMC-3(6).
  • [17] Galloway MM. Texture analysis using gray level run lengths. Comput Graphics Image Process 1975;4:172–9.
  • [18] Thibault G, Angulo J, Meyer F. Advanced statistical matrices for texture characterization: application to cell classification. IEEE Trans Biomed Eng 2014;61:630–7.
  • [19] Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst Man Cybern 1989;19:1264–74.
  • [20] O’Sullivan F, Roy S, O’Sullivan J, Vernon C, Eary J. Incorporation of tumor shape into an assessment of spatial heterogeneity for human sarcomas imaged with FDG-PET. Biostatistics 2005;6:293–301.
  • [21] Altazi B, Zhang G, Fernandez D, Montejo M, Hunt D, Werner J, et al. Reproducibility of F18-FDG PET radiomic features for different cervical tumor segmentation methods, gray-level discretization, and reconstruction algorithms. J Appl Clin Med Phys 2017.
  • [22] Shafiq-ul-Hassan M, Zhang GG, Latifi K, Ullah G, Hunt DC, Balagurunathan Y, et al. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med Phys 2017.
  • [23] Kitchenham B, Känsälä K. Inter-item correlations among function points. In: Proceedings of the 15th International Conference on Software Engineering. IEEE; 1993. p. 477–80.
  • [24] Gulliksen H. The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika 1945;10:79–91.
  • [25] Tixier F, Groves AM, Goh V, Hatt M, Ingrand P, Le Rest CC, et al. Correlation of intra-tumor 18F-FDG uptake heterogeneity indices with perfusion CT derived parameters in colorectal cancer. PLoS One 2014;9:e99567.
  • [26] Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • [27] Crivellaro C, Signorelli M, Guerra L, De Ponti E, Buda A, Dolci C, et al. 18F-FDG PET/CT can predict nodal metastases but not recurrence in early stage uterine cervical cancer. Gynecol Oncol 2012;127:131–5.
  • [28] Chung HH, Kim JW, Han KH, Eo JS, Kang KW, Park NH, et al. Prognostic value of metabolic tumor volume measured by FDG-PET/CT in patients with cervical cancer. Gynecol Oncol 2011;120:270–4.
  • [29] Beavis AL, Gravitt PE, Rositch AF. Hysterectomy-corrected cervical cancer mortality rates reveal a larger racial disparity in the United States. Cancer 2017.
  • [30] Naqa IE. The role of quantitative PET in predicting cancer treatment outcomes. Clin Transl Imaging 2014;2:305–20.
  • [31] Lian C, Ruan S, Denoeux T, Jardin F, Vera P. Selecting radiomic features from FDG-PET images for cancer treatment outcome prediction. Med Image Anal 2016;32:257–68.
  • [32] Avanzo M, Stancanello J, El Naqa I. Beyond imaging: the promise of radiomics. Phys Med 2017;38:122–39.
  • [33] Tixier F, Hatt M, Le Rest CC, Le Pogam A, Corcos L, Visvikis D. Reproducibility of tumor uptake heterogeneity characterization through textural feature analysis in 18F-FDG PET. J Nucl Med 2012;53:693–700.
  • [34] Doot RK, Scheuermann JS, Christian PE, Karp JS, Kinahan PE. Instrumentation factors affecting variance and bias of quantifying tracer uptake with PET/CT. Med Phys 2010;37:6035–46.
  • [35] Sureshbabu W, Mawlawi O. PET/CT imaging artifacts. J Nucl Med Technol 2005;33:156–61.
