Author manuscript; available in PMC: 2019 Apr 1.
Published in final edited form as: Radiother Oncol. 2017 Dec 19;127(1):36–42. doi: 10.1016/j.radonc.2017.11.025

Predicting hypoxia status using a combination of contrast-enhanced computed tomography and [18F]-Fluorodeoxyglucose positron emission tomography radiomics features

Mireia Crispin-Ortuzar a,b,*, Aditya Apte a, Milan Grkovski a, Jung Hun Oh a, Nancy Y Lee c, Heiko Schoder d, John L Humm a, Joseph O Deasy a
PMCID: PMC5924729  NIHMSID: NIHMS927438  PMID: 29273260

Abstract

Background and purpose

Hypoxia is a known prognostic factor in head and neck cancer. Hypoxia imaging PET radiotracers such as 18F-FMISO are promising but not widely available. The aim of this study was therefore to design a surrogate for 18F-FMISO TBRmax based on 18F-FDG PET and contrast-enhanced CT radiomics features, and to study its performance in the context of hypoxia-based patient stratification.

Methods

121 lesions from 75 head and neck cancer patients were used in the analysis. Patients received pre-treatment 18F-FDG and 18F-FMISO PET/CT scans. 79 lesions were used to train a cross-validated LASSO regression model based on radiomics features, while the remaining 42 were held out as an internal test subset.

Results

In the training subset, the highest AUC (0.873 ± 0.008) was obtained from a signature combining CT and 18F-FDG PET features. The best performance on the unseen test subset was also obtained from the combined signature, with an AUC of 0.833, while the model based on the 90th percentile of 18F-FDG uptake had a test AUC of 0.756.

Conclusion

A radiomics signature built from 18F-FDG PET and contrast-enhanced CT features correlates with 18F-FMISO TBRmax in head and neck cancer patients, providing significantly better performance with respect to models based on 18F-FDG PET only. Such a biomarker could potentially be useful to personalize head and neck cancer treatment at centers for which dedicated hypoxia imaging PET radiotracers are unavailable.

Keywords: FMISO, FDG, hypoxia, radiomics, PET, CT, head and neck cancer, PET/CT

1. Introduction

Head and neck squamous cell carcinomas (HNSCC) treated with chemoradiotherapy can reach overall survival rates of nearly 90% when human papillomavirus (HPV)-positive (HPV+) tumors dominate the cohort [1, 2, 3]. However, head and neck tumors with regions of insufficient oxygenation, a condition known as hypoxia, are still associated with poor prognosis [4]. Studies have shown that hypoxia triggers angiogenesis, increases radioresistance and reduces the effectiveness of surgery, an alleged consequence of hypoxia-driven selective pressure for tumor aggressiveness and metastasis [5, 6, 7, 8]. This has motivated the development of HNSCC chemoradiotherapy protocols that stratify patients based on the assessment of tumor hypoxia [9, 10, 11]. Major alterations of the treatment are being tested in clinical trials, such as the reduction of the radiation dose to non-hypoxic lesions from the standard 70 Gy to 30 Gy, thereby significantly reducing the risk of normal tissue toxicity [12, 13].

The preferred technique to detect hypoxia in this context is currently positron emission tomography (PET) combined with a hypoxia radiotracer such as 18F–Fluoromisonidazole (18F-FMISO) [14, 15, 16]. Previous studies have shown that 18F-FMISO PET is a prognostic biomarker both before and during treatment [17, 18, 19]. However, 18F-FMISO is still an investigational agent that requires an Institutional Review Board (IRB) with an Investigational New Drug approval from the FDA. The task of extending existing hypoxia-based HNSCC trials to the routine clinical setting would be simpler if an equivalent biomarker could be obtained from standard imaging techniques.

Hypoxia has been shown to increase 18F-Fluorodeoxyglucose (18F-FDG) uptake in cancer cell lines [20]. This is not, however, the only contribution to the 18F-FDG signal, and studies assessing the overall correlation between 18F-FDG and the level of hypoxia have obtained conflicting results [21, 22]. In this study we explore the hypothesis that an image-based biomarker obtained from the combination of 18F-FDG PET scans and their accompanying contrast-enhanced CT scans correlates with 18F-FMISO-assessed hypoxia, with contrast-enhanced CT scans acting as surrogates for blood flow [23]. In particular, we analyze its utility for the task of discriminating, with enough specificity, a subset of candidate patients for dose de-escalation, using only imaging techniques employed routinely in the clinical setting. To maximize the information extracted, we quantify the images using radiomics features [24] and develop the biomarker using machine learning.

2. Materials and methods

2.1. Dataset and hypoxia definition

Patients were treated for head and neck cancer with chemoradiation at Memorial Sloan Kettering Cancer Center (MSKCC) under a de-escalation trial (IRB #04-070; NCT00606294 on clinicaltrials.gov). To be included in the present retrospective analysis, patients had to have received pre-treatment 18F-FDG static PET/CT for treatment planning, as well as 18F-FMISO dynamic PET/CT. Scans were performed on a GE Medical Systems PET/CT scanner with the patient immobilized with a customized mask. Prior to the 18F-FDG PET/CT scan patients were given 100 ml of Omnipaque contrast. The 18F-FMISO PET/CT scan consisted of a 30-minute dynamic scan followed by two 10-minute static scans obtained approximately 90 and 150 minutes post-injection, respectively. The average time between the two PET scans was 6 days, with a standard deviation of 5 days. All patients with the available data treated between January 2011 and February 2016 were included in the analysis.

The lesions were delineated by experienced physicians on the contrast-enhanced CT for radiotherapy treatment planning. 18F-FDG uptake was quantified in terms of the body-weight standardized uptake value (SUV). 18F-FMISO uptake was normalized by the uptake measured in the ipsilateral (with respect to the lesion) jugular vein. The level of hypoxia of a lesion was defined in terms of its maximum tumor-to-blood uptake ratio (TBRmax) on the last static scan. In particular, in this study a lesion was considered to be hypoxic if TBRmax > 1.4 (see footnote 1).
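The hypoxia label defined above can be illustrated with a minimal sketch. It assumes that the blood reference is summarized as a single value (here a hypothetical mean jugular-vein uptake) and that the lesion is given as a flat list of voxel uptakes; neither detail is specified in the text, and all numbers are made up for illustration.

```python
def tbr_max(tumor_uptake, blood_uptake):
    """Maximum tumor-to-blood ratio: highest voxel uptake over blood uptake."""
    if blood_uptake <= 0:
        raise ValueError("blood uptake must be positive")
    return max(tumor_uptake) / blood_uptake


def is_hypoxic(tbr, threshold=1.4):
    """Dichotomize a lesion using the TBRmax > 1.4 rule used in this study."""
    return tbr > threshold


# Hypothetical values, for illustration only.
lesion_voxels = [1.2, 2.1, 2.8, 1.9]   # 18F-FMISO uptake per voxel
jugular_uptake = 1.6                   # assumed single blood reference value

tbr = tbr_max(lesion_voxels, jugular_uptake)
print(round(tbr, 2), is_hypoxic(tbr))
```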

2.2. Radiomics features

Radiomics features were extracted from both 18F-FDG PET and CT scans, including intensity features (from PET and CT), shape features (CT), and texture features (CT). Intensity features also included variables based on two-dimensional volume-intensity histograms, for example the volume spanned by voxels with intensities above 70 Hounsfield units (HU), denoted V>70. Features were extracted in Matlab using the CERR Radiomics Imaging Quantification toolbox (RIQ, [25, 26], April 2017 version). All code used to derive features is open source and available for download. Table S1 in the Supplementary Material includes a full list of the radiomics features computed.
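As an illustration of a volume-intensity histogram feature such as V>70, the following sketch counts the voxels above a HU threshold and converts the count to a volume. It is not the CERR implementation; the function name, the flat voxel list and the voxel volume are assumptions made for this example.

```python
def volume_above(hu_values, voxel_volume_cc, threshold_hu=70.0):
    """Volume (in cc) spanned by voxels whose intensity exceeds threshold_hu."""
    n_voxels = sum(1 for hu in hu_values if hu > threshold_hu)
    return n_voxels * voxel_volume_cc


# Hypothetical CT intensities (HU) inside a region of interest.
region_hu = [35, 72, 110, 68, 90, -20]
v_gt_70 = volume_above(region_hu, voxel_volume_cc=0.004)
print(v_gt_70)
```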

In CT scans, radiomics features were extracted for each lesion from four different volumes of interest, denoted vGTV (gross tumor volume, GTV), vCT (GTV after filtering out voxels outside of HU ∈ [−100, 150]), v↑ (the high-SUV subvolume, SUV > 42% SUVmax) and v↓ (the low-SUV subvolume, SUV < 42% SUVmax), respectively, as shown in Figure 1.

Figure 1.

Feature extraction pipeline, indicating the volumes used to compute each family of features. IVH = Intensity Volume Histogram, RLM = Run Length Matrix, NGTDM = Neighborhood Gray-Tone Difference Matrix, NGLDM = Neighboring Gray-Level Dependence Matrix.

For 18F-FDG PET features, the volume of interest was defined as the region within the GTV with SUV > 42% SUVmax [27]. We use the intersection between the fixed-threshold contour and the GTV to avoid the overestimation of small lesion boundaries [28].
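The sub-volume definitions can be sketched as boolean masks. This example assumes that PET and CT have already been resampled onto a common voxel grid and flattened into lists, and that SUVmax is taken within the GTV; both are assumptions of the sketch, and the function names are hypothetical.

```python
def split_by_suv(suv, in_gtv, fraction=0.42):
    """Return (high, low) masks inside the GTV, split at fraction * SUVmax."""
    gtv_suv = [s for s, g in zip(suv, in_gtv) if g]
    if not gtv_suv:
        raise ValueError("empty GTV")
    cut = fraction * max(gtv_suv)
    # Intersecting with the GTV mask keeps the thresholded contour inside the GTV.
    high = [g and s > cut for s, g in zip(suv, in_gtv)]
    low = [g and s < cut for s, g in zip(suv, in_gtv)]
    return high, low


def hu_window(hu, in_gtv, lo=-100.0, hi=150.0):
    """GTV voxels whose CT intensity lies within [lo, hi] HU."""
    return [g and lo <= h <= hi for h, g in zip(hu, in_gtv)]


# Hypothetical co-registered voxels.
suv = [10.0, 5.0, 3.0, 12.0, 4.0]
hu = [40.0, 200.0, -150.0, 60.0, 100.0]
in_gtv = [True, True, True, False, True]

high_mask, low_mask = split_by_suv(suv, in_gtv)
ct_mask = hu_window(hu, in_gtv)
print(high_mask, low_mask, ct_mask)
```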

2.3. Data analysis

All lesions with a volume larger than 10 cm3 were considered for the analysis. The dataset was divided into a training subset comprising approximately 65% of the lesions, and an internal test subset which was held out and only used to test the final models. TBRmax was used as the continuous response variable to predict with a supervised learning model. However, as the ultimate goal was stratification, the response was dichotomized by classifying lesions with TBRmax > 1.4 as hypoxic, and the performance was evaluated in terms of the area under the receiver operating characteristic curve (AUC).
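To make the evaluation concrete, the sketch below dichotomizes a continuous response at TBRmax > 1.4 and computes the AUC as the probability that a randomly chosen hypoxic lesion receives a higher predicted value than a randomly chosen non-hypoxic one (ties count one half). The data are hypothetical and the pairwise formulation is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical measured TBRmax and model predictions for six lesions.
measured_tbr = [1.0, 1.3, 1.5, 1.8, 2.4, 1.2]
predicted = [0.9, 1.6, 1.4, 2.0, 2.2, 1.1]
hypoxic = [t > 1.4 for t in measured_tbr]

example_auc = auc(predicted, hypoxic)
print(round(example_auc, 3))
```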

First, the training dataset was tested for any univariable associations between clinical predictors and the lesions' TBRmax. Correlations between numerical predictors and TBRmax were measured in terms of the Spearman correlation coefficient, while associations with categorical predictors were assessed using balanced one-way ANOVA. p-values were corrected for multiple testing using the Benjamini-Hochberg procedure.
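For reference, the Benjamini-Hochberg adjustment can be written in a few lines. This is a generic sketch of the standard step-up procedure, not the code used in the study, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.20]  # hypothetical univariable p-values
adj_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adj_p])
```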

Figure 2 illustrates the radiomics modelling process, which was performed in steps: (1) feature selection based on cross-validated LASSO regression; (2) multivariable linear model building, based on the selected features; and (3) evaluation of the best linear model on the hold-out test set. The process was performed independently for CT and 18F-FDG PET features. Steps (2) and (3) were also performed for the combined set of selected PET and CT features.

Figure 2.

Flow chart of the steps followed to derive the predictive model, including (1) feature selection, (2) model building, and (3) testing. The curved, dashed line indicates the addition of interaction terms between the selected features to assess the importance of non-linearity.

The feature selection step is based on a LASSO linear model embedded in a 10-fold cross-validation loop reshuffled 10 times. In each fold features were pre-selected by applying a univariable p-value cut and removing those that were highly correlated. The remaining features were used in a LASSO linear model with nested 5-fold cross-validation to determine the regularization parameter yielding the smallest mean square error. The selected features were used again as input to a second LASSO model, this time also including bilinear interaction terms, to assess the non-linearity of the problem. The whole process, outlined in Figure 2, was repeated multiple times to scan over possible values of the pre-selection cuts (p = 0.01, 0.02, …, 0.1; Spearman r = 0.5, 0.6, …, 0.8) and find the ones that maximized the cross-validation AUC.
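The correlation-based part of the pre-selection can be sketched as follows. The text does not state how highly correlated features were removed, so the greedy rule used here (rank features by their absolute Spearman correlation with the response, then keep a feature only if its correlation with every feature already kept does not exceed the cut) is an assumption of this example. The p-value cut and the LASSO fit are not shown, and feature names and values are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined for constant inputs


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(features, response, r_cut=0.6):
    """Greedy filter (assumed rule): keep features not correlated above r_cut."""
    names = sorted(features, key=lambda k: -abs(spearman(features[k], response)))
    kept = []
    for name in names:
        if all(abs(spearman(features[name], features[k])) <= r_cut for k in kept):
            kept.append(name)
    return kept


response = [1.0, 1.2, 1.5, 1.9, 2.3]            # hypothetical TBRmax values
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],                       # monotone in "a"
    "c": [3, 1, 5, 2, 4],
}
kept = drop_correlated(features, response, r_cut=0.6)
print(kept)
```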

For the model building step, only features selected in 50% of the 100 cross-validation runs or more were used. Multiple 1- and 2-variable linear regression models were created by taking all possible combinations of the selected features. An optimistic bound on the expected performance of the models was determined in terms of the mean AUC obtained from 10-fold cross-validation reshuffled 10 times.
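The two bookkeeping steps described above, keeping features selected in at least 50% of the runs and enumerating all 1- and 2-variable models, can be sketched with the standard library. The feature names and the four example runs are hypothetical (the study used 100 runs).

```python
from collections import Counter
from itertools import combinations


def stable_features(selected_per_run, min_fraction=0.5):
    """Features selected in at least min_fraction of the cross-validation runs."""
    n_runs = len(selected_per_run)
    counts = Counter(f for run in selected_per_run for f in set(run))
    return sorted(f for f, c in counts.items() if c / n_runs >= min_fraction)


def candidate_models(features):
    """All 1- and 2-variable feature combinations."""
    return [combo for k in (1, 2) for combo in combinations(features, k)]


# Hypothetical selection results from four runs.
runs = [["P90", "skew"], ["P90"], ["P90", "kurt"], ["P90", "skew"]]
stable = stable_features(runs)
models = candidate_models(stable)
print(stable, models)
```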

For each category (PET, CT, or PET+CT) only the linear model with the best AUC was evaluated on the test dataset. The final model coefficients were determined by fitting to the entire training subset.

To assess whether the test AUCs of the three models were significantly better than a model based only on P90%FDG (the 90th percentile of the 18F-FDG SUV, used here as a robust variant of the maximum SUV), we computed 1000 bootstrap replicas of the test dataset, calculated the corresponding AUCs, and derived a p-value based on the two-sided Wilcoxon rank sum test.
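A minimal sketch of the bootstrap step is given below. It assumes that lesions are resampled with replacement and that replicas containing a single class are redrawn; neither detail is stated in the text. The resulting AUC lists for two models could then be compared with a rank sum test (for example `scipy.stats.ranksums`), which is not called here to keep the example dependency-free. All data are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_aucs(scores, labels, n_boot=1000, seed=0):
    """AUCs of n_boot bootstrap replicas of the (scores, labels) dataset."""
    if all(labels) or not any(labels):
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(scores)
    out = []
    while len(out) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # assumed handling: redraw single-class replicas
        out.append(auc([scores[i] for i in idx], ys))
    return out


# Hypothetical test-set predictions from two models and the true labels.
labels = [False, False, True, True, True, False, True, False]
model_a = [0.9, 1.6, 1.4, 2.0, 2.2, 1.1, 1.9, 1.0]
model_b = [1.2, 1.5, 1.1, 1.8, 1.3, 1.4, 2.0, 0.8]

aucs_a = bootstrap_aucs(model_a, labels, n_boot=200, seed=1)
aucs_b = bootstrap_aucs(model_b, labels, n_boot=200, seed=1)
print(sum(aucs_a) / len(aucs_a), sum(aucs_b) / len(aucs_b))
```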

Further details about the definition of the radiomics features and the data analysis procedure can be found in the supplementary material.

3. Results

75 patients satisfying all the requisites were identified, adding up to 121 lesions in total. A randomly chosen subset of 79 lesions was used for training, and the remaining 42 were held out for testing purposes. Patient characteristics are listed in Table 1. None of them were found to be significantly different between the training and testing datasets.

Table 1.

Characteristics of the lesions used in the analysis. For tumor subsite, HPV and p16 status, p-values are based on the Fisher exact test. For the mean and standard deviations of the 18F-FDG maximum SUV and 18F-FMISO TBRmax, p-values are based on the 2-sample t-test.

| Type | Ntrain (Fraction) | Ntest (Fraction) | p-value |
|---|---|---|---|
| Subsite | | | |
| Base of Tongue | 40 (0.51) | 21 (0.5) | 1 |
| Hypopharynx | 1 (0.01) | 1 (0.02) | 1 |
| Supraglottic larynx | 2 (0.03) | 1 (0.02) | 1 |
| Tonsil | 32 (0.41) | 19 (0.45) | 0.7 |
| Unknown | 4 (0.05) | 0 (0) | 0.3 |
| HPV status | | | |
| HPV+ | 45 (0.57) | 27 (0.64) | 0.6 |
| HPV− | 11 (0.14) | 7 (0.17) | 0.8 |
| Unknown | 23 (0.29) | 8 (0.19) | 0.3 |
| p16 status | | | |
| p16+ | 57 (0.72) | 34 (0.81) | 0.4 |
| p16− | 7 (0.09) | 4 (0.1) | 1 |
| Unknown | 15 (0.19) | 4 (0.1) | 0.2 |

| Type | Training Mean (SD) | Test Mean (SD) | p-value |
|---|---|---|---|
| 18F-FMISO TBRmax | 1.91 (0.73) | 1.76 (0.56) | 0.3 |
| 18F-FDG SUVmax | 13.64 (5.76) | 13.74 (5.23) | 0.9 |

3.1. Association with clinical variables

TBRmax was tested for association with HPV status, p16 status (see footnote 2) and tumor subsite, and no significant correlation was found. Clinical variables were therefore not included in the radiomics models.

3.2. Model training

CT model

The optimal pre-selection cuts were pcut = 0.02 and rcut = 0.6. Two variables were selected by the LASSO model 50% of the time or more: the long-run high-grey-level emphasis of the high-SUV region v↑, denoted E; and V>70 of the low-SUV region v↓, denoted V>70. The specific definition of E that was selected was sensitive to texture directionality; in particular, it used the run-length matrix calculated along the direction that yielded the maximum result in that tumor. The resulting cross-validated AUC was 0.78, as shown in Table 2. Adding interaction terms did not improve the result. Using these two features only in a multivariable regression model yielded a mean cross-validation AUC of 0.841, higher than using either of the two features alone.
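To illustrate what a directional long-run high-grey-level emphasis measures, the sketch below builds run-length counts along two directions of a small 2-D image and takes the maximum. It uses the common textbook definition (the sum over runs of grey level squared times run length squared, divided by the number of runs) with 1-based quantized grey levels. The restriction to 2-D, to two directions and to a pre-quantized image are simplifying assumptions of this example, which is not the CERR implementation.

```python
from collections import Counter


def run_lengths(lines):
    """Count runs as {(grey_level, run_length): count} over 1-D lines."""
    runs = Counter()
    for line in lines:
        i = 0
        while i < len(line):
            j = i
            while j + 1 < len(line) and line[j + 1] == line[i]:
                j += 1
            runs[(line[i], j - i + 1)] += 1
            i = j + 1
    return runs


def lrhgle(runs):
    """Long-run high-grey-level emphasis from run counts."""
    total = sum(runs.values())
    return sum(c * g ** 2 * l ** 2 for (g, l), c in runs.items()) / total


def max_directional_lrhgle(image):
    """Maximum of the feature over the row and column directions."""
    rows = [list(r) for r in image]
    cols = [list(c) for c in zip(*image)]
    return max(lrhgle(run_lengths(rows)), lrhgle(run_lengths(cols)))


# Hypothetical quantized image: a long bright run along the first row.
image = [
    [3, 3, 3],
    [1, 2, 1],
    [2, 1, 2],
]
value = max_directional_lrhgle(image)
print(round(value, 3))
```

The single run of three high-intensity voxels along the first row dominates the row-direction value, which is the behavior one would expect from a feature sensitive to elongated dense structures.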

Table 2.

Performance of the models derived from the training subset assessed in terms of the mean and standard deviation (SD) of the cross-validation AUC. AUCs are also given for the test subset, indicating the bootstrap p-value of the difference with respect to the model with only P90%FDG.

| Modality | Feature pre-selection | Interaction terms | AUC, training mean ± SD |
|---|---|---|---|
| CT | None | No | 0.78 ± 0.03 |
| CT | None | Yes | 0.78 ± 0.03 |
| CT | E, V>70 | No | 0.841 ± 0.007 |
| PET | None | No | 0.852 ± 0.008 |
| PET | None | Yes | 0.853 ± 0.007 |
| PET | P90%FDG | No | 0.855 ± 0.005 |
| PET | P90%FDG, μ3FDG | No | 0.861 ± 0.007 |
| PET + CT | P90%FDG, E | No | 0.873 ± 0.008 |

| Modality | Feature pre-selection | Interaction terms | AUC, test value (p) |
|---|---|---|---|
| PET | P90%FDG | No | 0.756 |
| PET | P90%FDG, μ3FDG | No | 0.767 (p < 0.001) |
| CT | E, V>70 | No | 0.828 (p < 0.0001) |
| PET + CT | P90%FDG, E | No | 0.833 (p < 0.0001) |

PET model

The optimal pre-selection cuts were pcut = 0.01 and rcut = 0.5. The maximum SUV, here defined in a robust way via the 90th percentile, P90%FDG, was selected 100% of the time. In addition, the skewness of the SUV distribution, μ3FDG, was selected exactly 50% of the time. The cross-validated AUC was 0.852, with no improvement gained from adding interactions. Using these two features only in a multivariable regression model resulted in a mean cross-validation AUC of 0.861, higher than using either of the two features alone.

Combined CT and PET model

All the possible combinations between the top two CT and PET features were considered as predictors in a multivariable linear regression model. The highest performance was achieved when combining P90%FDG and E, with an AUC of 0.873. Adding interaction terms did not improve the performance.

All the intermediate performance results can be found in Table I of the supplementary material.

3.3. Model testing

Four regression models were tested on the unseen test subset: (i) E and V>70, (ii) P90%FDG, (iii) P90%FDG and μ3FDG, (iv) E and P90%FDG. They were trained on the entire training dataset. The performance obtained in terms of AUC was 0.828, 0.756, 0.767 and 0.833, respectively, as shown in Table 2. The highest performance was achieved by the combined P90%FDG and E model, in agreement with what was observed in the training dataset. The discriminative power of the two features can be seen in Figure 3a, while the correlation between the linear combination of the two features and 18F-FMISO TBRmax is shown in Figure 3b. The AUC of the combined model was significantly higher than the AUC of the model based only on P90%FDG (p < 0.0001).

Figure 3.

(a) Scatter plot of P90%FDG against E for the test dataset. The markers are color coded according to their TBRmax value. The grey lines result from applying fixed thresholds to the model derived from the training dataset. The resulting classifications have the following false positive and true positive rates for TBRmax > 1.4: t1 FPR=0.07, TPR=0.72; t2 FPR=0.31, TPR=0.76; t3 FPR=0.46, TPR=0.97. Threshold t1 would result in 6 normoxic lesions out of 13 being classified as such, with only one hypoxic lesion being misclassified as normoxic. Threshold t2 would result in 8 normoxic lesions being correctly classified, with 7 hypoxic lesions being incorrectly classified as normoxic. Finally, threshold t3 would classify 12 out of 13 normoxic lesions correctly, but it would incorrectly classify 9 hypoxic lesions as normoxic. (b) Correlation plot of the combined P90%FDG and E signature versus 18F-FMISO TBRmax in the test dataset (Spearman ρ = 0.57, p < 0.01). For values of the PET+CT predictor below 2.0, the Spearman ρ is 0.72, p < 0.01.

4. Discussion

This study explores the possibility of obtaining a surrogate hypoxia biomarker derived from conventional imaging techniques. The proposed signature is based on 18F-FDG PET and its companion contrast-enhanced CT scan, and reaches an AUC of 0.83 in the test dataset.

We used TBRmax to quantify hypoxia as it minimizes uptake normalization errors and is highly correlated with the tumor-to-muscle ratio that is commonly used in clinical trials [9, 13, 30]. It has been argued that for a more reliable determination of tumor hypoxia a kinetic analysis of dynamic PET data may be needed [31], as the raw 18F-FMISO uptake is expected to be affected by factors such as the distribution volume [32]. An interesting follow-up study would be to find predictors of the irreversible binding rate of 18F-FMISO, as determined from kinetic analysis.

We found that the most predictive 18F-FDG PET features were the 90th percentile and the skewness of the SUV distribution, denoted P90%FDG and μ3FDG, respectively. μ3FDG captures the bias of the distribution towards either the higher or lower ends of the SUV spectrum, and its role could be to capture the fraction of hot (and potentially hypoxic) voxels. While P90%FDG remains important in models including CT features, the contribution of μ3FDG becomes negligible.
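Both first-order features are simple to compute from the voxel SUVs. The sketch below uses a linear-interpolation percentile and the population (biased) skewness; the conventions used by the toolbox in the study may differ, so these choices are assumptions, and the SUV values are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def skewness(values):
    """Population skewness: third central moment over variance to the 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical SUVs with a few hot voxels forming a right tail.
suv = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
p90 = percentile(suv, 90)
mu3 = skewness(suv)
print(p90, round(mu3, 3))
```

Unlike the maximum (20 in this example), the 90th percentile is not driven by a single extreme voxel, which is the sense in which it acts as a robust variant of SUVmax.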

We studied features extracted from CT scans independently, but the analysis was in fact not completely agnostic to 18F-FDG PET, as the volumes of interest included high-SUV and low-SUV subregions of the GTV. The two top CT features were the volume taken by voxels with HU > 70 within the low-SUV subvolume (denoted V>70) and the long-run high-grey-level emphasis calculated along the direction with the maximum value within the high-SUV subvolume (denoted E). By measuring the volume of high contrast-CT density voxels, V>70 is adding a blood flow dimension to 18F-FDG uptake. Visual examination of the lesions shows that those with low V>70 tend to have low-SUV, CT-hypodense cores such as the one in Figure 4c; it is therefore identifying lesions as necrotic (negligible 18F-FMISO TBRmax due to limited blood flow) or viable (non-negligible TBRmax). On the other hand, E is sensitive to successions of voxels (‘runs’) with high CT density within the high-SUV subregion. This suggests that it could be capturing the presence of blood vessels, as in the examples shown in Figures 4a (dense vasculature) and 4b (single blood vessel). The focus on high-18F-FDG uptake regions could be helping reveal whether the uptake is due to perfusion or hypoxia.

Figure 4.

Representative slices from the CT and 18F-FDG PET scans of three lesions with a diverse range of radiomics features and TBRmax. Each subfigure corresponds to a different lesion, with the CT scan on the left panel and the 18F-FDG PET scan on the right panel. The main features for each lesion are: (a) TBRmax=1.5, V>70 =35 cc, E =9735, P90%FDG=18.2, μ3FDG =0.48; (b) TBRmax=1.8, V>70 =6 cc, E =10980, P90%FDG =11.5, μ3FDG = 0.17; (c) TBRmax=1.0, V>70=0 cc, E =5702, P90%FDG=1.9, μ3FDG =0.65. The white contour indicates v, and the dark grey contour indicates v.

This is not the first time that CT features have been found to be associated with hypoxia. Studies have shown that hypodense neck lymph nodes in routine contrast-enhanced CT scans have hypoxic conditions [23, 33]. Nodal CT density has also been shown to be a prognostic factor in head and neck cancer [34, 35]. More recently, Panth et al found significant differences in the time evolution of CT radiomics features between mice whose hypoxic fraction was reduced due to induced GADD34 over-expression and a control group [36]. In a study by Ganeshan et al, texture features obtained from contrast-enhanced CT images showed significant correlation with the average intensity of tumor staining with pimonidazole in NSCLC patients [37].

Our results suggest that radiomics features extracted from routine contrast-enhanced CT scans can significantly improve the ability of 18F-FDG PET to indicate hypoxia, at least to the extent that it can be measured by 18F-FMISO TBRmax. A large number of radiomics features were considered in the analysis, to maximize the amount of information extracted from the images. The final model only contains two radiomics features, V>70 and E, both of which have plausible interpretations.

As seen in Figure 3b, there are outliers in the prediction. Such deviations are expected, as there are multiple situations that can lead to enhanced 18F-FDG uptake in the absence of hypoxia, including high levels of metabolism, active inflammation, or a high proliferation rate. Similarly, it has been shown that variations in the distribution volume of 18F-FMISO can result in particular values of TBR being associated with different levels of hypoxia [32].

From a clinical standpoint, the operating point of the radiomics biomarker would need to be optimized in order to achieve the desired specificity and sensitivity. Figure 3a shows three possible thresholds for the proposed radiomics biomarker, assuming that lesions with TBRmax > 1.4 are defined as hypoxic. In particular, threshold t1 has a specificity of 93%, and would therefore satisfy the criterion of having a minimal number of false positives included in the de-escalation candidate subset. It may be possible to improve the performance by deriving a higher-order classifier, as opposed to the linear function proposed here; this would however require a larger dataset to avoid overfitting. Our results are only the first step, and suggest that such a larger-scale effort would be worthwhile.
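The relation between a threshold on the biomarker and the resulting operating point can be sketched as follows. Specificity is one minus the false positive rate, so an FPR of 0.07 corresponds to the 93% specificity quoted for t1. The scores, labels and threshold below are hypothetical, and treating lesions with a score above the threshold as predicted hypoxic is an assumption of this example.

```python
def operating_point(scores, hypoxic, threshold):
    """TPR, FPR and specificity when score > threshold predicts hypoxia."""
    n_pos = sum(1 for h in hypoxic if h)
    n_neg = len(hypoxic) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    tp = sum(1 for s, h in zip(scores, hypoxic) if h and s > threshold)
    fp = sum(1 for s, h in zip(scores, hypoxic) if not h and s > threshold)
    fpr = fp / n_neg
    return {"TPR": tp / n_pos, "FPR": fpr, "specificity": 1.0 - fpr}


# Hypothetical biomarker values and TBRmax-based labels.
scores = [0.8, 1.1, 1.3, 1.6, 1.9, 2.2]
hypoxic = [False, False, True, False, True, True]

point = operating_point(scores, hypoxic, threshold=1.5)
print(point)
```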

Key limitations of this study are that it is a single-institution study (although internal validation was successful), and that the CT imaging protocol was not designed for close monitoring of contrast delivery. Although we believe the results are likely to have broader validity, a multi-institutional study is needed in order to definitively establish this signature (or one like it) to the point that it could be reliably used for patient management. Lastly, the 18F-FMISO endpoint itself, TBRmax, could be somewhat noisy, which limits the ability to predict its value. Viewed from this perspective, the correlation achieved is promising.

5. Conclusion

We have identified image features derived from conventional imaging that correlate with the magnitude of hypoxia in head and neck cancer patients. In particular, our results show that a combined radiomics biomarker based on 18F-FDG PET and contrast-enhanced CT can emulate 18F-FMISO TBRmax-based stratification with significantly higher accuracy than 18F-FDG PET alone. After validation on large multi-institutional cohorts, such a biomarker could potentially be useful for head and neck cancer patient stratification in situations where 18F-FMISO is not available.

Acknowledgments

MCO is supported by a Junior Research Fellowship from Trinity College, University of Cambridge. This research was funded in part through NIH grant #1 R01 CA157770-01A1 as well as NIH/NCI Cancer Center Support Grant P30 CA008748. The sponsors were not involved at any stage of the study design, data collection and analysis, manuscript preparation or submission.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest Statement

The authors declare that they have no conflicts of interest.

1. Typical thresholds include tumor-to-blood or tumor-to-muscle ratios of 1.2 or 1.4. This study uses 1.4 to obtain a roughly balanced dataset, that is, with positive and negative classes of comparable size.

2. p16 is a tumor suppressor gene which has been suggested to be correlated with treatment response and survival [29].
