Abstract
Numerous studies have reported the prognostic utility of texture analyses and the effectiveness of radiomics in PET and PET/CT assessment of non-small cell lung cancer (NSCLC). Here we explore the potential, relative to this methodology, of an alternative model-based approach to tumour characterization, which was successfully applied to sarcoma in previous works. The spatial distribution of 3D FDG-PET uptake is evaluated in the reference frame determined by the best-fitting ellipsoidal pattern, which provides a univariate uptake profile as a function of the radial position of intratumoral voxels. A group of structural features is extracted from this fit, including two heterogeneity variables and statistical summaries of local metabolic gradients. We demonstrate that these variables capture aspects of tumour metabolism that are distinct from those described by conventional texture features. Prognostic model selection is performed with respect to two-year survival status using a number of classifiers, including stepwise selection of logistic models, LASSO, random forests and neural networks. Our results for a cohort of 93 NSCLC patients show that structural variables have significant prognostic potential, and that they may be used in conjunction with texture features in a traditional radiomics sense, towards improved baseline multivariate models of patient overall survival. The statistical significance of these models also demonstrates the relevance of these machine learning classifiers for prognostic variable selection.
Keywords: FDG-PET, heterogeneity, metabolic gradient, spatial modeling, texture, radiomics, prognosis, non-small cell lung cancer, machine learning
I. Introduction
Contributions from many groups over recent years have converged towards considering radiomics as a promising methodological standard for prognosis and therapeutic assessment based on PET and PET/CT evaluation of tumor metabolism [1]-[9]. Conventional radiomics analyses focus particularly on the assessment of intratumoral FDG-PET uptake heterogeneity, which is defined in this framework as a multivariate combination of shape and texture characteristics. Resulting multivariate patient risk models typically include summaries of the histogram of intensities, second-order statistics of their spatial distribution and some higher-order (regional) texture variables [3], [5], [9], [10]. This approach has previously been applied to tumour image analysis in non-small cell lung cancer (NSCLC) [1]-[3], [9], [11], [12]. In this paper we will refer to conventional radiomics features as “textural features”.
Alternative methodologies for heterogeneity assessment have also been considered, including some developments based on ellipsoidal and tubular representations of the volumetric FDG-PET uptake distribution [13]-[15]. Here we consider the former approach, analyzing the volumetric uptake as a function of an idealized ellipsoidal pattern. This was initially built upon clinical experience in sarcoma and has demonstrated prognostic utility in a number of diseases [13], [15] including NSCLC [16], [17]. This spatial modeling technique provides an assessment of the structure of the FDG uptake distribution, and allows the derivation of other associated variables of interest, such as metabolic gradients [15], [18]. In this paper we will refer to this group of variables as “structural features”, since they are derived from structural modelling of the volumetric uptake information.
This paper demonstrates the potential complementarity of structural and textural features, and the possibility of combining these two quantitation strategies towards enhanced multivariate prognostic models. A number of machine learning strategies that include the LASSO, random forests and neural networks [19]-[22] are considered for prognostic variable selection. Our results indicate that the makeup of output prognostic models varies with the choice of feature selection scheme, but that all schemes tend to combine both types of features. This trend corroborates the potential complementarity of these two quantitation methodologies observed from exploratory analysis. The significance of prognostic models obtained from machine learning classification based on two-year survival status also demonstrates the viability of the latter for prognostic feature selection. Prognostic potential of the novel metabolic gradient variable in NSCLC is further demonstrated via multivariate Cox modeling of overall survival.
II. Quantitation methodologies
A. Structural modeling and metabolic gradients
In our structural quantitation approach the spatial distribution of the FDG-PET uptake observations Yi at voxel locations xi, i = 1,…, N, is evaluated in terms of a 3D ellipsoidal model. This structural pattern is parametrized by θ = (μ, Σ) for shape Σ (the uptake data covariance matrix) and location μ. Voxel location within this model is expressed in terms of ellipsoidal radius
$$u_i = \sqrt{(x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)}, \quad i = 1, \dots, N. \tag{1}$$
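As an illustration of how a voxel location maps to an ellipsoidal radius, the following minimal sketch assumes that the radius takes the usual Mahalanobis form, i.e. the square root of (x − μ)ᵀ Σ⁻¹ (x − μ). The function names `solve_linear` and `ellipsoidal_radius` are hypothetical and are not taken from the authors' implementation.

```python
import math

def solve_linear(a, b):
    """Solve a z = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def ellipsoidal_radius(x, mu, sigma):
    """Assumed form: sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve_linear(sigma, d)  # z = Sigma^{-1} d, without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))
```

With this form, voxels lying on the same ellipsoidal shell around μ share the same radius, which is what allows the 3D uptake to be summarised by a univariate profile.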
We represent the uptake as a univariate function f of radial location in a model given by
$$Y_i = f(u_i) + \varepsilon_i, \quad i = 1, \dots, N, \tag{2}$$
where the εi are realizations of a zero-mean white noise process. The function f is estimated by nonparametric regression conditional on the ellipsoidal parameters θ. Model (2) is fitted jointly with respect to the ellipsoidal shape and location and to the nonparametric regression curve approximating f, so that the final profile signature is obtained from the ellipsoid that best fits the volume of interest (VOI) uptake data. Examples of this profile summary are shown in Figure 1. A well-concentrated uptake distribution like that of Case A in the figure, with higher metabolic activity at its core, will yield a tightly distributed and strictly decreasing pattern, whereas a spatially more heterogeneous distribution like that of Case B will be reflected by a more widespread pattern, especially towards smaller radii. This profile thus creates the opportunity to define structural heterogeneity variables by assessing goodness-of-fit of the ellipsoidal model. As in previous works [13], [14], two definitions of intratumoral heterogeneity are derived from the fitted values of this profile:
| (3) |
Fig. 1.
Transverse and coronal views (left) of two NSCLC case studies. Case A is an 80 y.o., stage IB male patient who presented with a homogeneous intratumoral FDG-PET uptake pattern (top), and was alive when lost to follow-up after 1081 days. Case B is a 63 y.o., stage IV male patient who presented with a more heterogeneous uptake pattern (bottom) and died 114 days after diagnosis. The input (ellipsoidal) VOI is also shown in these views; analysis is carried out on a segmentation of this volume. The scatterplots show the corresponding uptake profiles. Two profile curves are plotted: a stepwise unimodal regression (navy line) and a bitonic uptake profile curve (red line). Values of the two heterogeneity variables obtained for both studies are provided in the top-right corner.
The original approach [13] consisted of fitting a stepwise isotonic decreasing nonparametric least squares regression function to the uptake data (u, Y). Here a second, bitonic (i.e. unimodal) regression approximation of f is also obtained, using a similar approach, so as to capture more diverse patterns, especially in heterogeneous tumors. An algorithm to implement the unimodal fit is described in Appendix A. Based on these uptake profiles, we have the opportunity to derive metabolic gradients at each voxel xi, defined as gradients of the uptake profile curve at that radial location ui as in [15] by
| (4) |
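The stepwise isotonic decreasing least-squares fit underlying these profiles can be sketched with the classical pool-adjacent-violators algorithm. This is a generic illustration, assuming the uptake values have already been sorted by increasing radius u; the name `isotonic_decreasing` is hypothetical, and the unimodal fit of Appendix A is not reproduced here.

```python
def isotonic_decreasing(y):
    """Least-squares non-increasing step fit to y (values ordered by increasing radius)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # A later block whose mean exceeds the previous block's mean violates
        # the decreasing constraint, so the two blocks are pooled.
        while (len(blocks) > 1
               and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)  # one constant step per block
    return fit
```

Each pooled block becomes one step of the fitted profile, which is why the resulting curve is piecewise constant and may benefit from smoothing before gradients are computed.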
This metabolic gradient principle is illustrated in Figure 2 for the two case studies presented in Figure 1. This representation of metabolic gradient follows a conceptualization of tumour growth [23] in which the metabolic activity is expected, over time, to decrease at the volume core and gradually increase outwards toward the volume boundary. A positive gradient therefore denotes an area of high or increasing activity. The center plot in the top row of Figure 2 illustrates this scenario, showing high gradient values associated with a very active tumour core. These gradients gradually decrease toward zero as uptake activity decreases back to “background levels”. A negative gradient value thus corresponds to a locally decreasing metabolic activity, when peak activity is observed further away from that location, relative to the core.
Fig. 2.
FDG-PET metabolic gradients for the two NSCLC case studies of Figure 1. The leftmost scatterplots recall the stepwise regression curves (red) and show the corresponding smoothed profile curves (blue) from which metabolic gradients g(u) are derived. The corresponding uptake-weighted gradients gw(u) are represented as functions of radial location u by dots in the central plots, which also show the normalized gradient curve gn(u) (solid line). Histograms describe the distributions of these normalized gradients; the broken vertical lines indicate the 25th, 50th and 90th percentiles for each sample.
A smoothing spline can be applied to the stepwise regression estimate before computing these gradients, in order to obtain a more regularised assessment, which we have done in this work. From this analysis, various gradient summaries can be considered to further describe intratumoral status and activity. Normalized gradients in particular can be computed on a universal scale by
| (5) |
We also consider here uptake-weighted gradients, defined by
| (6) |
in order to emphasize the contrast in contributions of areas of higher and lower uptake. Figure 2 illustrates such variations on the uptake-weighted and normalized gradient quantities for the two example cases.
Statistical summaries of these samples of metabolic gradients (either g, gn or gw) can be used to define as many novel structural variables for tumor characterization. Figure 2 shows examples of such summaries for the two case studies. In our results hereafter we will consider in particular the median, 95th percentile and maximum of the normalized metabolic gradients, as summarising features of the rate of metabolic change within the VOI. An implementation of this quantitation methodology is available in open source [24].
B. Textural quantitation
A group of variables commonly found in radiomics analyses was assembled as a reference group in order to position the above-mentioned set of structural descriptors in terms of tumor characterization and prognostic utility. This reference set includes a shape variable measuring VOI asphericity as the ratio between surface and volume [11], and a set of first-order, second-order and regional features [2], [7], [9], [25]-[27]. These quantities were implemented following definitions of the Image Biomarker Standardization Initiative [28]. Second-order features were computed from two-dimensional symmetric and normalized grey-level co-occurrence matrices (GLCM) and averaged over all 13 directions in the volume. Regional assessment of size-zone variability and intensity variability were computed from grey-level size-zone matrices (GLSZM) and grey-level run-length matrices (GLRLM) [9], [26], [28]. The list of textural features used in this work is provided in Appendix B. Many other features may be considered but this group is representative of the most well-understood texture variables commonly found in the PET-specific radiomics literature.
Input VOIs were interpolated to the smallest voxel dimension (slice thickness), yielding cubic voxels of side 3.27 mm, prior to delineation. Following delineation, uptake values Y were requantized into Q = 64 grey levels YQ by fixed bin number transformation.
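For reference, a fixed bin number requantization can be sketched as follows. This follows the fixed bin number definition of the Image Biomarker Standardization Initiative, which is assumed (not confirmed by the text above) to match the transformation used here; the function name `fixed_bin_number` is hypothetical.

```python
import math

def fixed_bin_number(values, q=64):
    """Map uptake values to grey levels 1..q using q equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # constant VOI: a single grey level
    levels = []
    for v in values:
        level = math.floor(q * (v - lo) / (hi - lo)) + 1
        levels.append(min(level, q))  # the maximum uptake is assigned to level q
    return levels
```

Because the bins are defined relative to the minimum and maximum uptake within each VOI, the resulting grey levels are specific to each volume and change with the choice of Q.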
Experimentation indicated that although texture quantitation changed with the choice of Q, this did not meaningfully impact the general conclusions obtained from subsequent survival analyses. We have not considered the impact of alternative quantization techniques; this question is beyond the scope of this paper.
III. Experimental framework
A. Dataset and imaging protocol
1). Clinical cohort:
A cohort of primary non-small cell lung cancer patients imaged with [18F] FDG PET/CT over a three-year period starting in 2012 was analysed retrospectively. Routine information collected for this dataset included clinical stage, sex, and maximum standardised uptake value (SUVmax). Stage was reported in standard categories (stages I, II, IIIA, IIIB and IV) as per the International Association for the Study of Lung Cancer (IASLC, 7th edition). Same-stage subjects received similar treatment. Our analysis combines Stages IIIA and IIIB so that stage information is broken down into stages I, II, III and IV. End points for survival analysis were death, date of last clinic appointment and date lost to follow-up.
A set of 93 PET/CT patient examinations was evaluated, after exclusion of unsuitable cases (e.g. presenting no [18F] FDG uptake, no follow-up information, or with incomplete information). There were 38 female and 55 male patients with a mean age of 67 years (range: 36-85 years of age). There were 24 patients with stage I disease, 14 patients with stage II disease, 38 patients with stage III disease and 17 patients with stage IV disease. There were 55 deaths during the study interval, with a mean survival time of 332 days (range: 9-976 days) for this subcohort.
2). Imaging protocol:
All patients underwent standard [18F] FDG PET/CT on a GE Discovery VCT system prior to treatment, according to institutional protocol. Patients received an intravenous injection of [18F] FDG (340-400 MBq) one hour before PET acquisition and subsequent low-dose noncontrast attenuation correction CT. For imaging, patients were positioned supine with arms up and scanned from the lower orbital margin to the upper third of the femora. All imaging data were reconstructed by ordered subset expectation maximization (using the manufacturer’s standard 3D reconstruction algorithm with 2 iterations and 28 subsets), with voxel dimensions of 5.46875 mm × 5.46875 mm in the transverse plane and a slice thickness of 3.27 mm.
3). Volumes of interest:
The location and boundaries of primary tumours were identified by a radiologist in the presence of study staff, and DICOM files were read into AMIDE [29] to draw an initial ellipsoidal VOI from raw count data around the entire FDG-PET tumor volume during this process. Observations were subsequently standardized into SUV by scaling raw values by the activity in injected dose per unit weight of the patient (kBq/g). Although these crude ellipsoidal VOIs were sufficient for structural model fitting [14], refined segmentations were required for texture analyses, which are reputedly sensitive to tumor delineation [7], [9], [28]. To this end, hard thresholding [30], [31] was applied to the initial VOI. After experimentation a threshold of 20% of local SUVpeak was used (where local SUVpeak is defined as the mean SUV in a 9-voxel cubic region centered at the SUVmax voxel; various other definitions exist [32], [33] but are not considered here). A major criticism of hard thresholding is the undesired elimination of inner voxels with lower activity [7]. This was addressed here by subsequently filling the threshold masks using an ad hoc approach, so that a final VOI mask was created that contained all voxels inside the outer threshold-volume boundaries. The thresholding described above therefore only applies to the boundaries of the VOI.
Various thresholds were examined by comparison of the corresponding textural and structural feature quantitations, as well as by visual assessment of the derived VOIs. We observed a strong alignment for many features for thresholds ranging between 15% and 45% of SUVpeak, suggesting that this modified thresholding approach is reliable for quantitation (this is illustrated in Appendix D). An analysis of the impact of other segmentation methodologies on final quantitation will be carried out but is outside the scope of this paper.
B. Workflow for prognostic utility assessment
Figure 3 depicts the analysis workflow used for the identification of relevant prognostic features and assessment of their prognostic potential. Details for each type of analysis are provided below. Quantitative analysis of the VOIs was performed as described in Section II. From this step a dataset of P = 134 clinical, structural and textural features (listed in Appendix B) computed for N = 93 primary volumes was retained for further analysis, performed in the following three steps. The first two steps implement classification with respect to 2-year survival (binary) outcome, on a subcohort of N = 84 subjects (hence excluding 9 subjects with undetermined 2-year outcome, due to loss to follow-up within the first two years). Given the sample size available, these steps were performed using Monte Carlo cross-validation (MCCV) [34] on the basis of a randomized 70%-30% train/test split without replacement, so as to generate training and test sets with reasonable sample sizes (Ntrain = 59 and Ntest = 25). The third step implements Cox regression where the endpoint is overall survival.
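The Monte Carlo cross-validation scheme described above can be sketched as repeated random 70%-30% partitions of the subject indices. This is a minimal illustration only; the function name `mccv_splits` and the seed handling are assumptions, not part of the published workflow.

```python
import random

def mccv_splits(n, train_fraction=0.7, repetitions=1000, seed=0):
    """Yield (train, test) index lists from repeated random splits without replacement."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n)  # 59 when n = 84
    indices = list(range(n))
    for _ in range(repetitions):
        shuffled = rng.sample(indices, n)  # a random permutation of all subjects
        yield shuffled[:n_train], shuffled[n_train:]

# Example: three repetitions on a subcohort of 84 subjects.
for train, test in mccv_splits(84, repetitions=3):
    print(len(train), len(test))  # 59 25 on each repetition
```

Unlike k-fold cross-validation, the test sets of different repetitions may overlap, which allows a large number of repetitions while keeping training and test sets at a fixed size.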
Fig. 3.
Analysis pipeline used for prognostic feature selection and assessment of their prognostic potential. Blocks II and III constitute two complementary analyses; one does not inform the other. The faint arrow from block II to block III illustrates the option of using the classification output to reduce the computational load of an exhaustive best subset selection for this set of features.
1). Preliminary feature elimination:
Preliminary feature elimination was performed on M = 100 MCCV repetitions of a Boruta analysis (block I of Figure 3), further described in Section III-C, so as to reduce the amount of redundancy in the initial dataset. This yielded a reduced set of P = 54 candidate features, which were then considered for prognostic assessment in two different ways, as follows.
2). Two-year survival classification:
A classification analysis (block II of Figure 3) was carried out for prediction of 2-year survival, using M = 1,000 MCCV repetitions. A number of classifiers, described in Section III-D, were used to perform feature selection from the cross-validation training sets. Here, variability in feature selections from each classification technique is to be expected. The relevance of candidate features for prognostic evaluation was therefore measured (in part) on the basis of selection rates across all classifiers (a high number of occurrences, or popularity, of a given feature across all selected subsets suggesting a higher prognostic potential). Prediction performance of each classifier was measured on the cross-validation test sets, in order to evaluate the viability of feature subsets derived from them (assuming that feature subsets used in a poorly predictive model would be less likely to yield strong prognostic value).
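The notion of feature popularity used here can be illustrated by counting how often each feature appears among the subsets selected across repetitions. The sketch below uses feature names from this paper, but the selected subsets themselves are made up for illustration, and `selection_rates` is a hypothetical helper name.

```python
from collections import Counter

def selection_rates(selected_subsets):
    """Fraction of selected subsets in which each feature appears, most popular first."""
    counts = Counter(f for subset in selected_subsets for f in set(subset))
    total = len(selected_subsets)
    return {feature: count / total for feature, count in counts.most_common()}

# Hypothetical selections from four MCCV repetitions (illustrative only).
subsets = [["stage", "SUVmax"], ["stage"], ["stage", "TLG"], ["SUVmax"]]
print(selection_rates(subsets))  # {'stage': 0.75, 'SUVmax': 0.5, 'TLG': 0.25}
```

Pooling the subsets produced by several classifiers before counting gives the overall selection rates across all classifiers.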
3). Overall survival regression:
Best subset selection [21] was carried out for the full cohort over all Cox regression models of overall survival [35] (block III of Figure 3). The dataset obtained following preliminary feature elimination (block I of Figure 3) was further reduced to a subset of P = 21 covariates, by retaining only the 11 most popular textural features among all classifiers at the 2-year survival horizon (block II of Figure 3) alongside the 5 clinical and 5 structural features. This step is optional but helps reduce the computational burden associated with this exhaustive evaluation of $2^P - 1$ models. The objective of this analysis was to explore the make-up of high-performing Cox regression models, in particular whether or not they tended to include structural variables, and in terms of statistical significance of the variables they involved. All continuous variables were standardized prior to multivariate Cox regression analyses.
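To give a sense of the computational burden involved, the short sketch below counts the candidate models for the reduced set of 21 covariates and for the 54 covariates retained after preliminary elimination. The function name `n_candidate_models` is hypothetical, and each covariate (including stage) is counted as a single candidate.

```python
from math import comb

def n_candidate_models(p):
    """Number of non-empty feature subsets among p candidate covariates."""
    return 2 ** p - 1

print(n_candidate_models(21))  # 2097151
print(n_candidate_models(54))  # 18014398509481983
print(comb(21, 6))             # 54264 candidate models with exactly 6 covariates
```

Reducing the candidate set from 54 to 21 covariates therefore brings the exhaustive search from roughly 1.8 × 10^16 models down to about two million.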
4). Implementation:
All quantitative and statistical analyses were performed in R [36]. An open-source implementation of the structural and textural quantitation techniques is available at https://github.com/ericwol/mia [24].
C. Preliminary feature elimination
Many approaches may be considered for initial feature elimination in order to narrow down the input set of variables for model selection. Pre-selection is often done on the basis of correlation or collinearity assessment, or univariate performance assessment [34], [37]-[40]. However, with such approaches the identification of presumably redundant features is somewhat arbitrary, and risks removing complementary information, leading to suboptimal prediction performance [37].
Instead, we considered the Boruta algorithm [40], based on random forest classification, which performs sequential feature removal by comparing the importance of original subsets of features with that of permuted copies of these, called shadows. Features that perform significantly better than their shadow are labeled as confirmed, whilst those yielding significantly worse importance than their shadow are deemed least relevant and are eliminated in a sequential manner. The algorithm converges when importance for classification becomes stable. This procedure outputs labels for each feature as either confirmed, tentative or rejected, and exits when either all features are labeled as confirmed or a maximum number of iterations has been reached. In our setting we only excluded features that were systematically rejected by the Boruta analysis across all Monte Carlo repetitions, retaining a total of 5 clinical (stage, SUVmax, SUVmean, TLG, volume), 5 structural (the two heterogeneity variables, and the median, 95th percentile and maximum normalized gradients), and 44 textural features (identified in Figure 4 and Appendix B). Potential selection bias introduced by this classification step, which is carried out on the data being analysed in further steps, is discussed in Appendix C.
Fig. 4.
Pearson correlation matrix for the set of 54 final input clinical and quantitative variables for the cohort. Strong positive (+0.7 or above) and negative (−0.7 or lower) correlations are indicated with respectively brighter and darker pixels. Weaker correlations between −0.7 and +0.7 were “greyed out” to the predominant grey level in this image. Two sets of lines are provided to help identify the clinical, structural and textural features (respectively left-most, middle and right-most groups).
A visualization of the Pearson correlation matrix of the set of P = 54 variables retained following preliminary elimination, highlighting strong (negative and positive) correlations between quantitative features, is provided in Figure 4. This matrix illustrates that the selected structural features are not strongly correlated with either of the other two feature groups. We also note that some higher-order (GLRLM and GLSZM) textural features are strongly correlated with volumetric and ellipsoidal morphologic features.
D. Two-year survival classification
1). Classifiers:
The classifiers used for feature selection were forward- and backward-stepwise selections for multivariate logistic regression [36], the LASSO [21], [41], the elastic net [21], random forests [42], neural networks [43], [44], and support vector machines (SVM) [45], [46].
2). Settings and tuning:
The regularization parameter was set via ten-fold cross-validation for the LASSO [41]. Random forests were tuned for the number of variables randomly sampled as candidates at each split, and used 500 trees (choice of the latter did not impact results significantly) [42]. A single-layer feed-forward neural network with 7 nodes was used with no pre-processing, a nonlinear activation function, and weight decay held constant at 0.5 [43]-[45]. SVMs were applied to scaled input data using linear kernels, and were tuned for regularization cost [39], [47]. In all cases tuning was achieved on the basis of ten-fold cross-validation of the training set [42], [43], [48].
3). Performance and variable selection rates:
Feature selection was performed following model training for all classifiers. Feature subsets are obtained naturally for the stepwise logistic regressions, LASSO and elastic net methods. SVM classification was used for classification accuracy benchmarking purposes but not for feature selection. Recursive feature elimination (RFE) [34], [39] was used to select final feature subsets for the neural network and random forest models, on the basis of variable importance measured respectively by the Olden index [49], [50] and the Gini index [20], [38], [45]. For both methods final model size was determined on the basis of optimal training-set prediction accuracy. Classification performance was evaluated for all models on the basis of area under the ROC curve (AUC).
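As a reminder of what the AUC measures, it can be computed directly as the proportion of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as one half. The sketch below is a generic illustration with a hypothetical function name `auc`; it is not the implementation used for the results reported here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores (labels are 0 or 1)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present in the test set")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

In a Monte Carlo cross-validation setting, this quantity would be evaluated on each test set and then summarised across repetitions.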
IV. Results
Results obtained from the 2-year survival classification and the overall survival regression analyses are presented in Sections IV-A and IV-B respectively.
A. Classification-based feature selection
1). Classification performance:
Figure 5 shows averaged ROC curves obtained from Monte Carlo cross-validation of classification-based test set outcome prediction, using feature subsets derived from the corresponding training sets. These curves illustrate that classification performance is comparable across many of the techniques considered. Associated AUCs suggest that some of these methods yield viable prediction performance, in particular the elastic-net, neural networks, the LASSO and random forests, with respective AUCs of 76.8%, 76.7%, 75.2% and 74.3%. SVM also performed reasonably well (AUC = 73.2%) prior to RFE but with decreasing performance (AUC = 69.6%) following it. Figure 8 in Appendix C provides information on the typical variability in AUC.
Fig. 5.

Monte Carlo cross-validated ROC curves (and corresponding AUCs) of test-set prediction performance based on training-set model fits for, from smallest to largest AUC: backward logistic regression (’B-LOG’), forward logistic regression (’F-LOG’), the LASSO, RFE-optimal neural networks (’NN.RFE’), elastic net (’E-NET’) and RFE-optimal random forests (’RF.RFE’).
Fig. 8.

Distributions of AUCs with (left) and without (right) preliminary feature elimination (PFE), i.e. block I of Figure 3. This figure also illustrates the typical range of AUCs for each model, with the box boundaries indicating interquartile range.
2). Feature selection trends:
Table I shows the six features most often selected out of all Monte Carlo cross-validation experiments for all classifiers except SVM. The table also indicates that forward logistic regression, the LASSO and random forests yielded reasonably small prognostic models (with average model sizes of 6.6, 7.7 and 10.8 respectively). Over all classification outputs, six features were selected in at least 40% of Monte Carlo repetitions: stage (85.1%), SUVmax (57.2%), max.gradientHIST (47.7%), gn,0.95 (46.1%), TLG (42.7%) and gn,max (42.1%). These results suggest that structural quantitation has prognostic potential, in conjunction with textural features.
TABLE I.
Six most frequently selected features (with corresponding selection rates) across 1,000 Monte Carlo cross-validation training sets, based on forward-stepwise selection (F-LOG), backward-stepwise selection (B-LOG), the LASSO, elastic net (E-NET), random forests (RF.RFE), and neural networks (NN.RFE). Corresponding Monte Carlo average model sizes are given in brackets in the top row. gn,max denotes the maximum normalized gradient. See Appendix B for shorthand notations.
| F-LOG (6.6) | B-LOG (14.9) | LASSO (7.7) | E-NET (31.2) | NN.RFE (32.7) | RF.RFE (10.8) |
|---|---|---|---|---|---|
| stage (89.3%) | hom.normGLCM (66.3%) | SUVmax (83.8%) | stage (98.0%) | i.c.1GLCM (99.5%) | stage (92.2%) |
| z.s.varGLSZM (40.1%) | stage (63.2%) | stage (71.9%) | SUVmax (94.6%) | stage (96.2%) | TLG (76.2%) |
| max.grHIST (38.2%) | i.d.f.normGLCM (60.6%) | gn,max (66.1%) | max.grHIST (89.8%) | volume (96.0%) | c.t GLCM (61.0%) |
| gn,max (36.7%) | homGLCM (54.3%) | z.s.entGLSZM (58.4%) | gn,median (87.7%) | SUVmax (92.6%) | s.v GLCM (57.9%) |
| r.l.varGLRLM (25.9%) | i.d.fGLCM (45.5%) | gn, median (46.4%) | gn, max (84.1%) | r.l.n.uGLRLM (92.4%) | gn, max (45.1%) |
| kurtosisHIST (22.3%) | diff.av.GLCM (44.8%) | max.grHIST (38.9%) | (83.6%) | λleast (91.4%) | z.s.entGLSZM (43.1%) |
Figure 6 confirms this observation, showing that for 75% of the Monte Carlo resamples, the classifiers tend to combine both structural and textural features in their optimal model (structural features being selected in 76% of Monte Carlo classifications across all models). Depending on the cohort and selection scheme, different combinations of features may yield comparably high levels of classification performance. This was also observed with multivariate Cox proportional hazards model analyses. This tendency towards combining both feature frames indicates that the spatial modeling methodology proposed here is a useful complement to standard radiomic features for prognostic evaluation in NSCLC.
Fig. 6.

Summary of selection trends (from left to right: models including only clinical features, both clinical and structural features, both clinical and texture features, and all three types of features) from MCCV training sets for various model selection schemes. Structural features are selected by the classifiers in 76% of MCCV-selected models, a vast majority of the time combined with textural features.
An example of such a multivariate model derived from a classifier that combines structural and textural features is given in Table II. This model was selected by the LASSO for one of the MCCV samples. It demonstrates how classifiers can identify models where both quantitative feature frames yield significant prognostic value in a Cox regression analysis with overall survival as the endpoint.
TABLE II.
Multivariate Cox proportional hazards analysis of overall survival based on one of the models selected by LASSO during MCCV Classification for 2-year survival. All covariates were normalized prior to Cox analysis. The p-values of covariates significant at the 5% significance level are indicated in bold (Concordance = 73.9%).
| Variable | Hazard ratio | p-value |
|---|---|---|
| stage (II) | 8.09 | **0.0023** |
| stage (III) | 8.65 | **0.0007** |
| stage (IV) | 9.80 | **0.0007** |
| SUVmax | 1.36 | 0.0640 |
| gn, max | 0.64 | **0.0376** |
| correlationGLCM | 1.37 | 0.0920 |
| run length varianceGLRLM | 0.70 | **0.0438** |
| run entropyGLRLM | 0.86 | 0.4167 |
The results presented in this section highlight a level of complementarity among these many features, by which several different feature combinations may lead to comparable prognostic performance. Tables I and II and Figure 6 demonstrate that the classifiers used here tend to exploit the complementarity between textural and structural quantitations, and in particular that metabolic gradient quantitation was found useful in characterizing patient risk in conjunction with textural features in multivariate models.
B. Overall survival outcomes analyses
1). Univariate analyses:
Bonferroni-corrected univariate analyses carried out at the 5% significance level for overall survival (thus using $0.05/P \approx 9.26 \times 10^{-4}$ as significance threshold) indicated that the following 12 features (out of the P = 54 covariates retained following preliminary feature elimination) were significant univariate risk factors: stage, SUVmax, SUVmean, TLG, volume, gn,max, λminor, λleast, minor axis length, least axis length, zone-size entropyGLSZM, all with $p < 8.3 \times 10^{-4}$. We note in particular that prognostic significance is found here for the novel maximum normalized metabolic gradient variable.
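The Bonferroni correction applied above amounts to dividing the nominal significance level by the number of covariates tested. A minimal sketch follows; the helper name `bonferroni_significant` and the p-values in the example are purely illustrative and are not results from this study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the corrected threshold and the names of features falling below it."""
    threshold = alpha / len(p_values)
    significant = [name for name, p in p_values.items() if p < threshold]
    return threshold, significant

# Illustrative p-values for 54 hypothetical covariates: only the first is small.
p_values = {f"feature_{i}": 0.5 for i in range(54)}
p_values["feature_0"] = 1e-4
threshold, significant = bonferroni_significant(p_values)
print(significant)  # ['feature_0']
```

With 54 covariates the corrected threshold is 0.05/54, so a univariate p-value must fall below roughly 9.26 × 10^-4 to be declared significant.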
2). Multivariate prognostic modeling:
Many multivariate models may be constructed that include combinations of structural and textural features. An exhaustive best-subset selection [21] was performed over all multivariate Cox models, with respect to the concordance index . Concordance measures a weighted average of the proportions of times, across all subjects, that the predicted risk score for a given subject is effectively higher than that of each other subject with a longer survival. The predicted risk scores are given by the fitted logistic regression component of the model. This metric therefore assesses the model’s ability for accurate relative risk prediction within the cohort.
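The idea behind the concordance index can be illustrated with a simple pairwise implementation in the style of Harrell's C. This is an assumed, simplified variant (the exact weighting used in this work is not specified here): a pair is comparable when the subject with the shorter time had an observed death, and the hypothetical function `concordance_index` counts the proportion of such pairs in which that subject also has the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Proportion of comparable pairs ranked correctly by risk (ties in risk count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:  # subject j survived longer than subject i
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Three subjects; the third is censored (event indicator 0).
print(concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of subjects by risk, which is why the index is suited to comparing candidate models within the same cohort.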
Best subset selection is an exhaustive approach that evaluates all potential models (i.e. feature combinations). The 2-year classification analysis can be used to further reduce the set of candidate features in order to speed up this procedure, for example by excluding those features with the smallest selection rates across all classifiers considered. Here we have used this approach and performed best subset selection from a candidate set comprising the 5 clinical and 5 structural features, and the 11 most popular textural features out of all classification-based feature selections combined (i.e. all Monte Carlo classifications based on forward and backward stepwise logistic regression, the LASSO and elastic net, neural network RFE, and random forest RFE).
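The size of this search can be illustrated with a short sketch. It enumerates all subsets of a fixed size and keeps the one with the highest score. The `score` callable stands in for whatever criterion is used (for example the concordance of a fitted Cox model) and is an assumption of this example, as are the toy feature names and weights.

```python
from itertools import combinations
from math import comb


def best_subset(features, score, size):
    """Exhaustively search all feature subsets of a given size.

    score : callable mapping a tuple of feature names to a number
            (higher is better); hypothetical stand-in for model concordance.
    """
    best, best_score = None, float("-inf")
    for subset in combinations(features, size):
        s = score(subset)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score


# Search-space size for the candidate set described in the text.
n_candidates = 5 + 5 + 11                  # clinical + structural + textural
n_models_size6 = comb(n_candidates, 6)     # number of 6-feature models
n_models_all = 2 ** n_candidates - 1       # all non-empty subsets
print(n_candidates, n_models_size6, n_models_all)

# Toy usage with made-up additive weights (purely illustrative).
toy_weights = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.4}
print(best_subset(list(toy_weights), lambda s: sum(toy_weights[f] for f in s), 2))
```

The count grows exponentially with the number of candidates, which is why reducing the candidate set beforehand makes the exhaustive search practical.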
The concordance-optimal 6-feature prognostic model was considered, which contains stage, SUVmax, gn,0.95, sum.varGLCM, correlationGLCM, and run.length.varianceGLRLM. Larger models also yielded high AIC and concordance, but with a loss of statistical significance (likely due to some degree of information overlap) of some or many of their covariates, which inhibits their interpretability. Table III presents the results of this 6-feature regression analysis and shows that all covariates were statistically significant at the 5% level (assessment based on median bootstrapped p-values). This suggests that these features are likely to have potential for baseline prognosis in NSCLC. The likelihood ratio test confirmed statistical significance of the additional contribution of the gradient metric (p = 0.0188). The p-values here are provided as a guide to highlight potential factors that could merit formal validation in a prospective study, where formal statistical significance could be assessed.
TABLE III.
Multivariate Cox proportional hazards analysis based on 6-variable model with highest concordance (values are bootstrap medians over 1,000 resamples). All covariates were normalized prior to Cox analysis. The p-values of covariates significant at the 5% significance level are indicated in bold (bootstrap median concordance = 79.3%).
| Variable | Bootstrap median hazard ratio | Bootstrap median p-value |
|---|---|---|
| stage (II) | 10.75 | 0.0023 |
| stage (III) | 11.59 | 0.0005 |
| stage (IV) | 15.19 | 0.0002 |
| SUVmax | 1.64 | 0.0044 |
| gn,0.95 | 0.63 | 0.0301 |
| sum varianceGLCM | 0.58 | 0.0037 |
| correlationGLCM | 1.77 | 0.0067 |
| run length varianceGLRLM | 0.59 | 0.0078 |
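The bootstrap medians reported in Table III can be understood through the following generic sketch, which resamples subjects with replacement and takes the median of a statistic across resamples. Fitting the Cox model itself is outside the scope of this example: the `statistic` callable is a hypothetical placeholder for, e.g., a hazard ratio or p-value computed on each resample, and the seed and toy data are assumptions.

```python
import random
import statistics


def bootstrap_median(data, statistic, n_resamples=1000, seed=0):
    """Median of a statistic over bootstrap resamples of the data.

    data      : list of per-subject records
    statistic : callable applied to each resampled list (placeholder for a
                model-derived quantity such as a hazard ratio)
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n subjects with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.median(estimates)


# Toy usage: bootstrap median of the sample mean of made-up values.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(bootstrap_median(toy_values, statistics.mean))
```

Reporting the median rather than a single fitted value gives a summary that is less sensitive to individual influential subjects in a cohort of this size.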
Figure 7 further illustrates the prognostic utility of this model, with a clear separation of survival curves when the cohort is divided into reference-risk and higher-risk strata on the basis of log-rank test optimization. More specifically, the cohort is divided into two risk groups, via a grid search, so as to maximize the log-rank test statistic over candidate splits of the linear predictors obtained from the Cox model (where the splits are taken at quantiles of the sample of linear predictor values) [51]. Other splitting strategies could be used instead that would yield similar separation. The reference- and higher-risk strata are broken down as follows: distribution of stages for the reference-risk group (67 patients, 34 of whom were alive at last follow-up): 24 stage I, 9 stage II, 29 stage III, 5 stage IV; distribution of stages for the higher-risk group (26 patients, 1 of whom was alive at last follow-up): 0 stage I, 5 stage II, 9 stage III, 12 stage IV. This figure also highlights the particular contribution of the metabolic gradient variable, which assists in improving Kaplan-Meier separation by adding emphasis on higher-risk patients. The likelihood ratio test confirms this significant contribution (p = 0.0188).
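The grid search described above can be sketched as follows. This is a simplified illustration rather than the implementation used for Figure 7: it assumes a standard two-group log-rank chi-square statistic, a small hypothetical grid of quantiles, and a crude empirical quantile rule. All names and the toy data are introduced for the example only.

```python
def logrank_statistic(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                        # at risk overall
        n1 = sum(1 for x, g in zip(times, in_group) if x >= t and g)  # at risk in group
        d = sum(1 for x, e in zip(times, events) if x == t and e)  # deaths at t
        d1 = sum(1 for x, e, g in zip(times, events, in_group)
                 if x == t and e and g)                            # deaths in group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(times, events, linear_predictor, quantiles=(0.25, 0.5, 0.75)):
    """Pick the quantile cut of the linear predictor maximizing the log-rank statistic."""
    ordered = sorted(linear_predictor)
    n = len(ordered)
    best_cut, best_stat = None, -1.0
    for q in quantiles:
        cut = ordered[int(q * (n - 1))]            # simple empirical quantile
        high_risk = [lp > cut for lp in linear_predictor]
        if sum(high_risk) in (0, n):
            continue                               # skip degenerate splits
        stat = logrank_statistic(times, events, high_risk)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy cohort (hypothetical): higher linear predictor goes with earlier death.
toy_times = [2, 3, 4, 5, 20, 22, 25, 30]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
toy_lp = [3.0, 2.8, 2.5, 2.2, 0.5, 0.4, 0.2, 0.1]
print(best_split(toy_times, toy_events, toy_lp))
```

Because the cut is chosen to maximize the test statistic, the resulting log-rank p-value is optimistic and is best read as a descriptive measure of separation.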
Fig. 7.

Kaplan-Meier analyses for the prognostic model of Table III without (solid blue) and with (dashed red) metabolic gradient information gn,0.95. The 6-feature model yields statistically significant stratification of patients between high-risk (lower curve) and low-risk (higher curve) groups (p = 1.3 × 10−15), when this split is performed on the basis of a high log-rank test statistic. The likelihood ratio test between the two Cox models also indicated that the metabolic gradient variable constitutes a statistically significant addition to the prognostic model (p = 0.0188). Inset: distributions of survival times for the two risk groups defined by the full model of Table III.
V. Conclusion
In this work we have considered the prognostic value of novel structural variables derived from statistical modeling of the FDG-PET uptake information in NSCLC. Our analyses explored the role and potential complementarity of these structural descriptors of intratumoral metabolic activity with respect to a number of conventional radiomics variables. The structural and textural feature frames are not strongly correlated, suggesting that these two approaches capture different parts of the feature information space and may be complementary for tumor characterization. Survival analyses demonstrated that covariates from both feature spaces can be used together in the assessment of NSCLC imaged with FDG-PET/CT when combined in multivariate prognostic and risk stratification models.
Two-year classification and Cox regression analyses provided complementary assessments of the prognostic utility of the structural modeling methodology we propose. In particular, cross-validated 2-year survival outcome classification based on a number of standard techniques yielded similar prediction performance on average and, despite differences in classification models, most classifiers selected structural variables consistently. The classification analysis demonstrated that high prediction accuracy can be achieved when adding structural variables to conventional textural features, and that this complementarity tends to be exploited by the machine learning classifiers. Furthermore, considering overall survival as endpoint, statistical significance of the proposed metabolic gradient assessment was obtained from univariate Cox model analysis. Significant multivariate Cox models and Kaplan-Meier survival curve estimates were also obtained combining both types of features, along with conventional clinical information on histological stage and SUVmax. These results demonstrate the prognostic potential of the proposed measure of metabolic gradient for FDG-PET imaging of NSCLC.
These results also suggest the methodological relevance of machine learning classifiers based on two-year patient survival status for biomarker selection. There was heterogeneity in the composition of prognostic models derived from the various feature selection schemes on this basis. One cause for this variability may be that different combinations of (structural and textural) features capture important aspects of the tumor metabolic activity in different ways. However, these selections also differed from one classifier to another for each Monte Carlo cross-validation resample of the lung cohort. This indicates that the choice of selection scheme also matters for prognostic model selection.
Further analyses will be carried out to assess how complementarity of structural and textural variables, or relationships between these two feature frames, may be affected by changes in image statistics arising e.g. from different reconstruction techniques and noise characteristics [4], [15], [52], [53]. Settings used for texture quantitation can also impact analysis [9], [27] and will be further considered. For example it is known that factors such as the choice of grey level discretization resolution or interpolation scheme used prior to VOI re-segmentation may impact texture analysis. For this cohort we observed a median correlation of 97.2% between non-interpolated and trilinearly interpolated VOIs, and this choice did not affect prognostic significance of the texture features considered here, but other variations of other settings could be important. Our model-based approach for structural assessment is not as sensitive to calibration issues. This paper focused on the question of feature selection for a given volume and the impact of tumor delineation will be considered elsewhere. We would expect any quantification that is statistically stable to not differ significantly between two VOIs that have reasonable overlap, as is the case for the model-based heterogeneity assessment used here [14].
The data presented here are limited in scope. The results provide an indication of the role of metabolic structure and texture variables in the prognostic assessment of NSCLC patients with FDG-PET. A prospective study will be needed to validate these findings. Future work will explore the potential of feature selection techniques that also take survival time into account, such as survival random forests [54] and variations on the LASSO [19], or, as a reviewer indicated, by embedding the Cox model construction in the two-year survival classification process. Again, a prospective trial would be valuable in properly evaluating the practical clinical prognostic potential indicated by our analysis. Further attention will also be given to interpretability and efficient use of statistical summaries of the model-based metabolic gradients, in the context of NSCLC.
Acknowledgment
We thank the reviewers for their insightful and constructive feedback. This work was supported by the Science Foundation Ireland (SFI-PI 11/1027) and the National Cancer Institute (NCI R33-CA225310 and P01-CA042045).
APPENDIX
A. Description of algorithm
- Initial ellipsoid location and orientation are calculated using weighted mean and covariance matrix (μ0, Σ0) of {Xi}. At each iteration, ellipsoidal coordinates are updated via spectral decomposition
to generate ellipsoidal coordinates for voxels i = 1,…, n, in the form
where θ = (μ, Σ). A monotonically decreasing uptake profile is obtained nonparametrically from mean isotonic regression [55] for heterogeneity evaluation [13]. For bitonic regression, the regression function for model (2) is initialized by locating data mode (based on smoothed data curve, using splines). The data distribution is divided into two isotonic segments adaptively about its mode, say a monotonically increasing segment (u+ (θ), f+) and a monotonically decreasing one (u−(θ), f−). A final, smooth nonparametric regression function is obtained e.g. by means of optimal B-spline regression w.r.t. ellipsoidal structure θ (mean and shape).
A final regular unimodal curve, e.g. a smoothing spline, may optionally be fitted in order to compute metabolic uptake profile gradients for each voxel. Gradients can be evaluated e.g. using central divided differences from either of the isotonic and bitonic regression curves derived above.
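The monotone profile fit and the gradient evaluation can be sketched as below. This is a minimal illustration under several assumptions: the radial (ellipsoidal) coordinates of the voxels are taken as already computed and strictly distinct, voxels are equally weighted, the isotonic fit uses the classical pool-adjacent-violators scheme, and no final spline smoothing is applied. Function names and the toy values are hypothetical.

```python
def isotonic_decreasing(values):
    """Least-squares non-increasing fit via pool-adjacent-violators."""
    blocks = []  # each block stores [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean is below the current block mean,
        # which would violate the non-increasing constraint.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def fit_profile(radial, uptake):
    """Sort voxels by radial coordinate and fit a decreasing uptake profile."""
    pairs = sorted(zip(radial, uptake))
    u = [p[0] for p in pairs]
    f = isotonic_decreasing([p[1] for p in pairs])
    return u, f


def central_differences(u, f):
    """Divided differences of f w.r.t. u (one-sided at the two end points)."""
    n = len(u)
    grads = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grads.append((f[hi] - f[lo]) / (u[hi] - u[lo]))
    return grads


# Toy voxels (hypothetical): uptake roughly decreasing away from the core.
toy_u, toy_f = fit_profile([0.0, 1.0, 2.0, 3.0], [5.0, 3.0, 4.0, 1.0])
print(toy_f)
print(central_differences(toy_u, toy_f))
```

Summaries such as percentiles of the resulting per-voxel gradients could then be taken, in the spirit of the gradient features listed in the next section.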
B. List of features
The original quantitation dataset comprised the following features (cf. [28] for definitions of textural features):
Clinical features: stage, sex, age, SUVmax, SUVmean, TLG, volume, local SUVpeak.
Structural features: , , and summaries of normalized metabolic gradients gn at percentiles 30, 50, 70, 75, 80, 85, 90, 95, and 100.
- Textural features:
- Statistical features [28, Section 2.1]: energy and root mean square.
- All 23 intensity histogram features of [28, Section 2.2].
- Morphological features [28, Section 2.5]: surface-to-volume ratio, compactness 1 and 2, spherical disproportion, sphericity, asphericity, λmajor, λminor, λleast, major axis length, minor axis length, least axis length, PCA elongation, PCA flatness.
- All 25 grey level co-occurrence matrix (GLCM) based features of [28, Section 2.6].
- All 16 grey level run-length matrix (GLRLM) based features of [28, Section 2.7].
- All 16 grey level size-zone matrix (GLSZM) based features of [28, Section 2.8].
The following are the 54 candidate variables retained for analysis after performing preliminary feature elimination (with shorthand notation specified when useful for Table I).
5 clinical features: stage, SUVmax, SUVmean, TLG, volume.
5 structural features: , , gn,median, gn,0.95, gn, max (resp. denoting the median, 95th percentile and maximum of normalized gradients).
- 44 textural features:
- 1 statistical feature: energy.
- 3 histogram features: kurtosis, 10th percentile, maximum histogram gradient (max.grHIST).
- 6 morphologic features: λmajor, λminor, λleast, major axis length, minor axis length, least axis length.
- 18 GLCM-based features: entropy, difference average (diff.av.GLCM), difference variance, difference entropy, sum variance (s.v.GLCM), sum entropy, contrast, dissimilarity, homogeneity (homGLCM), homogeneity normalised (hom.normGLCM), inverse difference moment (i.d.fGLCM), inverse difference moment normalised (i.d.f.normGLCM), inverse variance, correlation (corrGLCM), cluster tendency (c.t.GLCM), cluster prominence, first measure of information correlation (i.c.1GLCM), second measure of information correlation.
- 6 GLRLM-based features: short runs emphasis, grey level non-uniformity (g.l.n.uGLRLM), run length non-uniformity (r.l.n.uGLRLM), run percentage (run.perGLRLM), run length variance (r.l.varGLRLM), run entropy.
- 10 GLSZM-based features: large zone emphasis, large zone low grey level emphasis, large zone high grey level emphasis, grey level non-uniformity, grey level non-uniformity normalised, zone size nonuniformity, zone percentage, grey level variance, zone size variance (z.s.varGLSZM), zone size entropy (z.s.entGLSZM).
An implementation of all above features is available in an open-source R package [24].
C. Selection bias from preliminary feature elimination
Potential selection bias introduced by the preliminary feature elimination (PFE) (block I of the analysis, Figure 3) may be evaluated by comparing results with a classification analysis conducted without PFE. Figure 8 shows that the distributions of AUCs obtained from each of the two methodologies are largely comparable. Furthermore, on the basis of two-sample nonparametric (Wilcoxon) log-rank tests, no statistically significant difference in median AUC with and without PFE was found, except for backward stepwise selection (which would be expected due to its mechanism) and elastic net (which entails a direct comparison of regression models of different dimensions that may affect the way regularization is calibrated), with respective p-values of 0.0075 and 0.0054. Figure 8 also illustrates the typical range of AUCs for each classifier in block II of the analysis pipeline. A distribution of selection trends similar to that of Figure 6 was observed when performing analysis without PFE, in particular with 77.8% of models including both textural and structural features, and 21.7% of models containing no structural features. Selection bias could be further evaluated on independent validation samples.
D. Correlations with respect to segmentation threshold
Figure 9 illustrates that some quantitations are more sensitive than others to segmentations varying with hard thresholding level, but that many features (structural and textural) provide similar information across the range of thresholds used, when hard-thresholding segmentation is performed proportionally to SUVpeak.
Fig. 9.
Effect of segmentation based on hard thresholding made proportional to SUVpeak. The curves indicate correlations in measurements made for high (red diamonds) and low (black crosses) thresholds that are close in range, or further apart (15%-difference: blue squares; 35%-difference: navy dots). Many of the structural and textural quantitations show strong alignment with respect to varying thresholds.
Contributor Information
Eric Wolsztynski, Department of Statistics, School of Mathematical Sciences, University College Cork, T12 XY86, Ireland.
Janet O’Sullivan, Department of Statistics, School of Mathematical Sciences, University College Cork, T12 XY86, Ireland.
Nicola M Hughes, Royal College of Surgeons in Ireland, Dublin, Ireland.
Tian Mou, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden.
Peter Murphy, PET/CT Unit (Alliance Medical), Cork University Hospital, Cork, Ireland.
Finbarr O’Sullivan, Department of Statistics, School of Mathematical Sciences, University College Cork, T12 XY86, Ireland.
Kevin O’Regan, Department of Radiology, Cork University Hospital, Cork, Ireland.
References
- [1] Hatt M, le Rest CC, van Baardwijk A, Lambin P, Pradier O, and Visvikis D, “Impact of tumor size and tracer uptake heterogeneity in 18F-FDG PET and CT Non Small Cell Lung Cancer tumor delineation,” J. Nucl. Med, vol. 52, no. 11, pp. 1690–7, 2011.
- [2] Tixier F, Hatt M, Valla C, Fleyry V, Lamour C, Ezzouhri S, Ingrand P, Perdrisot R, Visvikis D, and Rest CCL, “Visual versus quantitative assessment of intratumor 18F-FDG PET uptake heterogeneity: Prognostic value in non–small cell lung cancer,” J Nucl Med, vol. 55, no. 8, pp. 1235–1241, 2014.
- [3] Orlhac F, Soussan M, Chouahnia K, Martinod E, and Buvat I, “18F-FDG PET-derived textural indices reflect tissue-specific uptake pattern in non–small cell lung cancer,” PLoS ONE, vol. 10, no. 12, p. 16pp, 2015.
- [4] Nyflot M, Yang F, Byrd D, Bowen S, Sandison G, and Kinahan P, “Quantitative radiomics: impact of stochastic effects on textural feature analysis implies the need for standards,” J. Med. Imaging (Bellingham), vol. 2, no. 4, p. 041022, 2015.
- [5] Gillies RJ, Kinahan PE, and Hricak HH, “Radiomics: Images are more than pictures, they are data,” Radiology, vol. 278, no. 2, pp. 563–577, 2016.
- [6] Bogowicz M, Leijenaar R, Tanadini-Lang S, Riesterer O, Pruschy M, Studer G, Unkelbach J, Guckenberger M, Konukoglu E, and Lambin P, “Post-radiochemotherapy PET radiomics in head and neck cancer–the influence of radiomics implementation on the reproducibility of local control tumor models,” Rad Onc, vol. 125, no. 3, pp. 385–391, 2017.
- [7] Hatt M, Tixier F, Pierce L, Kinahan PE, Rest CCL, and Visvikis D, “Characterization of PET/CT images using texture analysis: the past, the present… any future?” Eur J Nucl Med Mol Imaging, vol. 44, pp. 151–165, 2017.
- [8] Lambin P, Leijenaar R, Deist T, Peerlings J, de Jong E, van Timmeren J, Sanduleanu S, Larue R, Even A, Jochems A, van Wijk Y, Woodruff H, van Soest J, Lustberg T, Roelofs E, van Elmpt W, Dekker A, Mottaghy FM, Wildberger J, and Walsh S, “Radiomics: the bridge between medical imaging and personalized medicine,” Nature Reviews Clinical Oncology, vol. 14, pp. 749–762, 2017.
- [9] Vallières M, Zwanenburg A, Badic B, Rest CCL, Visvikis D, and Hatt M, “Responsible radiomics research for faster clinical translation,” J Nucl Med, vol. 59, no. 2, pp. 189–193, 2019.
- [10] Vallières M, Freeman C, Skamene S, and Naqa IE, “A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities,” Phys Med Biol, vol. 60, no. 4, pp. 5471–5496, 2015.
- [11] Apostolova I, Ego K, Steffen I, Buchert R, Wertzel H, Achenbach H, Riedel S, Schreiber J, Schultz M, Furth C, Derlin T, Amthauer H, Hofheinz F, and Kalinski T, “The asphericity of the metabolic tumour volume in NSCLC: Correlation with histopathology and molecular markers,” Eur J Nucl Med Mol Imaging, vol. 43, no. 13, pp. 2360–2373, 2016.
- [12] Sollini M, Cozzi L, Antunovic L, Chiti A, and Kirienko M, “PET radiomics in NSCLC: state of the art and a proposal for harmonization of methodology,” Sci Rep, vol. 23, no. 7 (1), p. 358, 2017.
- [13] Eary J, O’Sullivan F, O’Sullivan J, and Conrad E, “Spatial heterogeneity in sarcoma 18F-FDG uptake as a predictor of patient outcome,” J Nucl Med, vol. 49, no. 12, pp. 1973–1979, 2008.
- [14] O’Sullivan F, Wolsztynski E, O’Sullivan J, Richards T, Conrad E, and Eary J, “A statistical modelling approach to the analysis of spatial patterns of FDG-PET uptake in human sarcoma,” IEEE Trans. Med. Imag, vol. 30, no. 12, pp. 2059–2071, 2011.
- [15] Wolsztynski E, O’Sullivan F, Keyes E, O’Sullivan J, and Eary J, “Positron emission tomography-based assessment of metabolic gradient and other prognostic features in sarcoma,” J. Med. Imag, vol. 5, no. 2, p. 024502 (16 pages), 2018.
- [16] Hughes N, Mou T, O’Regan K, Murphy P, O’Sullivan J, Wolsztynski E, Huang J, Kennedy M, Eary J, and O’Sullivan F, “Prognostic value of tumor heterogeneity evaluated by FDG-PET/CT imaging in lung cancer patients,” in J Clin Oncol, vol. 34 (suppl), no. 11564, 2016.
- [17] ——, “Tumor heterogeneity measurement using [18F] FDG PET/CT shows prognostic value in patients with non-small cell lung cancer,” European Journal of Hybrid Imaging, 2018, in Press.
- [18] Wolsztynski E, O’Sullivan F, O’Sullivan J, and Eary J, “Localized metabolic gradient as an independent prognostic variable from FDG-PET imaging of sarcoma,” in J Nucl Med, vol. 58 (suppl 1), 2017, p. 22.
- [19] Tibshirani R, “The LASSO method for variable selection in the Cox model,” Statistics in Medicine, vol. 16, pp. 385–395, 1997.
- [20] Dimopoulos MGI and Lek S, “Review and comparison of methods to study the contribution of variables in artificial neural network models,” Ecological Modelling, vol. 160, pp. 249–264, 2003.
- [21] Hastie T, Tibshirani R, and Friedman J, The Elements of Statistical Learning. Springer, 2009.
- [22] Hatt M, Parmar C, Qi J, and Naqa IE, “Machine (deep) learning methods for image processing and radiomics,” IEEE Trans Rad Plasma Med Sciences, vol. 3, no. 3, pp. 104–108, 2019.
- [23] Roose T, Chapman SJ, and Maini PK, “Mathematical models of avascular cancer,” SIAM Rev., vol. 49, no. 2, pp. 179–208, 2007.
- [24] Wolsztynski E, “Medical imaging analysis (mia),” https://github.com/ericwol/mia, 2017.
- [25] Haralick RM, Shanmugam K, and Dinstein I, “Textural features for image classification,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610–621, 1973.
- [26] Tixier F, Rest CCL, Hatt M, Albarghach N, Pradier O, Metges J-P, Corcos L, and Visvikis D, “Intratumor heterogeneity characterized by textural features on baseline 18F-FDG PET images predicts response to concomitant radiochemotherapy in esophageal cancer,” J Nucl Med, vol. 52, no. 3, pp. 369–378, 2011.
- [27] Leijenaar RTH, Carvalho S, Velazquez ER, van Elmpt WJC, Parmar C, Hoekstra OS, Hoekstra CJ, Boellaard R, Dekker ALAJ, Gillies RJ, Aerts HJWL, and Lambin P, “Stability of FDG-PET radiomics features: An integrated analysis of test-retest and inter-observer variability,” Acta Oncologica, vol. 52, no. 7, pp. 1391–1397, 2013, PMID: 24047337. [Online]. Available: 10.3109/0284186X.2013.812798
- [28] Zwanenburg A, Leger S, Vallières M, and Löck S, “Image biomarker standardisation initiative - feature definitions,” CoRR, vol. abs/1612.07003, 2016. [Online]. Available: http://arxiv.org/abs/1612.07003
- [29] Loening A and Gambhir S, “Amide: A free software tool for multi-modality medical image analysis,” Molecular Imaging, vol. 2, no. 3, pp. 131–137, 2003.
- [30] Cook GJ, Yip C, Siddique M, Goh V, Chicklore S, Roy A, Marsden P, Ahmad S, and Landau D, “Are pretreatment 18F-FDG PET tumor textural features in non-small cell lung cancer associated with response and survival after chemoradiotherapy?” J Nucl Med, vol. 54, no. 1, pp. 19–26, 2013.
- [31] Cook GJ, O’Brien ME, Siddique M, Chicklore S, Loi HY, Sharma B, Punwani R, Bassett P, Goh V, and Chua S, “Non–small cell lung cancer treated with erlotinib: Heterogeneity of 18F-FDG uptake at PET–association with treatment response and prognosis,” Radiology, vol. 276, no. 3, pp. 883–893, 2015.
- [32] Wahl R, Jacene H, Kasamon Y, and Lodge M, “From RECIST to PERCIST: Evolving considerations for PET response criteria in solid tumors,” J Nucl Med, vol. 50, no. 1, pp. 122S–150S, 2009.
- [33] Vanderhoek M, Perlman S, and Jeraj R, “Impact of the definition of peak standardized uptake value on quantification of treatment response,” J Nucl Med, vol. 53, no. 1, pp. 4–11, 2012.
- [34] Kuhn M and Johnson K, Applied Predictive Modeling. New York: Springer, 2013.
- [35] Therneau T, “A package for survival analysis in S,” 2015, R package version 2.38. [Online]. Available: https://CRAN.R-project.org/package=survival
- [36] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. [Online]. Available: https://www.R-project.org/
- [37] Guyon I and Elisseeff A, “An introduction to variable and feature selection,” Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
- [38] Guyon I, Gunn S, Nikravesh M, and Zadeh L, Feature Extraction: Foundations and Applications. Springer, 2008.
- [39] Kuhn M, “Building predictive models in R using the caret package,” Journal of Statistical Software, vol. 28, no. 5, p. 26 pages, 2008.
- [40] Kursa MB and Rudnicki WR, “Feature selection with the Boruta package,” Journal of Statistical Software, vol. 36, no. 11, p. 13 pages, 2010.
- [41] Friedman J, Hastie T, and Tibshirani R, “Regularization paths for generalized linear models via coordinate descent,” Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
- [42] Liaw A and Wiener M, “Classification and regression by randomForest,” R News, vol. 2, no. 3, pp. 18–22, 2002. [Online]. Available: http://CRAN.R-project.org/doc/Rnews/
- [43] Venables WN and Ripley BD, Modern Applied Statistics with S, 4th ed. New York: Springer, 2002, ISBN 0-387-95457-0. [Online]. Available: http://www.stats.ox.ac.uk/pub/MASS4
- [44] Fritsch S and Guenther F, neuralnet: Training of Neural Networks, 2016, R package version 1.33. [Online]. Available: https://CRAN.R-project.org/package=neuralnet
- [45] Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, the R Core Team, Benesty M, Lescarbeau R, Ziem A, Scrucca L, Tang Y, Candan C, and Hunt T, caret: Classification and Regression Training, 2017, R package version 6.0–76. [Online]. Available: https://CRAN.R-project.org/package=caret
- [46] Meyer D, Dimitriadou E, Hornik K, Weingessel A, and Leisch F, e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien, 2017, R package version 1.6-8. [Online]. Available: https://CRAN.R-project.org/package=e1071
- [47] Lin X, Yang F, Yin LZP, Kong H, and et al WX., “A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information,” Journal Of Chromatography B, vol. 910, pp. 149–155, 2012.
- [48] Ambroise C and McLachlan G, “Selection bias in gene extraction on the basis of microarray gene-expression data,” PNAS, vol. 99, no. 10, pp. 6562–6566, 2002.
- [49] Olden J, Joy MK, and Death RG, “An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data,” Ecological Modelling, vol. 178, pp. 389–397, 2004.
- [50] Beck M, NeuralNetTools: Visualization and Analysis Tools for Neural Networks, 2016, R package version 1.5.0. [Online]. Available: https://CRAN.R-project.org/package=NeuralNetTools
- [51] LeBlanc M and Crowley J, “Survival trees by goodness of split,” Journal of the American Statistical Association, Theory and Methods, vol. 88, no. 422, pp. 457–467, 1993.
- [52] Galavis P, Hollensen C, Jallow N, Paliwal B, and Jeraj R, “Variability of textural features in FDG PET images due to different acquisition modes and reconstruction parameters,” Acta Oncol., vol. 49, no. 7, pp. 1012–1016, 2010.
- [53] van Gool MH, Aukema TS, Sinaasappel M, Olmos RAV, and Klomp HM, “Tumor heterogeneity on 18F-FDG-PET/CT for response monitoring in non–small cell lung cancer treated with erlotinib,” J Thorac Dis, vol. 8, no. 3, pp. E200–E203, 2016.
- [54] Ishwaran H, Kogalur U, Blackstone E, and Lauer M, “Random survival forests,” Ann. App. Statist, vol. 2, pp. 841–860, 2008.
- [55] Dhar SS, “Trimmed mean isotonic regression,” Scandinavian Journal of Statistics, vol. 43, no. 1, pp. 202–212, 2016.





