Abstract
Background:
Advanced neuroimaging measures, along with clinical variables acquired during standard imaging protocols, provide a rich source of information for brain tumor patient treatment and management. Machine learning analysis has had much recent success in neuroimaging applications for normal and patient populations and has particular potential for brain tumor patient outcome prediction. The purpose of this work was to construct, using the current patient population distribution, a high-accuracy predictor of brain tumor patient outcomes of mortality and morbidity (i.e., transient and persistent language and motor deficits). The clinical value offered is a statistical tool to help guide treatment and planning, as well as an investigation of the influential factors of the disease process.
Methods:
Resting state fMRI, diffusion tensor imaging, and task fMRI data in combination with clinical and demographic variables were used to represent the tumor patient population (n = 62; mean age = 51.2 yrs.) in a machine learning analysis in order to predict outcomes.
Results:
A support vector machine classifier with a t-test filter and recursive feature elimination predicted patient mortality (18-month interval) with 80.7% accuracy, language deficits (transient) with 74.2%, motor deficits with 71.0%, language outcomes (persistent) with 80.7% and motor outcomes with 83.9%. The most influential features of the predictors were resting fMRI connectivity, and fractional anisotropy and mean diffusivity measures in the internal capsule, brain stem and superior and inferior longitudinal fasciculi.
Conclusions:
This study showed that advanced neuroimaging data with machine learning methods can potentially predict patient outcomes and reveal influential factors driving the predictions.
Keywords: Machine-learning, fMRI, DTI, Tumor patients, Outcome prediction
1. Introduction
According to the Central Brain Tumor Registry of the United States [1], 66,000 new primary brain tumor diagnoses were made in the U.S. in 2012, one-third of which were malignant tumors. Health care has focused on improving survival in the treatment of patients with tumors. Another area of concern is the patients’ morbidity and affected quality of life. Tumor patients face quality of life (QOL) challenges ranging from general symptoms such as headache, anorexia, nausea, seizures and insomnia, to focal neurologic deterioration, including motor deficits, personality changes, cognitive deficits, aphasia and visual field defects [2]. Improving QOL depends on the removal of malignant brain tissue and preservation of healthy functional tissue.
During treatment and planning, the non-invasive imaging and risk assessment provided by Magnetic Resonance Imaging (MRI) and functional MRI (fMRI) methods guide clinicians during resection surgery and influence patient QOL by minimizing post-surgical deficits. MRI methods for imaging structural soft tissue, fMRI for revealing eloquent brain regions, and diffusion tensor imaging (DTI) for investigating white matter fiber tracts provide essential spatial and functional information about the tumor boundary and surrounding healthy tissue. These methods help surgeries achieve a critical balance between the extent of resection of malignant tissue, to increase the chance of survival, and the preservation of healthy tissue, to minimize postoperative deficits and improve QOL.
Moreover, fMRI and DTI data offer complementary clinical value when used in software tools and models for patient prognostication [3,4]. Accurate predictions of primary outcomes are useful for guiding patient treatment and management (e.g. a more aggressive treatment plan for those who were predicted to have poor outcomes). Along with predictions, these models allow investigation of the disease process by providing insight into the relationship between the acquired neuroimaging and clinical measures and patient outcomes. Surgeries provide direct impact by treating the tumors, and statistical machine learning analyses provide complementary value by offering prognostication and the investigation of the disease process.
Specifically, for brain tumor outcome prediction, several studies have demonstrated high accuracy performance and prognostic factor investigation with statistical analyses using neuroimaging data. Lev et al. [5] used relative cerebral blood volume (rCBV) measures to predict survival of tumor patients, correctly classifying 13 of 13 high-grade astrocytoma and 7 of 9 low-grade astrocytomas. Moffat et al. [6] showed that early changes in tumor functional diffusion measures could be used as prognostic indicators of subsequent volumetric tumor response. Zacharaki et al. [4] used clinical, DTI, and relative cerebral blood volume measures in a decision tree algorithm to predict prognosis of high-grade gliomas with 85% accuracy, which was more accurate than histopathologic classification alone. Neal et al. [3] used MR anatomical image (T1 and T2) measures to calculate a Days Gained metric which predicted survival of glioblastoma multiforme (GBM) patients.
Prediction and representation of patient populations using neuroimaging data has been very successful in various medical imaging applications using machine learning [7–10]. In this area of research, fewer studies have examined brain tumor populations than populations without large-scale structural abnormalities (e.g., attention-deficit/hyperactivity disorder (ADHD), Alzheimer's disease, schizophrenia). This disparity is likely due to the challenges that tumor distortion presents for image processing, registration, and feature construction. Although neuroimaging data with anatomical distortions are difficult to spatially transform and process, this work met the challenges of studying the brain tumor population.
Building upon previous studies that used anatomical and functional MRI measures, this tumor patient population study used an extensive representation of patients with clinical and advanced neuroimaging data. Resting state fMRI provided functional connectivity (not used by previous studies), task fMRI provided specific motor and language functional activity, and DTI provided white matter diffusion measures encoding the condition of each brain tumor patient. A larger sample size than previous studies [3,5], a powerful registration algorithm, and a feature selection process were used. This work adds evidence supporting the value and ability of statistical and machine learning methods to predict brain tumor patient outcomes.
2. Methods
2.1. Patients
Neuroimaging and clinical data collected from 62 brain tumor patients were used for patient representation in this study. Table 1 lists the characteristics, demographics and clinical information of the patients. This study is retrospective only to a degree: the statistical and machine learning methods used are predictive, but the model is learned on existing data. The goal of learning a model is to understand the underlying relation between measures and to build a prediction system. In this work, a model was learned on a subset of the data and evaluated on an unseen testing set in order to estimate its performance on new, unseen data. This differs from a completely retrospective analysis, in which all data are used without holding out a subset for testing.
Table 1.
Patient characteristics.
| Demographics | |
|---|---|
| Age range | 23–81 yrs. |
| Average age | 51.2 yrs. |
| Median age | 52 yrs. |
| Males | 29 |
| Females | 33 |
| Right handed | 52 |
| Left handed | 10 |
| Tumor location | |
| Frontal | 33 (17 L/16 R) |
| Temporal | 14 (8 L/6 R) |
| Parietal | 9 (2 L/7 R) |
| Occipital | 2 (2 L) |
| Insular | 2 (2 L) |
| Other/not available | 2 |
| Tumor grade | |
| I | 1 |
| II | 14 |
| III | 9 |
| IV | 17 |
| Other/not available | 21 |
| 18 month survival | |
| Alive | 49 |
| Deceased | 13 |
Patients were selected from a database of individuals who received fMRI and DTI as part of presurgical planning at the University of Wisconsin (UW) Hospital and Clinics between August 2010 and July 2014. The inclusion criteria for this study were: patients with a diagnosis of first-onset primary or metastatic tumors in any lobe of the brain who underwent motor and language mapping using fMRI as well as resting state fMRI. All patients gave informed consent, and the study was conducted in accordance with protocols approved by the Health Sciences Institutional Review Boards at UW Madison. Patients' clinical information was retrieved from medical records. Any record of preoperative or postoperative weakness (lower extremity, upper extremity, facial) or aphasia (Broca, Wernicke, conduction, global) was regarded as a deficit. A deficit that persisted beyond six months was classified as a persistent "deficit"; if the patient improved or no deficit was present after six months, the case was classified as a good outcome. Mortality information was collected for all patients from medical records and was cross-referenced with the Social Security Death Index accessed from http://stevemorse.org/ssdi/ssdi.html.
2.2. Data acquisition
The pre-operative standard brain tumor anatomical imaging protocol at the UW Hospital and Clinics uses 1.5 T and 3 T GE MRI scanners and consists of high-resolution T1- and T2-weighted imaging, including a T1-weighted 3D fast spoiled gradient-recalled steady state acquisition (FSPGR) (1.5 T IR-prepared FSPGR, matrix size: 256 × 256, 112 axial slices, 1.0 × 1.0 × 1.5 mm, flip angle: 13°, repetition time (TR): 10.96 s, echo time (TE): 4.53 s, inversion time (TI): 0.45 s; 3 T IR-prepared FSPGR, matrix size: 256 × 256, 136 axial slices, 1.0 × 1.0 × 1.2 mm, flip angle: 13°, TR: 9.23 s, TE: 3.72 s, TI: 0.40 s), and diffusion-weighted imaging (DWI), from which DTI measures were computed. This study focused on finger tapping (motor activation) and letter word-generation (language activation) fMRI task scans, since the largest number of the patients in the database performed these tasks. Resting state fMRI scans (eyes closed, 150 volumes, matrix size: 64 × 64, 28 axial slices, 3.75 × 3.75 × 5.0 mm, TR: 2 s, TE: 0.03 s) were also acquired during this imaging protocol.
T2*-weighted echo planar imaging (EPI) scans were acquired during the performance of tasks with the following parameters: matrix size: 64 × 64, 28 axial slices, 3.75 × 3.75 × 5.0 mm, Flip Angle: 75°, TR: 2 s, TE: 0.03 s.
DWI scans were acquired using a pulsed gradient spin-echo sequence with a single-shot echo-planar acquisition with the following imaging parameters: matrix size: 128 × 128 zero-padded to 256 × 256, field of view: 24 cm, section thickness: 3 mm, TR: 4.5 s, TE: 0.72 s, number of excitations (NEX): 4. Diffusion encoding was performed in 25 uniformly distributed directions with a b-value of 1000 s/mm².
Clinical variables collected for patient representation were: patient age, sex, tumor lobe and hemisphere location, mini-mental state exam (MMSE) score, and verbal fluency (VF) raw score. VF was assessed by forms of the Controlled Oral Word Association Test (COWAT) [11], which requires subjects to produce words beginning with the letters "F", "A", and "S" in three respective 1-min trials. Responses to each letter were recorded and letter fluency scores were based on the total number of correct responses produced by the participants across the three letter conditions. Tumor grade was not used since a large majority of the patients did not have an identified grade in the medical records.
2.3. Resting state fMRI processing
Resting state fMRI data were preprocessed using AFNI (http://afni.nimh.nih.gov/afni, version: AFNI_2011_12_21_1014) and FSL (www.fmrib.ox.ac.uk/fsl, version: v5.0) software following a standard procedure reported by Allen et al. [12]. The steps were: 1) discarding the first four resting scan volumes to remove T1 equilibrium effects, 2) motion and slice-timing correction, 3) skull stripping, 4) spatial normalization to standard Montreal Neurological Institute (MNI) brain space with resampling to 3 × 3 × 3 mm voxels, and 5) spatial smoothing with a Gaussian kernel with a full-width at half-maximum (FWHM) of 10 mm. The voxel size and spatial smoothing parameters were chosen to match those of the preprocessing steps that produced the regions of interest (ROIs) used in this work’s resting state connectivity analysis.
Twenty-eight thresholded Independent Component Analysis (ICA) network maps from work by Allen et al. [12] were used as ROIs to produce a connectivity matrix by calculating pairwise correlations between all possible pairs of the 28 ROIs. ICA maps were thresholded at t-statistic > 5, which kept the maps spatially concise and their extent focused on the respective functional region in the brain. A limitation of this set of ROIs is that there was no specific language network. This procedure produced 378 unique correlation pairs, whose correlation values were used as features to represent each patient's functional connectivity in the machine learning analysis. A connectivity representation is standard in this kind of analysis and was chosen because of its success in recent studies using machine learning and neuroimaging data [13–15].
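The pairwise correlation step can be sketched as follows. The ROI time series here are random stand-ins (the real inputs would be mean BOLD signals within each thresholded ICA map), so only the array shapes and the pairing logic reflect the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rois, n_vols = 28, 146               # 150 resting volumes minus the 4 discarded
# Hypothetical stand-in: one mean BOLD time series per thresholded ICA ROI.
roi_ts = rng.standard_normal((n_rois, n_vols))

# Pairwise Pearson correlations between all ROI time series.
corr = np.corrcoef(roi_ts)

# Keep only the upper triangle (unique ROI pairs) as the feature vector.
iu = np.triu_indices(n_rois, k=1)
features = corr[iu]
```

With 28 ROIs, the upper triangle yields 28 × 27 / 2 = 378 unique pairs, matching the feature count reported above.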
2.4. DTI processing
DTI data were preprocessed using FSL which included: 1) eddy-current correction, 2) b-vector rotation, 3) brain extraction (skull stripping), and 4) diffusion tensor fitting. This process produced fractional anisotropy (FA) and mean diffusivity (MD) maps for each patient.
Each patient’s DTI map was normalized to a common MNI space using Advanced Normalization Tools (ANTs) software in Linux [16]. ANTs software was chosen because it was empirically shown to produce excellent anatomical MRI registration [17]. This process included registration of the anatomical volume to an anatomical standard MNI template (standard registration script), registration of the DTI volume to the anatomical volume (intermodality intrasubject registration script) and included a spatial mask of the tumor to improve registration mapping.
2.5. Feature selection with a white matter skeleton
To extract only relevant FA and MD signal, a white matter skeleton was used to isolate voxels that contained white matter in the whole brain volume. This method is standard in tract based spatial statistics and was used in a similar neuroimaging prediction study [18,19].
In this procedure, all patients' FA maps were combined to create an average map, which was thresholded at FA > 0.4 to obtain an FA white matter skeleton. The threshold of 0.4 was chosen to achieve a balance between the number of total voxels retained (preference for fewer voxels for feature reduction) and anatomical white matter extent (preference for greater extent to capture relevant DTI signal). Next, the FA and MD values located in the voxels of the FA white matter skeleton were extracted for each patient and used as features for patient white matter representation in the machine learning analysis.
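A minimal numerical sketch of this skeleton procedure, using random toy volumes in place of the real MNI-space FA maps (the patient count and the 0.4 threshold are the only details taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients = 62
shape = (10, 10, 10)                    # toy volume; real maps are MNI-space brains
# Hypothetical normalized FA maps, one per patient, values in [0, 0.8].
fa_maps = rng.uniform(0.0, 0.8, size=(n_patients, *shape))

# Average FA across patients, then threshold to form the skeleton mask.
mean_fa = fa_maps.mean(axis=0)
skeleton = mean_fa > 0.4

# Extract each patient's FA values at the skeleton voxels as features.
fa_features = fa_maps[:, skeleton]      # shape: (n_patients, n_skeleton_voxels)
```

The same masking would be applied to the MD maps to obtain the second diffusion feature set.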
2.6. Task fMRI processing
Task fMRI maps were preprocessed using Prism Clinical Imaging software [20], (Elm Grove, WI) which included slice timing and motion correction, spatial normalization to standard MNI space, general linear model analysis and clustering to produce t-statistics spatial maps of task activation.
Functional ROIs of sensorimotor and language networks from Shirer et al. [21] were used to extract task activation t-statistics for functional network activity representation of patients. Sensorimotor ROIs, comprising regions from the precentral and postcentral gyri, supplementary motor area (SMA), lobules IV–VI and the thalamus, were used to extract activation measures from the finger tapping volumes. Language ROIs, comprising regions from the inferior frontal gyrus, left middle temporal gyrus, angular gyrus, superior temporal gyrus and supramarginal gyrus, were used to extract measures from the word generation volumes. Voxel t-statistics in each spatially separate ROI were averaged to produce a total of 15 t-statistic measures representing the motor and language activity of each patient.
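The ROI averaging step might look like the following sketch; the atlas labels and t-map below are synthetic stand-ins for the Shirer et al. ROIs and the Prism task activation maps:

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (8, 8, 8)
t_map = rng.standard_normal(shape)             # task activation t-statistics
# Hypothetical ROI atlas: 0 = background, labels 1..15 = functional ROIs.
atlas = rng.integers(0, 16, size=shape)

# Average the voxel t-statistics within each spatially separate ROI,
# yielding one activation measure per ROI (15 per patient in the study).
roi_means = np.array([t_map[atlas == label].mean() for label in range(1, 16)])
```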
Lesion to activation distances (LADs) were also measured for each subject using the motor and language task fMRI maps. LADs were shown to provide predictive value by Wood et al. [22] and thus were included in this analysis. The LAD was defined to be the smallest distance between the tumor boundary and center of activation. Three LADs were measured: tumor to supplementary motor area, tumor to Broca’s area and tumor to Wernicke’s area. Measurements were made using AFNI and ITK-SNAP (itksnap.org, version 3.2) [23].
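Under the definition above, a LAD computation reduces to a nearest-point search from the activation center to the tumor boundary; the coordinates below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical voxel coordinates (mm): tumor boundary surface points and
# the center of one task activation cluster (e.g. SMA).
tumor_boundary = rng.uniform(0.0, 50.0, size=(200, 3))
activation_center = np.array([60.0, 30.0, 25.0])

# LAD: smallest Euclidean distance from the activation center to any
# point on the tumor boundary.
lad = np.linalg.norm(tumor_boundary - activation_center, axis=1).min()
```

The same calculation would be repeated for the three pairs measured in the study (tumor to SMA, Broca's area, and Wernicke's area).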
2.7. Outcome measures
The predicted variables, or classifier labels, in this study are mortality, motor and language deficits, and motor and language outcomes. Mortality was considered in an 18-month time interval after surgery, as in Zacharaki et al. [4]. A patient who survived for 18 months had a "+1" label and a patient who did not had a "−1" label in the machine learning prediction analysis. Functional (transient) deficit labels were defined as the presence or absence of motor and language deficits soon after surgery (motor and language deficits predicted separately). Functional outcomes were defined as the persistence of motor and language deficits as described by clinical follow-up evaluations. If a patient had no persistent deficit, the outcome label was regarded as a positive outcome. Consequently, each patient had a total of 5 labels to be predicted: mortality, motor deficit, language deficit, motor outcome, and language outcome. In this work the machine learning term "classification" is equivalent to "prediction."
2.8. Support vector machines
Support vector machines (SVMs) are among the most popular and powerful learning algorithms because they offer flexibility in representing complex functions through a kernel dot-product step termed the "kernel trick" and resistance to overfitting by retaining only a fraction of the training examples (the support vectors) [24]. SVMs construct a maximum margin separator, a discrimination (decision) boundary with the largest possible distance from the example points, which generalizes well to future, unseen data. The algorithm finds a hyperplane that discriminates between two classes (labels) in a high-dimensional feature space by solving a convex optimization problem. Usually data are not linearly separable, and a soft margin is used that allows some examples to be misclassified in order to improve accuracy on future data [25]. The parameter C controls the penalty assigned to misclassified points. In this study a soft-margin linear kernel classifier was used with the default value of C = 1, as in [26], allowing misclassification. SVM classification was carried out using the Spider Machine Learning environment as well as custom scripts developed in MATLAB (2013a, The MathWorks, Inc., Natick, Massachusetts, United States) [27]. Decision trees, as in Zacharaki et al. [4], and the Naïve Bayes classifier, a complementary algorithm to SVMs and decision trees, were used for comparison with the SVM. An important step in a successful classification process is a rich patient representation that captures the variation in the underlying patient population distribution, and this study investigates the predictive performance of such a representation. Table 2 lists the features and their categories used in the analysis.
Table 2.
Feature categories used in the patient representation.
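The study's classifier was built in MATLAB with the Spider environment; an equivalent soft-margin linear SVM with C = 1 can be sketched in Python's scikit-learn (the data below are synthetic, not the patient features):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Synthetic stand-in for the patient feature matrix and +1/-1 labels.
X = rng.standard_normal((62, 40))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

# Soft-margin linear-kernel SVM with the default penalty C = 1,
# mirroring the classifier configuration described above.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
```

A larger C penalizes margin violations more heavily (a harder margin); C = 1 leaves the trade-off at its default, as in the study.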
2.9. Feature selection
Recursive feature elimination (RFE), as in Zacharaki et al. [28], was used to select the features that optimize classification performance on a tuning set (see SI Figure for tuning accuracy). This tuning, or validation, set allowed the number of features to be reduced and the most relevant ones selected for use in the full 62-patient model, a standard step that improves model performance and interpretability [29]. For the mortality classifier, a t-test feature filter was used, selecting the most significant features separately within the FA and MD (DTI) category and within all other categories.
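A t-test filter of the kind described can be sketched as below; the feature matrix, labels, and number of retained features k are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
X = rng.standard_normal((62, 500))     # hypothetical features (62 patients)
y = rng.choice([-1, 1], size=62)       # hypothetical mortality labels
X[y == 1, :10] += 1.5                  # plant 10 informative features

# Two-sample t-test per feature; keep the k features with smallest p-values.
t, p = stats.ttest_ind(X[y == 1], X[y == -1], axis=0)
k = 20
selected = np.argsort(p)[:k]           # indices of the k most significant features
```

In the study this filtering was applied separately to the DTI category and to the remaining categories before classification.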
The tuning sets for the SVM classifiers consisted of leave-one-out cross-validation sets over the 61 training patients within each outer-loop leave-one-out fold. The tuned features obtained from feature selection were used in the full 62-patient model to evaluate its performance. The final performance accuracy was calculated with a leave-one-out cross-validation (LOOCV) method, which is the best estimator of the model's performance on a future patient [30]. A flowchart of the prediction process is shown in Fig. 1.
Fig. 1.
Flowchart of the prediction process. Flowchart of the prediction process showing the available data and stages of the analysis.
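The nested procedure, RFE-based feature selection inside an outer leave-one-out loop, can be sketched with scikit-learn on toy data (sizes are reduced from 62 patients for brevity, and the actual study used MATLAB/Spider):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.standard_normal((30, 25))      # toy stand-in: 30 patients, 25 features
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# Outer leave-one-out loop: run RFE on the training patients only,
# then predict the single held-out patient with the selected features.
correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=5)
    selector.fit(X[train_idx], y[train_idx])
    correct += int(selector.predict(X[test_idx])[0] == y[test_idx][0])

loocv_accuracy = correct / len(X)      # estimate of accuracy on a new patient
```

Keeping feature selection inside each fold prevents the held-out patient from leaking into the selection step, which would otherwise inflate the accuracy estimate.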
3. Results
An SVM model trained using all of the feature categories classified patient mortality with 80.7% accuracy (LOOCV estimate, P < 0.001, binomial test). For comparison, other standard, complementary classification algorithms with interpretable models were used to predict mortality: the naïve Bayes classifier achieved 72.6% accuracy (P < 0.001, binomial test) and a decision tree classifier achieved 75.8% accuracy (P < 0.001, binomial test).
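As a sketch of the significance test, assuming a 50% chance-level null hypothesis (the text does not spell out its exact null), the one-sided binomial test for 50 of 62 correct (≈80.7%) can be computed with SciPy:

```python
from scipy.stats import binomtest

n_patients = 62
n_correct = 50                     # 80.7% of 62 patients ≈ 50 correct
# One-sided binomial test against a 50% chance-level null (an assumption;
# the study reports only "P < 0.001, binomial test").
result = binomtest(n_correct, n_patients, p=0.5, alternative="greater")
```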
An SVM model classified patient language and motor morbidities (transient deficits) after surgery with 74.2% accuracy for language and 71.0% accuracy for motor. An SVM model classified patient functional outcomes of language and motor deficit persistence with 80.7% accuracy for language and 83.9% accuracy for motor. Fig. 2 shows the classification accuracies of the different classifiers.
Fig. 2.
Classifier Accuracy. Accuracies of the classifiers for patient mortality, language and motor outcomes.
The sensitivity and specificity for each SVM classifier were also calculated. The mortality classifier was 44.4% sensitive and 93.9% specific with regard to mortality (in receiver operating characteristic terms, the presence of “signal” was the presence of mortality). The other classifiers showed a similar trend and their sensitivities and specificities are listed in Table 3.
Table 3.
Sensitivities and specificities for the SVM patient outcome classifiers
| Classifier | Sensitivity (true positive rate) | Specificity (true negative rate) |
|---|---|---|
| Mortality | 44.4% | 93.9% |
| Language deficit | 25% | 91.3% |
| Motor deficit | 22.2% | 90.9% |
| Language outcome | 10% | 94.2% |
| Motor outcome | 10% | 98.1% |
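Sensitivity and specificity follow directly from the confusion counts; the prediction vector below is hypothetical and chosen only to mirror the 13 deceased / 49 alive label split, not the actual classifier output:

```python
import numpy as np

# Hypothetical labels/predictions with the study's 13 deceased / 49 alive
# split (1 = deceased, the "signal"); predictions are NOT the study's output.
y_true = np.array([1] * 13 + [0] * 49)
y_pred = np.array([1] * 6 + [0] * 7 + [0] * 46 + [1] * 3)

tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives

sensitivity = tp / (tp + fn)    # true positive rate
specificity = tn / (tn + fp)    # true negative rate
```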
The most influential features for mortality prediction were from the resting connectivity, fractional anisotropy and mean diffusivity categories and are listed in Table 4. Fig. 3 visually presents the most influential features for language outcome prediction (a table is omitted to save space) and Fig. 4 shows the most influential features for motor outcome prediction. The influential features from the most accurate classifiers are presented since they are the most meaningful.
Table 4.
Influential features and their descriptions for mortality prediction.
| Feature number | Category | Description |
|---|---|---|
| 1 | Resting connectivity | Sensorimotor_precentral, frontal_L |
| 2 | Resting connectivity | Sensorimotor_precentral, sensorimotor_L |
| 3 | Resting connectivity | Sensorimotor_precentral, attentional_LRanterior |
| 4 | Resting connectivity | Auditory, frontal_L |
| 5 | Resting connectivity | Auditory, sensorimotor_superior |
| 6 | Resting connectivity | Auditory, default_mode_LRanterior |
| 7 | Resting connectivity | Frontal_L, basal_ganglia |
| 8 | Resting connectivity | Frontal_L, sensorimotor_R |
| 9 | Resting connectivity | Frontal_L, sensorimotor_LR |
| 10 | Resting connectivity | Frontal_L, visual_LR |
| 11 | Resting connectivity | Frontal_L, frontal_R |
| 12 | Resting connectivity | Frontal_L, frontal_LR |
| 13 | Resting connectivity | Frontal_L, sensorimotor_superior |
| 14 | Resting connectivity | Frontal_L, attentional_R |
| 15 | Resting connectivity | Frontal_L, attentional_LRposterior |
| 16 | Resting connectivity | Frontal_L, attentional_superior |
| 17 | Resting connectivity | Basal_ganglia, frontal_R |
| 18 | Resting connectivity | Basal_ganglia, frontal_Ranterior |
| 19 | Resting connectivity | Basal_ganglia, attentional_LRanterior |
| 20 | Resting connectivity | Basal_ganglia, default_mode_LRanterior |
| 21 | Resting connectivity | Sensorimotor_L, sensorimotor_R |
| 22 | Resting connectivity | Sensorimotor_L, frontal_R |
| 23 | Resting connectivity | Sensorimotor_L, frontal_Ranterior |
| 24 | Resting connectivity | Sensorimotor_L, visual_posterior |
| 25 | Resting connectivity | Sensorimotor_L, attentional_R |
| 26 | Resting connectivity | Sensorimotor_R, attentional_Lsuperior |
| 27 | Resting connectivity | Sensorimotor_R, attentional_LRanterior |
| 28 | Resting connectivity | Sensorimotor_R, sensorimotor_superior |
| 29 | Resting connectivity | Sensorimotor_R, attentional_R |
| 30 | Resting connectivity | Sensorimotor_R, default_mode_LRanterior |
| 31 | Resting connectivity | Default_mode_anterior, frontal_LR |
| 32 | Resting connectivity | Default_mode_anterior, default_mode_LRanterior |
| 33 | Resting connectivity | Attentional_L, frontal_R |
| 34 | Resting connectivity | Attentional_L, frontal_LR |
| 35 | Resting connectivity | Attentional_L, frontal_Ranterior |
| 36 | Resting connectivity | Attentional_L, attentional_LRanterior |
| 37 | Resting connectivity | Attentional_L, sensorimotor_superior |
| 38 | Resting connectivity | Sensorimotor_LR, frontal_R |
| 39 | Resting connectivity | Sensorimotor_LR, attentional_LRanterior |
| 40 | Resting connectivity | Visual_LR, frontal_R |
| 41 | Resting connectivity | Frontal_R, frontal_LR |
| 42 | Resting connectivity | Frontal_R, default_mode_posterior |
| 43 | Resting connectivity | Frontal_R, attentional_Lsuperior |
| 44 | Resting connectivity | Frontal_R, default_mode_medial |
| 45 | Resting connectivity | Frontal_R, attentional_LRanterior |
| 46 | Resting connectivity | Visual, attentional_LRanterior |
| 47 | Resting connectivity | Visual, sensorimotor_superior |
| 48 | Resting connectivity | Visual, default_mode_LRanterior |
| 49 | Resting connectivity | Frontal_LR, visual_LR |
| 50 | Resting connectivity | Frontal_LR, attentional_LRanterior |
| 51 | Resting connectivity | Visual_LR, attentional_LRanterior |
| 52 | Resting connectivity | Visual_LR, sensorimotor_superior |
| 53 | Resting connectivity | Frontal_Ranterior, attentional_LRanterior |
| 54 | Resting connectivity | Frontal_Ranterior, attentional_superior |
| 55 | Resting connectivity | Default_mode_posterior, attentional_LRanterior |
| 56 | Resting connectivity | Default_mode_posterior, sensorimotor_superior |
| 57 | Resting connectivity | Attentional_Lsuperior, sensorimotor_superior |
| 58 | Resting connectivity | Default_mode_medial, attentional_LRanterior |
| 59 | Resting connectivity | Attentional_LRanterior, visual_posterior |
| 60 | Resting connectivity | Attentional_LRanterior, attentional_R |
| 61 | Resting connectivity | Attentional_LRanterior, attentional_LRposterior |
| 62 | Resting connectivity | Attentional_R, default_mode_LRanterior |
| 63 | Clinical | Age |
| 64 | FA | Cerebellum |
| 65 | FA | Cerebellum |
| 66 | FA | Brainstem |
| 67 | FA | Cerebellum |
| 68 | FA | Cerebellum |
| 69 | FA | Cerebellum |
| 70 | FA | Cerebellum |
| 71 | FA | Cerebellum |
| 72 | FA | Cerebellum |
| 73 | FA | Occipitofrontal fasciculus |
| 74 | FA | Inferior longitudinal fasciculus |
| 75 | FA | Occipitofrontal fasciculus |
| 76 | FA | Occipitofrontal fasciculus |
| 77 | FA | Occipitofrontal fasciculus |
| 78 | FA | Occipitofrontal fasciculus |
| 79 | FA | Inferior longitudinal fasciculus |
| 80 | FA | Internal capsule |
| 81 | FA | Internal capsule |
| 82 | FA | Corona radiata |
| 83 | FA | Inferior longitudinal fasciculus |
| 84 | FA | Inferior longitudinal fasciculus |
| 85 | FA | Inferior longitudinal fasciculus |
| 86 | FA | Inferior longitudinal fasciculus |
| 87 | FA | Inferior longitudinal fasciculus |
| 88 | FA | Inferior longitudinal fasciculus |
| 89 | FA | Inferior longitudinal fasciculus |
| 90 | FA | Inferior longitudinal fasciculus |
| 91 | FA | Inferior longitudinal fasciculus |
| 92 | FA | Forceps major |
| 93 | FA | Forceps major |
| 94 | FA | Forceps major |
| 95 | FA | Forceps major |
| 96 | FA | Forceps major |
| 97 | FA | Forceps major |
| 98 | FA | Forceps major |
| 99 | FA | Corona radiata |
| 100 | FA | Superior longitudinal fasciculus |
| 101 | FA | Superior longitudinal fasciculus |
| 102 | FA | Superior longitudinal fasciculus |
| 103 | FA | Superior longitudinal fasciculus |
| 104 | FA | Superior longitudinal fasciculus |
| 105 | MD | Inferior longitudinal fasciculus |
| 106 | MD | Occipitofrontal fasciculus |
| 107 | MD | Corpus callosum |
| 108 | MD | Superior longitudinal fasciculus |
| 109 | Task seed | Sensorimotor_Lcentral_gyrus |
Fig. 3.
Influential Locations of Language Outcome. Influential FA and MD locations of the language outcome predictor are displayed in red. Note that only voxels in the white matter skeleton were considered. The underlay is a standard Johns Hopkins University FA atlas (JHU-ICBM-FA-2 mm) in MNI space. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4.
Influential Locations of Motor Outcome. Influential FA and MD locations of the motor outcome predictor are displayed in red. Note that only voxels in the white matter skeleton were considered. The underlay is a standard Johns Hopkins University FA atlas (JHU-ICBM-FA-2 mm) in MNI space. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
After the SVM classifier which used all the features identified the most influential categories, classifiers using each category separately were investigated for mortality prediction to gain insight into their significance.
Classification of mortality using only resting connectivity data achieved 79.0% accuracy, which was not statistically different (paired t-test, P = 0.74) from the accuracy of the all-feature classifier but was less accurate by 1.7%. Classification using only DTI data (FA and MD) also achieved 79.0% accuracy (paired t-test, P = 0.74), and classification using only task fMRI data (task seeds and LADs) achieved 74.2% accuracy (paired t-test, P = 0.25), which was less accurate than the all-feature classifier by 6.5%.
4. Discussion
Brain tumor patients are regularly imaged as part of their diagnosis and treatment. The clinical tumor imaging protocol provides a rich source of data (anatomical MR, fMRI, DTI) that is valuable for clinical evaluation and also for statistical predictions. This neuroimaging data can be used along with clinical measures in machine learning algorithms to aid in prognostication by providing a patient with expected mortality and functional outcome information based on the existing patient population distribution.
This analysis used clinical and demographic variables in combination with rich neuroimaging data, both functional and structural, to represent the patient population. Resting state fMRI provided functional connectivity information independent of a patient's cooperation and consciousness, DTI provided information about subcortical white matter tracts (anisotropy and strength of diffusion), and task fMRI maps provided specific functional information about the motor and language networks. These measures were selected to give a broad description of the anatomical and functional state of a patient with a brain tumor within the machine learning framework. In constructing a high-accuracy predictor, the patient representation with regard to the task is essential. For high performance, the representation should be rich enough to capture the variation between good-outcome and bad-outcome patients, yet concise, keeping only the relevant features.
The main purpose of the study was to develop a method that could predict outcomes with high accuracy and identify features influential for mortality, functional deficits, and outcomes. This work showed that clinical data in combination with advanced neuroimaging measures can provide very accurate prediction of these primary tumor patient outcomes. The value offered is a clinical tool that provides unique information noninvasively and that can be used as a complement to existing clinical protocols to guide treatment and decision making. Patients who are predicted to have poor outcomes can be treated more aggressively in the clinic, and in the opposite case, a good outcome prediction will support a sense of relief and hope for improvement for the patient. It is important to remember that clinical decisions are made by weighing many factors and using many tools to guide treatment. Commonly, histological, clinical and demographic information are all carefully weighed by physicians to reach a reasonable prognosis [31]. This software prediction tool provides a complementary level of prognostic information, prior to operation and biopsy, and is intended to be one of many clinical tools in the decision making process.
With regard to mortality prediction, the classifier was moderately accurate (44.4%) in detecting “true positives”, or patients with a mortality, and very accurate (93.9%) in detecting “true negatives”, or patients with no mortality. A similar trend of low true positive rates and high true negative rates held for the language and motor deficit and outcome predictions. This is most likely due to the imbalance of the labels: 13 patients had “−1, mortality” labels and 49 had “+1, no mortality” labels. In such a case the classifier learns to predict the majority group best, since that group contributes most of the training data. One could build a classifier with balanced groups to improve the true positive detection rate, but it is also important to represent the patient population distribution well, which may be inherently imbalanced. The best scenario for the machine learning framework is to have a large number of patients from both labels or groups while still representing the underlying patient population distribution.
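One common mitigation for such imbalance, short of collecting balanced groups, is to reweight the SVM penalty by class frequency. The sketch below (scikit-learn assumed; the features are synthetic stand-ins, not the study’s data) shows `class_weight='balanced'` together with sensitivity and specificity computed from a leave-one-out confusion matrix:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n_pos, n_neg = 13, 49                      # mortality vs. no-mortality counts
# Synthetic stand-in features with a weak class separation.
X = np.vstack([rng.normal(0.4, 1.0, (n_pos, 20)),
               rng.normal(-0.4, 1.0, (n_neg, 20))])
y = np.array([-1] * n_pos + [+1] * n_neg)  # -1 = mortality, +1 = no mortality

# class_weight='balanced' scales the misclassification penalty inversely
# to class frequency, trading some specificity for minority-class recall.
clf = SVC(kernel="linear", class_weight="balanced")
y_hat = cross_val_predict(clf, X, y, cv=LeaveOneOut())

m = confusion_matrix(y, y_hat, labels=[-1, +1])
sensitivity = m[0, 0] / m[0].sum()   # true-positive rate (mortality)
specificity = m[1, 1] / m[1].sum()   # true-negative rate (no mortality)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

The design choice here is explicit: reweighting raises minority-class recall at some cost to the majority class, which is the trade-off discussed above between true positive detection and faithfully representing an inherently imbalanced population.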
The analyzed data came from several imaging modalities and had correlated labels (severe deficits were correlated with mortality); for such data, multi-kernel learning and multi-task learning are appropriate. These two more advanced methods were investigated and are described in the Supplementary Data. Their models proved less accurate than the original all-feature-SVM model: 80% accuracy for multi-kernel learning and 30% for multi-task learning.
The advantage of the all-feature-SVM algorithm over the more complex methods is that feature selection, which substantially improved accuracy, was more readily implemented for the standard SVM method. This resulted in a concise model (using only 109 features for the mortality classifier) and the retention of only the most relevant features, which is useful both for computational efficiency and for influential feature identification. Feature selection was not attempted with multi-kernel learning because the multi-kernel classifier weighted the feature kernels disproportionately and did not show an improvement over the original SVM classifier. Feature selection was not used with multi-task learning because the joint feature learning algorithm is already constructed to handle a high dimensional feature space and select the most important features.
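A minimal sketch of this two-stage selection (univariate t-test filter, then SVM-driven recursive feature elimination) in scikit-learn terms follows. The intermediate filter size of 500 and the synthetic data are assumptions for illustration; 109 matches the mortality classifier’s final feature count. For two classes, the ANOVA F statistic used by `f_classif` is the square of the two-sample t statistic, so `SelectKBest(f_classif)` behaves as a t-test filter. In practice the whole pipeline would be refit inside each cross-validation fold to avoid selection bias.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(62, 5000))      # stand-in for the full feature set
y = np.array([-1] * 13 + [+1] * 49)  # stand-in outcome labels

pipe = Pipeline([
    # Stage 1: univariate filter (t-test equivalent for two classes).
    ("filter", SelectKBest(f_classif, k=500)),
    # Stage 2: recursive feature elimination ranked by linear-SVM weights,
    # dropping 10% of the remaining features per iteration.
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=109, step=0.1)),
    # Final classifier trained on the 109 surviving features.
    ("svm", SVC(kernel="linear")),
])
pipe.fit(X, y)
print(pipe.named_steps["rfe"].n_features_)
```

The fitted RFE step also exposes a ranking of features, which is what makes this construction convenient for the influential-feature identification discussed above.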
Many white matter voxels in the cerebellum, brain stem, internal capsule, corona radiata and longitudinal fasciculi were seen to be most influential, as listed in Table 4 and shown in Figs. 3 and 4. Note that differences in brain anatomy are difficult to detect using conventional mass-univariate methods (e.g., voxel-based morphometry), which require correction for multiple comparisons. The present method has the advantage of an underlying multivariate approach, which requires neither multiple-comparison nor clustering corrections. It was found that FA and MD were the most influential features for language and motor functional outcome prediction. FA and MD were also most influential in mortality prediction, along with resting connectivity features. Clinical features did not play a significant role in the classifiers; only patient age was identified as influential for mortality prediction, and the clinical feature of tumor location in the insular cortex was influential for motor outcome prediction. The authors believe that the better performance of neuroimaging features compared to clinical features is due to their high dimensionality and quality of information: they contained measures that discriminated between labels to a high degree.
The intersection (overlap) between mortality, language and motor classifier features was investigated to understand their involvement in the different predictors. Mortality and language predictors had no features in common. Mortality and motor predictors had two features in common: resting connectivity between frontal and sensorimotor regions, and connectivity between sensorimotor and default mode regions. Language and motor predictors shared six features: resting connectivity between frontal and attentional regions, three FA features in the brain stem and the cerebellum and two MD features in the internal capsule and corpus callosum. There was little overlap between specific features of the different classifiers but the locations of the MD and FA features had a spatial similarity for regions in the corona radiata, internal capsule and inferior and superior longitudinal fasciculi.
Since the brain tumor population has anatomical distortion, which causes functional and anatomical tissue to be displaced from its normal location, the association of a feature with a specific anatomical or functional area for a particular patient should be regarded with caution. In the general case the anatomical areas of the patient population will be distributed similarly to those of normal subjects over a large part of the brain, but for a specific patient one should take care to determine which features associated with anatomical and functional networks are compromised by the presence of a tumor.
Although powerful image registration software was used to counter the anatomical distortion present in brains with tumors, some distortion and abnormality remained after processing, and therefore the images cannot be regarded as identical to those of normal populations. As a result, feature interpretation for a tumor population is not as reliable as for a normal population. For example, when specific motor FA and MD values are found to highly influence mortality prediction, it is true that the FA and MD values in those voxels are highly influential, but there is some uncertainty in the anatomical and functional association of the voxels.
Classification using resting state fMRI features only, DTI (FA and MD) features only, and task fMRI features only was also investigated. The resting state fMRI and DTI single-category classifiers performed equally well (79.0% accuracy) and were less accurate than the all-category classifier by 1.7%. The task fMRI category classifier was the least accurate at 74.2%. The highest classification accuracy was achieved by the representation combining all of the features and categories, which supports the advantage of a diverse representation over more limited ones.
Mortality labels were more readily identified for each patient than deficit labels—the patient either survived or not as indicated by medical records. There was more variability and uncertainty in the deficit labels where patients had a range of deficit severity based on their clinical evaluations. Since a binary label was associated with these deficits there was some loss of information. This systematic uncertainty may have contributed to the lower accuracy of the deficit classifier as compared to mortality. Overall, classification of outcomes for the entire data set was very accurate and showed promise for the application of machine learning methods in clinical outcome prediction.
This study investigated the potential of machine learning methods using advanced neuroimaging data to predict tumor patient outcomes and reveal influential measures driving the predictions. Such a framework allows the construction of consistent, systematic prediction tools that provide an extra dimension of information for clinical treatment and management without requiring invasive procedures. Recent studies have demonstrated machine learning’s success with neuroimaging data, and this study adds evidence supporting the utility of machine learning prediction for clinical applications.
4.1. Study limitations
Despite the strengths and potential clinical use of this study, there are some limitations that need to be addressed. One is the mixed patient population, which included patients with tumor grades I–IV as well as brain metastasis patients; future studies should consider investigating a more homogeneous patient population. In addition, although prediction specificity was good, both outcome groups should be equally represented in order to improve classifier sensitivity. Another consideration is the lack of preoperative information in patients’ medical records, which led to a limited description of each patient’s status (based only on the presence or absence of a deficit). Lastly, we are aware that the presence of tumors causes tissue shift and mismatch; even though we corrected for this effect by performing image registration, the network maps of Allen et al. (2011) [12] were derived from subjects without tumors.
4.2. Conclusions
The purpose of this work was to construct a high accuracy predictor for brain tumor patient outcomes of mortality and morbidity using resting-state fMRI, diffusion tensor imaging, and task fMRI data in combination with clinical and demographic variables. This method predicted patient mortality (18-month interval) with 80.7% accuracy, language deficits (transient) with 74.2%, motor deficits with 71.0%, language outcomes (persistent) with 80.7% and motor outcomes with 83.9%. The most influential features of the predictors were resting fMRI connectivity, and fractional anisotropy and mean diffusivity measures in the internal capsule, brain stem and superior and inferior longitudinal fasciculi. This study showed that advanced neuroimaging data with machine learning methods can potentially predict patient outcomes and reveal influential factors driving the predictions.
Acknowledgements
The authors thank the UW Hospital and Clinics clinicians for their treatment and management of patients.
This project was supported by awards from the National Institutes of Health (R41NS081926 to Vivek Prabhakaran; RC1MH090912 to M. Elizabeth Meyerand and Vivek Prabhakaran; K23NS086852 to Vivek Prabhakaran; ICTR KL2 as part of UL1TR000427 to Vivek Prabhakaran; and T32 EB011434 to M. Elizabeth Meyerand and Svyatoslav Vergun). Support was also provided by the Foundation of ASNR’s Comparative Effectiveness Research Award to Vivek Prabhakaran.
Abbreviations:
- ADHD
Attention-deficit/hyperactivity Disorder
- ANTs
Advanced Normalization Tools
- COWAT
Controlled Oral Word Association Test
- DTI
Diffusion Tensor Imaging
- DWI
Diffusion Weighted Imaging
- EPI
Echo-Planar Imaging
- FA
Fractional Anisotropy
- fMRI
Functional Magnetic Resonance Imaging
- FSPGR
Fast Spoiled Gradient-Recalled Steady State Acquisition
- FWHM
Full-Width at Half-Maximum
- GBM
Glioblastoma Multiforme
- ICA
Independent Component Analysis
- LAD
Lesion to Activation Distance
- LOOCV
Leave-One-Out Cross-Validation
- MD
Mean Diffusivity
- MMSE
Mini-Mental State Exam
- MNI
Montreal Neurological Institute
- NEX
Number of Excitations
- QOL
Quality of Life
- rCBV
Relative Cerebral Blood Volume
- RFE
Recursive Feature Elimination
- ROI
Region of Interest
- SMA
Supplementary Motor Area
- SVM
Support Vector Machine
- TE
Echo Time
- TI
Inversion Time
- TR
Repetition Time
- VF
Verbal Fluency
Footnotes
Disclosures
The authors declare no conflict of interest.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.inat.2018.04.013.
References
- [1]. Ostrom QT, Gittleman H, Liao P, Rouse C, Chen Y, Dowling J, Wolinsky Y, Kruchko C, Barnholtz-Sloan J, CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2007–2011, Neuro-Oncology 16 (Suppl. 4) (2014) iv1–iv63.
- [2]. Heimans JJ, Taphoorn MJ, Impact of brain tumour treatment on quality of life, J. Neurol. 249 (8) (2002) 955–960.
- [3]. Neal ML, Trister AD, Cloke T, Sodt R, Ahn S, Baldock AL, Bridge CA, Lai A, Cloughesy TF, Mrugala MM, Discriminating survival outcomes in patients with glioblastoma using a simulation-based, patient-specific response metric, PLoS One 8 (1) (2013) e51951.
- [4]. Zacharaki E, Morita N, Bhatt P, O’Rourke D, Melhem E, Davatzikos C, Survival analysis of patients with high-grade gliomas based on data mining of imaging variables, Am. J. Neuroradiol. 33 (6) (2012) 1065–1071.
- [5]. Lev MH, Ozsunar Y, Henson JW, Rasheed AA, Barest GD, Harsh GR, Fitzek MM, Chiocca EA, Rabinov JD, Csavoy AN, Glial tumor grading and outcome prediction using dynamic spin-echo MR susceptibility mapping compared with conventional contrast-enhanced MR: confounding effect of elevated rCBV of oligodendrogliomas, Am. J. Neuroradiol. 25 (2) (2004) 214–221.
- [6]. Moffat BA, Chenevert TL, Lawrence TS, Meyer CR, Johnson TD, Dong Q, Tsien C, Mukherji S, Quint DJ, Gebarski SS, Functional diffusion map: a non-invasive MRI biomarker for early stratification of clinical brain tumor response, Proc. Natl. Acad. Sci. U. S. A. 102 (15) (2005) 5524–5529.
- [7]. Greicius MD, Srivastava G, Reiss AL, Menon V, Default-mode network activity distinguishes Alzheimer’s disease from healthy aging: evidence from functional MRI, Proc. Natl. Acad. Sci. U. S. A. 101 (13) (2004) 4637–4642.
- [8]. Shen H, Wang L, Liu Y, Hu D, Discriminative analysis of resting-state functional connectivity patterns of schizophrenia using low dimensional embedding of fMRI, NeuroImage 49 (4) (2010) 3110–3121.
- [9]. Price CJ, Seghier ML, Leff AP, Predicting language outcome and recovery after stroke: the PLORAS system, Nat. Rev. Neurol. 6 (4) (2010) 202–210.
- [10]. Poldrack RA, Halchenko YO, Hanson SJ, Decoding the large-scale structure of brain function by classifying mental states across individuals, Psychol. Sci. 20 (11) (2009) 1364–1372.
- [11]. Benton A, Hamsher K, Multilingual Aphasia Examination, University of Iowa, Iowa City, IA, 1976.
- [12]. Allen EA, Erhardt EB, Damaraju E, Gruner W, Segall JM, Silva RF, Havlicek M, Rachakonda S, Fries J, Kalyanam R, A baseline for the multivariate comparison of resting-state networks, Front. Syst. Neurosci. 5 (2011).
- [13]. Craddock RC, Holtzheimer PE, Hu XP, Mayberg HS, Disease state prediction from resting state functional connectivity, Magn. Reson. Med. 62 (6) (2009) 1619–1628, https://doi.org/10.1002/mrm.22159.
- [14]. Meier TB, Deshpande AS, Vergun S, Nair VA, Song J, Biswal BB, Meyerand ME, Birn RM, Prabhakaran V, Support vector machine classification and characterization of age-related reorganization of functional brain networks, NeuroImage 60 (1) (2012) 601–613.
- [15]. Vergun S, Deshpande AS, Meier TB, Song J, Tudorascu DL, Nair VA, Singh V, Biswal BB, Meyerand ME, Birn RM, Characterizing functional connectivity differences in aging adults using machine learning on resting state fMRI data, Front. Comput. Neurosci. 7 (2013).
- [16]. Avants BB, Tustison N, Song G, Advanced normalization tools (ANTS), Insight J. 2 (2009) 1–35.
- [17]. Klein A, Andersson J, Ardekani BA, Ashburner J, Avants B, Chiang M-C, Christensen GE, Collins DL, Gee J, Hellier P, Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration, NeuroImage 46 (3) (2009) 786–802.
- [18]. Smith SM, Jenkinson M, Johansen-Berg H, Rueckert D, Nichols TE, Mackay CE, Watkins KE, Ciccarelli O, Cader MZ, Matthews PM, Behrens TE, Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data, NeuroImage 31 (4) (2006) 1487–1505, https://doi.org/10.1016/j.neuroimage.2006.02.024.
- [19]. Amarreh I, Meyerand ME, Stafstrom C, Hermann BP, Birn RM, Individual classification of children with epilepsy using support vector machine with multiple indices of diffusion tensor imaging, NeuroImage: Clinical 4 (2014) 757–764.
- [20]. Prism Clinical Imaging, Inc., Elm Grove, WI.
- [21]. Shirer W, Ryali S, Rykhlevskaia E, Menon V, Greicius M, Decoding subject-driven cognitive states with whole-brain connectivity patterns, Cereb. Cortex 22 (1) (2012) 158–165.
- [22]. Wood J, Kundu B, Utter A, Gallagher T, Voss J, Nair V, Kuo J, Field A, Moritz C, Meyerand M, Impact of brain tumor location on morbidity and mortality: a retrospective functional MR imaging study, Am. J. Neuroradiol. 32 (8) (2011) 1420–1425.
- [23]. Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, Gerig G, User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability, NeuroImage 31 (3) (2006) 1116–1128.
- [24]. Russell SJ, Norvig P, Artificial Intelligence: A Modern Approach, 3rd ed., Prentice Hall, Upper Saddle River, NJ, 2010.
- [25]. Cortes C, Vapnik V, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297.
- [26]. Dosenbach NU, Nardos B, Cohen AL, Fair DA, Power JD, Church JA, Nelson SM, Wig GS, Vogel AC, Lessov-Schlaggar CN, Prediction of individual brain maturity using fMRI, Science 329 (5997) (2010) 1358–1361.
- [27]. Weston J, Elisseeff A, Bakır G, Sinz F, The Spider Machine Learning Toolbox, 2005.
- [28]. Zacharaki EI, Wang S, Chawla S, Soo Yoo D, Wolf R, Melhem ER, Davatzikos C, Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme, Magn. Reson. Med. 62 (6) (2009) 1609–1618.
- [29]. James G, Witten D, Hastie T, Tibshirani R, An Introduction to Statistical Learning, Springer, 2013.
- [30]. Hastie T, Tibshirani R, Friedman J, The Elements of Statistical Learning, Springer, New York, 2001.
- [31]. Cruz JA, Wishart DS, Applications of machine learning in cancer prediction and prognosis, Cancer Informat. 2 (2006) 59.