Abstract
Magnetic resonance imaging (MRI) is by nature a multi-modality technique that provides complementary information about different aspects of diseases. To date, no attempts have been reported to assess the potential of multi-modal MRI in discriminating individuals with and without migraine. In this study, we therefore proposed a classification approach to examine whether integrating multiple MRI features could improve the classification of migraine patients without aura (MWoA) versus healthy controls. Twenty-one MWoA patients and 28 healthy controls participated in this study. Resting-state functional MRI data were acquired to derive three functional measures: the amplitude of low-frequency fluctuations, regional homogeneity and regional functional correlation strength; structural MRI data were obtained to measure the regional gray matter volume. For each measure, the values of 116 pre-defined regions of interest were extracted as classification features. Features were first selected and combined by a multi-kernel strategy; then a support vector machine classifier was trained to distinguish the subjects at the individual level. The performance of the classifier was evaluated using a leave-one-out cross-validation method, and the final classification accuracy obtained was 83.67% (with a sensitivity of 92.86% and a specificity of 71.43%). The anterior cingulate cortex, prefrontal cortex, orbitofrontal cortex and the insula contributed the most discriminative features. In general, our proposed framework shows a promising classification capability for MWoA by integrating information from multiple MRI features.
Introduction
Migraine is a form of primary neurovascular disorder characterized by episodic headache [1]. According to a survey by the World Health Organization, migraine ranks as the third most prevalent disorder and affects nearly 15% of the population. More than 90% of sufferers are unable to work or function normally during their migraine attacks [2]. Moreover, people with migraine may have an increased risk of ischemic stroke, unstable angina, or affective disorders [3–5]. Based on whether the headaches are accompanied by an early symptom called aura, migraine is divided into two major subtypes: migraine without aura (MWoA) and migraine with aura. It is worth noting that two thirds of migraine patients suffer from MWoA [6]; hence the early diagnosis and appropriate treatment of MWoA are imperative. Since MWoA has no clear prodrome, and the symptoms are hard to evaluate and may change from one attack to the next, it is not always easy to exclude other possible causes of headache and achieve an accurate diagnosis of MWoA using traditional methods (e.g. symptom analysis, medical tests). Thus, there has been substantial interest in identifying objective biomarkers and developing automated methods with the potential to assist the diagnosis of migraine.
In recent years, studies using magnetic resonance imaging (MRI) have greatly advanced our understanding of the neural mechanisms underlying migraine. Both structural and functional brain alterations in migraine have been revealed by MRI techniques. Specifically, studies using structural MRI (sMRI), especially high-resolution T1-weighted imaging, have demonstrated that migraine is linked with gray matter (GM) changes in the inferior parietal lobule [7], hippocampus [8], inferior frontal cortex [9], and the motor/premotor and prefrontal cortices [10]; a recent study on migraineurs [11] also revealed changes in regional cortical thickness, cortical surface area, and volume in several brain areas including the parahippocampal gyrus, anterior cingulate cortex and the medial orbital frontal gyrus. Meanwhile, by employing resting-state functional MRI (rs-fMRI), a number of studies have demonstrated that migraine is associated with functional brain alterations measured by various indices, including the amplitude of low-frequency fluctuations (ALFF), regional homogeneity (ReHo), and functional connectivity. For example, compared with healthy controls, migraineurs showed significant ALFF changes in the anterior cingulate cortex and prefrontal cortex [12], and ReHo changes in the prefrontal cortex, orbitofrontal cortex [13], insula [14], and cuneus [15]. Altered functional connectivity in migraineurs was identified between the dorsolateral prefrontal cortex and the dorsal anterior cingulate cortex [16], between the amygdala and insula [17], and between the prefrontal and temporal regions within the default mode network [18].
Despite the identification of the aforementioned functional and morphological brain alterations, there has been very little exploration of the possibility of using these MRI findings to assist the diagnosis of migraine. One important reason is that most of these findings were obtained by applying mass-univariate analysis approaches to detect group differences [19]. For neuroimaging to be useful in a clinical setting, however, one must be able to provide predictions at the individual level. In the past several years, the application of machine learning techniques to neuroimaging data analysis has made promising progress in brain disease identification [20, 21]. Compared to group-level analyses, machine learning techniques allow inference at the single-subject level; moreover, they are sensitive to subtle and spatially distributed differences in the brain that might be undetectable in group-level comparisons. Recently, a growing number of studies have used machine learning methods to examine a range of psychiatric and neurological conditions, such as Alzheimer’s disease [22], autism [23], social anxiety disorder [24], depression [25], and schizophrenia [26].
In the application of machine learning techniques, one can either use features derived from single-modality MRI data (or even a single measure), or include multi-modality features. The advantage of the latter is that different MRI modalities/measures usually provide complementary pathological information. Aside from the obvious distinction between sMRI and functional MRI, the three above-mentioned indices derived from rs-fMRI are also mutually complementary. In detail, both ALFF and ReHo measure regional spontaneous neural activity, but ALFF reflects its amplitude [27] while ReHo indicates the functional synchronization of neural units that are spatially close to each other [28]. Functional connectivity, i.e., the connectivity between separate brain regions, provides functional information at the level of brain networks [29]. By taking them together, one can achieve a more comprehensive understanding of brain function from segregation to integration [30]. The advantage of combining multi-type features over single-type features has also been verified in recent studies showing improved classification performance in various diseases, including ADHD [31], Alzheimer’s disease [32–34], Parkinson’s disease [35], and schizophrenia [36]. Nonetheless, very few studies have tried the multi-type feature combination approach in migraine discrimination, and the capability of machine learning techniques for this condition is not yet known.
With the hypothesis that integration of multi-type features in a proper way could improve the classification performance compared with single-type feature approaches, in the current study, we proposed a novel framework that employs multi-kernel support vector machine (SVM) to combine ALFF, ReHo, regional functional correlation strength (RFCS) and GM features. We examined whether this framework works better than single-type feature approaches in differentiating MWoA patients from healthy controls (HC).
Materials and Methods
Subjects
Twenty-one migraine patients without aura and twenty-eight healthy controls participated in this study. The age and gender differences between the two groups were tested using a two-sample t-test and a χ² test, respectively, and no significant difference was observed (p>0.05). The diagnosis of MWoA was made by neurologic practitioners according to the criteria of the Second Edition of the International Classification of Headache Disorders (ICHD-II) [37]. All patients were right-handed and aged between 18 and 45 years. They had been off analgesic drugs for at least 2 weeks, were not receiving preventive treatment and had not used any other drugs for at least 1 month prior to the study. All patients were confirmed by follow-up to be free from migraine attacks for at least 72 hours prior to the brain scan and 48 hours after the scan. Exclusion criteria included: chronic migraine or other chronic or current pain disorders; a history of mental diseases or neurological disorders other than migraine; pregnancy; MRI contraindications; and structural brain abnormalities found by computed tomography or conventional MRI scanning. The Local Ethics Committee of the West China Hospital of Sichuan University approved this study, and all subjects gave written informed consent prior to participation. Table 1 lists the demographic and clinical data of the 49 subjects.
Table 1. Demographic and clinical characteristics of the 49 participants.
Characteristic | MWoA (n = 21) | HC (n = 28) | t-value | χ² value | p-value |
---|---|---|---|---|---|
Sex (male/female) | 5/16 | 13/15 | n/a | 2.642 | 0.104ᵃ |
Age (mean ± SD, y) | 27.52 ± 8.15 | 29.18 ± 6.96 | -0.766 | n/a | 0.448ᵇ |
Education (mean ± SD, y) | 15.05 ± 4.14 | 16.36 ± 2.87 | -1.308 | n/a | 0.197ᵇ |
24-HAMD (mean ± SD) | 6.48 ± 6.46 | 2.46 ± 2.17 | 2.735 | n/a | 0.012ᵇ |
14-HAMA (mean ± SD) | 4.38 ± 5.50 | 1.46 ± 1.43 | 2.371 | n/a | 0.027ᵇ |
SD = standard deviation; y = year; HAMD = Hamilton Depression Scale; HAMA = Hamilton Anxiety Scale; MWoA = migraine without aura; HC = healthy controls; n/a = not applicable.
ᵃ The p value was obtained by the χ² test.
ᵇ The p values were obtained by two-sample t-tests.
Image acquisition
All data were acquired using a 3.0 Tesla MRI system (Trio Tim, Siemens, Erlangen, Germany). Foam padding and headphones were used to limit head movement and reduce scanner noise for the subjects. During the data acquisition, all participants were instructed to keep their eyes closed without falling asleep, relax their minds and keep as still as possible. A three-dimensional magnetization-prepared rapid gradient echo sequence was used to collect T1-weighted structural images, with the following parameters: repetition time/echo time (TR/TE) = 1900/2.26 ms, flip angle = 9°, slice thickness/gap = 1/0 mm, field of view (FOV) = 256 × 256 mm², matrix = 256 × 256, voxel size = 1 × 1 × 1 mm³. The rs-fMRI data were collected using an echo planar imaging (EPI) sequence: TR/TE = 2000/30 ms, flip angle = 90°, slice thickness/gap = 5/0 mm, FOV = 240 × 240 mm², matrix = 64 × 64, voxel size = 3.75 × 3.75 × 5 mm³.
Data preprocessing
Structural images were preprocessed using the Statistical Parametric Mapping software (SPM8, http://www.fil.ion.ucl.ac.uk/spm). Images were first segmented into GM, white matter (WM) and cerebrospinal fluid partitions; the GM and WM partitions were then used to create a study-specific template using the diffeomorphic anatomical registration through exponentiated Lie algebra (DARTEL) algorithm [38]. The individual GM images were warped to this template, and then modulated and resliced at their original resolution. Finally, a Gaussian kernel with a full-width at half-maximum (FWHM) of 8 mm was used to smooth all the GM images.
Resting-state functional images were preprocessed using SPM8 and the Data Processing Assistant for Resting-State fMRI (DPARSF, http://rfmri.org/DPARSF) toolbox. The first 10 EPI volumes were discarded to avoid magnetic saturation effects and to ensure that all participants had adapted to the scanning environment. The remaining volumes first underwent slice-timing correction and were then realigned to the first volume to correct for susceptibility-by-movement interaction. No subject in this study showed head movement exceeding 2 mm of displacement or 2° of rotation in any direction. The realigned scans were further spatially normalized to the Montreal Neurological Institute template and resliced to 3 mm isotropic voxels. Next, band-pass filtering (0.01–0.08 Hz) was performed on the time series of each voxel to reduce the effect of low-frequency drifts and high-frequency physiological noise [39]. Then the ALFF, ReHo and RFCS were calculated as described below.
Feature extraction
ALFF is an effective indicator of regional intrinsic or spontaneous neuronal activity in the brain [40]. To calculate ALFF, the normalized and resliced images were first smoothed using a 4 mm FWHM Gaussian kernel. Then the ALFF within the 0.01–0.08 Hz frequency band was calculated for each voxel using the Resting-State fMRI Data Analysis Toolkit (REST, http://rest.restfmri.net). To reduce the global effects of variability across participants, the ALFF of each voxel was divided by the global mean ALFF value of the same subject. The individual ALFF maps were then partitioned into 116 regions of interest (ROIs) based on the automated anatomical labeling (AAL) atlas [41], and the 116 regional mean ALFF values were extracted as the features for each subject.
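The ALFF computation and the ROI averaging described above can be sketched as follows. This is a minimal illustration rather than the REST implementation: the function names `alff` and `roi_features`, the amplitude scaling (`|FFT| / n`), and the assumption that the input map is already restricted to brain voxels are ours. The absolute scaling does not affect the features, because each map is divided by its global mean.

```python
import numpy as np

def alff(ts, tr=2.0, band=(0.01, 0.08)):
    """Mean spectral amplitude of one time series inside `band` (Hz)."""
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()                       # remove the DC component
    freqs = np.fft.rfftfreq(ts.size, d=tr)    # frequency axis for sampling period TR
    amp = np.abs(np.fft.rfft(ts)) / ts.size   # amplitude (square root of the power spectrum)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return amp[in_band].mean()

def roi_features(voxel_values, labels, n_rois=116):
    """Divide a (masked) voxel map by its global mean, then average per ROI label 1..n_rois."""
    standardized = voxel_values / voxel_values.mean()
    return np.array([standardized[labels == r].mean() for r in range(1, n_rois + 1)])
```

With TR = 2 s, a slow oscillation at 0.05 Hz falls inside the band and yields a much larger value than a 0.2 Hz oscillation of the same amplitude, which is the behaviour the measure is meant to capture.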
ReHo, which measures the functional synchronization of a given voxel with its nearest neighbors, was also calculated using the REST software. For each normalized and resliced image, the cluster size was set to 27 voxels [28]. As in the ALFF processing, the ReHo of each voxel was divided by the global mean ReHo value. After smoothing with a 4 mm FWHM Gaussian kernel, a ReHo map was obtained for each subject and partitioned into 116 ROIs using the AAL atlas. The ReHo features of each subject consisted of the 116 regional mean ReHo values.
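ReHo is conventionally computed as Kendall's coefficient of concordance over the time series of a voxel and its neighbors. The sketch below shows that statistic for a single cluster; the function name `kendall_w` is ours, and the code assumes no tied values within a voxel's time series (ties would need average ranks, which are not handled here).

```python
import numpy as np

def kendall_w(cluster_ts):
    """Kendall's coefficient of concordance for a (n_timepoints, n_voxels) array.

    Assumes no tied values within any voxel's time series.
    """
    x = np.asarray(cluster_ts, dtype=float)
    n, k = x.shape
    ranks = x.argsort(axis=0).argsort(axis=0) + 1    # rank each voxel's series over time (1..n)
    rank_sums = ranks.sum(axis=1)                    # summed rank at each time point
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()  # spread of the summed ranks
    return 12.0 * s / (k ** 2 * (n ** 3 - n))
```

For the 27-voxel cluster used here, `cluster_ts` would have 27 columns; the statistic ranges from 0 (no concordance) to 1 (identical rank ordering in every voxel).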
The RFCS measures the average correlation extent of a given brain region compared with all other regions [32]. To compute resting-state functional connectivity, we regressed out the spurious effects of nuisance covariates [42]. The individual volume was first partitioned into 116 ROIs using the AAL atlas, and the mean time series of each region was then extracted by averaging the time series within that region. We obtained a 116×116 correlation matrix for each subject by calculating the Pearson correlation coefficients between all possible pairs of regions. Then, the RFCS was calculated using a method that has been described in previous studies [43]. The i-th RFCS was defined as:
$$\mathrm{RFCS}_i = \frac{1}{N-1} \sum_{j=1,\, j \neq i}^{N} \left| R_{ij} \right| \qquad (1)$$
where Rij is the correlation coefficient between region i and region j, and N is the number of regions.
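As a rough sketch, the RFCS of every region can be obtained from the ROI time series in a few lines. We assume here that RFCS is the mean absolute Pearson correlation of a region with the other N − 1 regions; the use of the absolute value and the function name `rfcs` are assumptions for illustration, and nuisance regression is taken to have been done beforehand.

```python
import numpy as np

def rfcs(roi_ts):
    """RFCS per region from ROI time series of shape (n_timepoints, n_regions)."""
    r = np.corrcoef(roi_ts, rowvar=False)   # N x N Pearson correlation matrix
    n = r.shape[0]
    abs_r = np.abs(r)
    np.fill_diagonal(abs_r, 0.0)            # exclude each region's self-correlation
    return abs_r.sum(axis=1) / (n - 1)      # average over the other N - 1 regions
```

With the AAL atlas, `roi_ts` would have 116 columns and the function would return the 116 RFCS features of one subject.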
Similar to the functional maps, individual GM maps were also partitioned into 116 ROIs and the regional mean GM values were extracted as features for that subject.
Finally, for each subject, three functional maps (ALFF, ReHo and RFCS) and one structural map (GM) were obtained, and for each map, we extracted 116 features from the 116 AAL ROIs. For a given ROI, ALFF, ReHo and RFCS reflect the degree of regional activity, the degree of regional synchronization and the degree of global synchronization of spontaneous neuronal activity, respectively; and the GM reflects the morphometric characteristics. Therefore, for each subject, we had 116 × 4 features which convey different types of information.
Feature selection
The dimension of the original feature space is much higher than the number of samples, which might lead to the curse of dimensionality and high computational complexity. To speed up computation and to improve the classifier performance [44], a feature selection step was adopted to remove irrelevant or redundant features. Two-sample t-tests were performed to identify features that differed between the MWoA and HC groups. Only features with a p value smaller than the predefined threshold (p<0.05, uncorrected) were retained. This process was performed independently for each feature, ignoring its relationship (redundant or complementary) with other features. To jointly consider the discriminative power among features, we employed the SVM-recursive feature elimination (SVM-RFE) method [45] for further feature selection. SVM-RFE is a backward selection technique that iteratively removes as many non-informative features as possible while retaining features that carry discriminative information. The above-mentioned feature selection scheme was performed separately on each feature type. Of note, all feature selection procedures were restricted to the training set of each leave-one-out cross-validation (LOOCV) fold.
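The filter step can be illustrated with a short sketch. It thresholds the absolute t statistic instead of the p value, which is equivalent for a fixed number of degrees of freedom; a critical value of about 2.01 corresponds to two-sided p<0.05 at 46 degrees of freedom (48 training subjects). The equal-variance form of the test, the label coding (+1/−1) and the function names are assumptions, and the subsequent SVM-RFE step is not shown.

```python
import numpy as np

def pooled_t(x_a, x_b):
    """Two-sample t statistic (equal-variance form) for every feature column."""
    n_a, n_b = x_a.shape[0], x_b.shape[0]
    var_pooled = ((n_a - 1) * x_a.var(axis=0, ddof=1) +
                  (n_b - 1) * x_b.var(axis=0, ddof=1)) / (n_a + n_b - 2)
    return (x_a.mean(axis=0) - x_b.mean(axis=0)) / np.sqrt(var_pooled * (1 / n_a + 1 / n_b))

def filter_features(x_train, y_train, t_crit):
    """Indices of features with |t| > t_crit, computed on training subjects only."""
    t = pooled_t(x_train[y_train == 1], x_train[y_train == -1])
    return np.flatnonzero(np.abs(t) > t_crit)
```

Calling `filter_features` inside each cross-validation fold, on the training subjects only, keeps the left-out subject from influencing which features are retained.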
Multi-kernel SVM
In order to effectively integrate different feature vectors, a multi-kernel SVM [46] was used. We constructed a base kernel for each feature level, and then mixed these base kernels by a weighted linear combination. Let F be the number of the base kernels, then the final kernel can be expressed as
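$$k(x_i, x) = \sum_{f=1}^{F} \beta_f \, k^{(f)}\!\left(x_i^{(f)}, x^{(f)}\right) \qquad (2)$$
where $x_i$ is the feature vector of the i-th training sample and $x_i^{(f)}$ is its part belonging to the f-th feature type; $x$ is the feature vector of the test sample; $k^{(f)}(x_i^{(f)}, x^{(f)})$ is the f-th kernel function; and $\beta_f \geq 0$ is the weighting factor of the f-th kernel function, with the constraint $\sum_f \beta_f = 1$.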
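A weighted combination of base kernels of this kind can be sketched as below, assuming linear base kernels (one per feature type) and weights that are already known. The function name `combined_kernel` and the list-of-blocks layout are ours; learning the weights is a separate optimization that is not shown.

```python
import numpy as np

def combined_kernel(train_blocks, test_blocks, betas):
    """Weighted sum of linear base kernels, one per feature type.

    train_blocks[f]: (n_train, d_f) array; test_blocks[f]: (n_test, d_f) array.
    Returns an (n_test, n_train) kernel matrix.
    """
    betas = np.asarray(betas, dtype=float)
    if np.any(betas < 0) or not np.isclose(betas.sum(), 1.0):
        raise ValueError("kernel weights must be non-negative and sum to 1")
    # Each term is beta_f times the linear kernel of feature type f.
    return sum(b * (xt @ xtr.T) for b, xtr, xt in zip(betas, train_blocks, test_blocks))
```

Passing the training blocks as both arguments gives the square training kernel matrix, which can then be handed to any SVM solver that accepts a precomputed kernel.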
The final kernel matrix can be naturally embedded into the conventional single-kernel SVM. Applying a linear SVM to the final kernel, the decision function for the predicted label can be obtained as below:
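$$f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i y_i \sum_{f=1}^{F} \beta_f \, k^{(f)}\!\left(x_i^{(f)}, x^{(f)}\right) + b \right) \qquad (3)$$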
where N is the number of training samples, yi∈{-1, +1} is the corresponding class label, αi is the Lagrangian multiplier, and b is a bias.
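Given a trained model, evaluating a kernel SVM decision function on a precomputed combined kernel reduces to one matrix product. The sketch below assumes the multipliers `alpha`, the labels `y_train` and the bias `b` come from an SVM solver (not shown); the function name and the convention of assigning a score of exactly 0 to the positive class are ours.

```python
import numpy as np

def predict_labels(k_test_train, alpha, y_train, b):
    """Labels and decision scores for test rows of a precomputed (n_test, n_train) kernel."""
    scores = k_test_train @ (alpha * y_train) + b   # sum_i alpha_i * y_i * k(x_i, x) + b
    labels = np.where(scores >= 0, 1, -1)           # sign of the decision value
    return labels, scores
```

The raw `scores` are also what a threshold sweep for an ROC curve would operate on.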
The weights of the different kernels in the multi-kernel SVM were learned from the training samples. A reduced gradient method, which converges rapidly and efficiently, was chosen to optimize the kernel weights and the SVM classifier. The optimization procedure is iterative: given the current kernel weights, it solves a classical SVM with the combined kernel and then updates the kernel weights. This two-step process is repeated until Armijo’s rule [47] is met. As explained above, the multi-kernel SVM provides a convenient and effective way of fusing various features from different modalities. In our case, we focused on multiple features from two modalities: rs-fMRI and sMRI. Fig 1 gives a schematic illustration of our multi-feature combination and classification method.
Cross-validation
Cross-validation is often used to assess the generalizability of a model and to ensure that the model is not overfitted. Here, we used the LOOCV strategy to estimate the performance of the proposed framework. In detail, one sample was designated as the test sample, and the remaining samples were used to train the classifier. For each feature ai in the training samples, a common feature normalization scheme was adopted: ai = (ai − āi)/σi, where āi and σi are the mean and standard deviation of the i-th feature across all training samples, respectively. The estimated āi and σi were then used to normalize the corresponding feature of the test sample. In the LOOCV procedure, features used for classification were chosen from the normalized training samples. Specifically, after the filter-based feature selection (t-test), the retained features were further selected using the SVM-RFE approach, in which an SVM classifier was repeatedly trained; at each iteration, the square of the weight vector coefficient was used as the ranking criterion to remove the lowest-ranking feature. SVM-RFE allowed us to derive an accuracy measure for each feature elimination level, from which we determined the minimum number of features required to produce an accuracy equivalent to that of all features. The selected features were then used to train the multi-kernel SVM classifier. The optimal kernel weights βf and the optimal multi-kernel SVM model were obtained and applied to the test sample. The whole process was repeated until every sample had been left out once for testing. The final accuracy was computed by averaging the accuracies from all tests. Accuracy, sensitivity and specificity were defined based on the prediction results of LOOCV, to quantify the performance of all compared methods:
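The fold structure, with normalization parameters estimated on the training subjects only, can be sketched as follows. The classifier is passed in as a callable; the `nearest_centroid` function is only a stand-in so that the example runs on its own and is not the multi-kernel SVM. Function names, the use of the sample standard deviation (`ddof=1`) and the guard for constant features are assumptions. In the full pipeline, feature selection would also happen inside `fit_predict`, on the training data of each fold.

```python
import numpy as np

def loocv_predictions(x, y, fit_predict):
    """Leave-one-out loop; z-score parameters come from the training subjects only."""
    n = x.shape[0]
    preds = np.empty(n, dtype=int)
    for i in range(n):
        train = np.arange(n) != i                 # boolean mask excluding subject i
        mu = x[train].mean(axis=0)
        sigma = x[train].std(axis=0, ddof=1)
        sigma[sigma == 0] = 1.0                   # guard against constant features
        x_tr = (x[train] - mu) / sigma
        x_te = (x[i:i + 1] - mu) / sigma          # test subject reuses training mean/SD
        preds[i] = fit_predict(x_tr, y[train], x_te)[0]
    return preds

def nearest_centroid(x_tr, y_tr, x_te):
    """Stand-in classifier (NOT the multi-kernel SVM): assign the closer class mean."""
    c_pos = x_tr[y_tr == 1].mean(axis=0)
    c_neg = x_tr[y_tr == -1].mean(axis=0)
    d_pos = np.linalg.norm(x_te - c_pos, axis=1)
    d_neg = np.linalg.norm(x_te - c_neg, axis=1)
    return np.where(d_pos <= d_neg, 1, -1)
```

A call such as `loocv_predictions(x, y, nearest_centroid)` returns one held-out prediction per subject, from which accuracy, sensitivity and specificity can be computed.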
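$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \qquad (4)$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (5)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (6)$$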
where TP denotes the number of patients correctly classified; FN denotes the number of patients classified as controls; TN denotes the number of controls correctly predicted; and FP denotes the number of controls classified as patients. We also calculated the area under the receiver operating characteristic curve (AUC) to illustrate the performance of classification.
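For concreteness, the three measures follow the standard definitions: accuracy is (TP + TN) divided by all subjects, sensitivity is TP / (TP + FN), and specificity is TN / (TN + FP). A small helper, with a hypothetical function name, makes the arithmetic explicit.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of the positive class correctly identified
        "specificity": tn / (tn + fp),   # fraction of the negative class correctly identified
    }
```

As a purely hypothetical example (these counts are not results from this study), `classification_metrics(17, 4, 25, 3)` gives an accuracy of 42/49, a sensitivity of 17/21 and a specificity of 25/28.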
Results
Comparison of classification performance
In classifications based on different feature types, the same feature extraction and selection criteria were used. The generalizability of these classifiers was estimated using the LOOCV approach. We adopted the traditional single-kernel SVM classifier for single-type feature classification, and the multi-kernel SVM classifier for multi-type feature classification. All SVM classifiers were implemented with a linear kernel and the default penalty parameter C = 1. For direct feature concatenation, we linked the 116 × 4 features (ALFF, ReHo, RFCS and GM) into one long feature vector and used the traditional single-kernel SVM to perform the classification. We also applied the M3 method, in which features are trained using a multi-classifier based on four maximum uncertainty linear discriminant analysis base classifiers [32]. The proposed framework obtained a classification accuracy of 83.67%, with a sensitivity of 92.86% and a specificity of 71.43%; this accuracy was higher than that of any single-type feature or any other multi-type feature combination. The classification performance of all feature types is summarized in Table 2, and the top 10 features most frequently selected in the proposed method are listed in Table 3.
Table 2. Classification performance using different types of feature.
Feature types | ACC (%) | SEN (%) | SPE (%) | AUC |
---|---|---|---|---|
ALFF | 65.31 | 85.71 | 38.10 | 0.69 |
ReHo | 67.35 | 71.43 | 61.90 | 0.67 |
RFCS | 63.27 | 82.14 | 38.10 | 0.68 |
GM | 71.43 | 85.71 | 52.38 | 0.83 |
ALFF+ReHo | 69.39 | 82.14 | 52.38 | 0.70 |
ALFF+RFCS | 64.58 | 85.71 | 33.33 | 0.54 |
ALFF+GM | 70.83 | 89.29 | 42.86 | 0.74 |
ReHo+GM | 72.92 | 85.71 | 52.38 | 0.75 |
ReHo+RFCS | 71.43 | 82.14 | 57.14 | 0.75 |
RFCS+GM | 75.00 | 92.86 | 47.62 | 0.78 |
ALFF+ReHo+RFCS | 72.92 | 85.71 | 52.38 | 0.75 |
ALFF+ReHo+GM | 75.51 | 89.29 | 57.14 | 0.78 |
ALFF+RFCS+GM | 79.59 | 89.29 | 66.67 | 0.84 |
ReHo+RFCS+GM | 73.47 | 85.71 | 57.14 | 0.71 |
Concatenation | 67.35 | 78.57 | 52.38 | 0.74 |
M3 method | 73.47 | 66.67 | 78.57 | 0.82 |
Proposed | 83.67 | 92.86 | 71.43 | 0.83 |
SEN = sensitivity, SPE = specificity, ACC = accuracy, AUC = area under the receiver operating characteristic curve. “+” indicates a combination of the given feature types; “Concatenation” means that all four feature types were concatenated into one long feature vector.
Table 3. Top 10 frequently selected features for proposed classification.
Feature | Regions | Count |
---|---|---|
ALFF | Left anterior cingulate gyrus | 41 |
ALFF | Left posterior cingulate gyrus | 39 |
ALFF | Left lenticular nucleus, pallidum | 25 |
ALFF | Left inferior frontal gyrus, opercular part | 13 |
ALFF | Right superior temporal gyrus | 11 |
ALFF | Right inferior frontal gyrus, opercular part | 10 |
ALFF | Right posterior cingulate gyrus | 10 |
ALFF | Vermis_1&2 | 9 |
ALFF | Right inferior parietal lobule | 6 |
ALFF | Right Cerebelum_Crus1 | 5 |
ReHo | Right inferior parietal lobule | 37 |
ReHo | Right superior temporal gyrus | 36 |
ReHo | Left lenticular nucleus, putamen | 31 |
ReHo | Left cuneus | 27 |
ReHo | Right insula | 21 |
ReHo | Left lenticular nucleus, pallidum | 20 |
ReHo | Right hippocampus | 6 |
ReHo | Right Cerebelum_9 | 6 |
ReHo | Left superior frontal gyrus, medial orbital | 5 |
ReHo | Right lenticular nucleus, putamen | 5 |
RFCS | Left superior frontal gyrus, orbital part | 41 |
RFCS | Left amygdala | 39 |
RFCS | Right amygdala | 39 |
RFCS | Left hippocampus | 24 |
RFCS | Right Cerebelum_Crus2 | 18 |
RFCS | Right inferior frontal gyrus, triangular part | 15 |
RFCS | Right Cerebelum_9 | 12 |
RFCS | Right superior temporal gyrus | 11 |
RFCS | Right Cerebelum_7 | 11 |
RFCS | Vermis_10 | 4 |
GM | Left supplementary motor area | 39 |
GM | Left hippocampus | 38 |
GM | Right parahippocampal gyrus | 33 |
GM | Left parahippocampal gyrus | 17 |
GM | Right hippocampus | 9 |
GM | Left precentral gyrus | 5 |
GM | Right precentral gyrus | 5 |
GM | Left superior frontal gyrus | 5 |
GM | Right superior frontal gyrus | 4 |
GM | Right inferior frontal gyrus, opercular part | 4 |
In the last step of the framework, a discriminative score was obtained for each test subject from the SVM classifier, and the sign of this score was used for classification (e.g., positive indicates HC and negative indicates MWoA). The classification threshold was set to 0 for efficiency, but to evaluate the performance of the classifier this threshold can be varied across the range of all scores obtained. By varying the threshold, the receiver operating characteristic (ROC) curve was plotted (Fig 2). The larger the area under the ROC curve, the better the classification performance. The AUC of the proposed framework was 0.83, indicating an excellent discrimination power.
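The area under the ROC curve obtained by sweeping the threshold over all decision scores equals the fraction of (positive, negative) subject pairs in which the positive subject receives the higher score, with ties counted as one half. The sketch below uses this pairwise form; the function name `roc_auc` and the +1/−1 label coding are assumptions.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    diff = pos[:, None] - neg[None, :]     # every positive score minus every negative score
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.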
The most discriminative features
Since the feature selection in each fold was performed on the training set, the selected features differed slightly across cross-validation folds. Therefore, we defined the features that were most frequently selected across all cross-validation folds as the most discriminative features. The brain regions from which the top ten ALFF, ReHo, RFCS and GM features were selected are shown in Fig 3. Based on the selected ALFF features, the ten most discriminative regions were the bilateral inferior frontal gyrus, left anterior cingulate gyrus, bilateral posterior cingulate gyrus, right inferior parietal lobule, left lenticular nucleus, right superior temporal gyrus, right cerebellum and vermis. For ReHo, the regions with relatively good classification power included the left superior frontal gyrus, right insula, right hippocampus, left cuneus, right inferior parietal lobule, bilateral lenticular nucleus, right superior temporal gyrus, and right cerebellum. The most discriminative regions for RFCS included the left superior frontal gyrus, right inferior frontal gyrus, left hippocampus, bilateral amygdala, right superior temporal gyrus, right cerebellum and vermis. GM regions with relatively high classification power included the bilateral precentral gyrus, bilateral superior frontal gyrus, right inferior frontal gyrus, left supplementary motor area, bilateral hippocampus, and the bilateral parahippocampal gyrus. Across all feature types, the most frequently selected regions were the anterior cingulate cortex, prefrontal cortex, orbitofrontal cortex, and the insula.
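Counting how often each feature survives selection across folds is a simple tally. The sketch below assumes the features selected in each fold are available as lists of names; the function names and the example labels in the usage note are hypothetical.

```python
from collections import Counter

def selection_counts(selected_per_fold):
    """Number of cross-validation folds in which each feature was selected."""
    counts = Counter()
    for fold in selected_per_fold:
        counts.update(set(fold))     # count a feature at most once per fold
    return counts

def top_features(selected_per_fold, k=10):
    """The k most frequently selected features with their fold counts."""
    return selection_counts(selected_per_fold).most_common(k)
```

For example, with three folds selecting `["ACC_L", "INS_R"]`, `["ACC_L"]` and `["ACC_L", "HIP_L"]`, the feature `"ACC_L"` receives a count of 3. With 49 leave-one-out folds, the maximum possible count for a feature is 49.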
Discussion
To the best of our knowledge, this is the first study to demonstrate the advantage of integrating multi-type features (sMRI and rs-fMRI) over single-feature approaches in the discrimination between MWoA and HC. In general, the more feature types we included, the better the performance we obtained. Compared with the best classification achieved by a single-type feature (GM), our framework achieved a higher classification accuracy (83.67% vs. 71.43%). This supports our hypothesis that combining different types of features integrates more effective information into the SVM kernel, since they reflect complementary pathological aspects of the disease. It is worth noting that identifying disease by combining biomarkers from different feature types with different data fusion methods is still an open area of research.
In classification using multi-modal imaging data, the method of combining features is key to effective information integration. A simple and common practice is to concatenate all features into one longer feature vector. However, in this study direct concatenation only achieved an accuracy of 67.35%, which is even lower than the result of the single GM feature, indicating that it is not an effective combination method. In contrast, much better classification performance was achieved when we combined features from different modalities using the multi-kernel strategy, which first combines the kernel matrices of the different features into a mixed kernel matrix and then uses it to train a single SVM model. This strategy assigns kernel weights to the different feature types, which may partially explain its improved capability to integrate comprehensive and complementary information for the purpose of identification. Our framework also achieved a higher accuracy than the M3 method, which uses multiple classifiers to integrate multi-modal information, although M3 achieved a better specificity. This may indicate an advantage of the M3 method in reducing misdiagnosis in some circumstances.
Feature selection is a useful and important process to remove irrelevant or redundant features for dimensionality reduction and improving the performance of the classifier. The filter-based (t-test) and wrapper-based (SVM-RFE) feature selection methods were used in this work. We would point out that we were not using SVM-RFE to optimize predictive accuracy but to remove non-informative data from the extracted features and to find the most parsimonious feature representation. Here, the behavior of predictive accuracy was evaluated as a function of the number of features in the data. The SVM-RFE algorithm iteratively removes non-informative features from the data set and derives an accuracy measure for each feature elimination level. In this way the minimum number of features required to produce equivalent accuracy to all features can be obtained.
Among all our single-type feature classifications, the highest accuracy was obtained when the GM features were used; likewise, among the two-type and three-type feature combinations, those that incorporated GM features generally performed better. In a recent study by Schwedt et al. [11], three structural features (regional cortical thickness, cortical surface area, and volume) were used for migraine identification and achieved a desirable classification accuracy. These results collectively suggest that structural information might be essential for migraine identification when using a machine learning approach. Considering other machine learning studies of Alzheimer’s disease [46] and Parkinson’s disease [48], which are all based on structural imaging features, we would agree with the argument that structural imaging markers may have greater weight in the diagnostic and prognostic judgment of neurological diseases [49].
Our proposed framework sought to identify the most discriminative features between MWoA patients and healthy controls. The brain regions with the highest discriminative power partially overlapped with those reported in previous MWoA studies that applied conventional univariate statistical analysis to functional and structural imaging data. For example, altered ALFF has been identified in the anterior cingulate gyrus and inferior frontal gyrus [12]; ReHo changes have been revealed in the superior frontal gyrus [13], insula, superior temporal gyrus, lenticular nucleus, cerebellum, hippocampus, cuneus and the inferior parietal lobule [14]; and abnormal functional connectivity has been reported in the prefrontal and temporal regions [16, 18], as well as the amygdala and visceroceptive cortex [17]. Regions showing high discriminative power for GM features in the current study, such as the precentral gyrus, superior frontal gyrus, inferior frontal gyrus and supplementary motor area, were also reported to show deficits in previous voxel-based morphometric studies of MWoA [9, 10, 16]. One important caveat is that the regions shown on the discrimination maps only indicate relatively high contributions to the classification; they are not equivalent to regions identified by mass-univariate analyses, which are of greater pathophysiological relevance.
Our study should still be regarded as a preliminary proof of concept. It proposes a promising approach for the future translation of neuroimaging into patient benefit. The pipeline we used comprises four steps: preprocessing of sMRI and rs-fMRI data using standard analytical software (SPM and DPARSF); extraction of feature vectors that consist of regional ALFF, ReHo, RFCS and GM values; selection of the extracted features; and application of the multi-kernel SVM to the selected feature vector. Once the multi-kernel SVM classifier is trained and a decision function is generated, a new sample could be classified in minutes. Although our approach requires replication and validation in larger samples, it provides initial evidence of a rapid and accessible methodology that could potentially aid clinical decisions.
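The multi-kernel step of this pipeline can be illustrated with a minimal sketch. It assumes a common formulation in which one linear kernel is built per feature type and the kernels are merged as a non-negative weighted sum whose weights add up to one. The function names, toy values and equal weights are hypothetical and are not taken from this study.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(features: Sequence[Sequence[float]]) -> Matrix:
    """Gram matrix of inner products between subjects for one feature type."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of per-feature-type kernels (assumed convex combination)."""
    if len(kernels) != len(weights):
        raise ValueError("one weight is required per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Made-up toy data: 3 subjects, 2 regional values per feature type.
alff = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gm = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# Equal weights are an arbitrary choice for illustration only.
combined = combine_kernels([linear_kernel(alff), linear_kernel(gm)], [0.5, 0.5])
```

A combined matrix of this kind could then be handed to any SVM solver that accepts a precomputed kernel. In practice the weights would be tuned on training data only, inside the cross-validation loop, rather than fixed by hand.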
One limitation of this study is that the MWoA group scored significantly higher than healthy controls in the depression and anxiety assessments. Clinically diagnosed depression and anxiety are reported to be associated with brain alterations that overlap with the observations in the current study; for example, ALFF changes in the cerebellum, anterior cingulate cortex and inferior frontal gyrus in depression [27], and fronto-amygdalar functional connectivity changes in anxiety [50]. We acknowledge that this may raise concerns about bias introduced into the feature pattern. However, the patients’ HAMD and HAMA scores were still far below the diagnostic thresholds for depression and anxiety disorder, so the differences are more likely to reflect emotional fluctuations caused by migraine. The effects of these differences on the imaging data, if any, should be minor. Ideally, a study would recruit only MWoA patients without any accompanying symptoms.
Other possible limitations of our study should also be noted. Firstly, we only included rs-fMRI and sMRI for multi-modal classification. Other data modalities, e.g. task fMRI, diffusion MRI and electroencephalography, which all provide additional information, should be considered in future studies. Secondly, we used the AAL atlas to parcellate the whole brain for ROI definition. Though this method simplified computation, recent studies have found that different parcellation schemes can affect brain network analysis [51–53]. Thus, our method needs to be applied with other brain atlases, or even at the voxel level, to investigate the impact of brain parcellation on classification. Finally, considering the relatively small sample size (49 subjects in total), the classifier is specific to the current dataset. In the future, we would like to use a larger dataset to determine the generalizability of this framework.
Conclusions
This study proposed a novel framework to discriminate MWoA patients from healthy controls (HC) using ALFF, ReHo, RFCS and GM features derived from rs-fMRI and sMRI scans. Compared with classification based on one, two, or three feature types, classification performance was improved by integrating all four types of features via a multi-kernel SVM. The promising classification results suggest that pattern classification based on multi-modal imaging data is a direction that deserves more comprehensive investigation for MWoA discrimination.
Supporting Information
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
This work was supported by the National Natural Science Foundation of China (JunranZ) Grant No. 81000605 (http://isisn.nsfc.gov.cn/egrantindex/funcindex/prjsearch-list#), National Natural Science Foundation of China (QG) Grant No. 81371536 (http://isisn.nsfc.gov.cn/egrantindex/funcindex/prjsearch-list#), Guangdong Natural Science Foundation (JunranZ) Grant No. S2012020010867 (http://gonggao.gdstc.gov.cn/itemListAction.do?method=Item_List_Data), Sichuan Science and Technology Plan Project (JunranZ) Grant No. 2015HH0036 (http://xmgl.scst.gov.cn/index.html), Guangxi Natural Science Foundation Key Projects (JH) Grant No. 2014GXNSFDA118037 (http://gxnsf.gxsti.net/stms/login.jsp), Key Lab Science Foundation of Guangxi province (JZ) Grant No. GXSCIIP201411 (http://www.gxtc.edu.cn/Default.aspx), National Natural Science Foundation of China (JiangZ) Grant No. 61273361 (http://isisn.nsfc.gov.cn/egrantindex/funcindex/prjsearch-list#), and Key Technology Research and Development Program of Sichuan Province (JiangZ) Grant No. 2014GZ0003 (http://xmgl.scst.gov.cn/index.html). The study design, decision to publish, and preparation of the manuscript are sponsored by the funding of Junran Zhang and Jiang Zhang; data collection is sponsored by the funding of Qiyong Gong and Jiangtao Huang.
References
- 1. Aurora SK, Welch K. Migraine: imaging the aura. Curr Opin Neurol. 2000; 13:273–276. doi:10.1097/00019052-200006000-00007
- 2. Vos T, Flaxman AD, Naghavi M, Lozano R, Michaud C, Ezzati M, et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2013; 380:2163–2196. doi:10.1016/S0140-6736(12)61729-2
- 3. Hung CI, Liu CY, Cheng YT, Wang SJ. Migraine: a missing link between somatic symptoms and major depressive disorder. J Affect Disord. 2009; 117:108–115. doi:10.1016/j.jad.2008.12.015
- 4. Raggi A, Leonardi M, Bussone G, D’Amico D. Value and utility of disease-specific and generic instruments for assessing disability in patients with migraine, and their relationships with health-related quality of life. Neurol Sci. 2011; 32:387–392. doi:10.1007/s10072-010-0466-3
- 5. Velentgas P, Cole JA, Mo J, Sikes CR, Walker AM. Severe vascular events in migraine patients. Headache. 2004; 44:642–651. doi:10.1111/j.1526-4610.2004.04122.x
- 6. Dixit A, Bhardwaj M, Sharma B. Headache in pregnancy: a nuisance or a new sense? Obstet Gynecol Int. 2012. doi:10.1155/2012/697697
- 7. Coppola G, Di Renzo A, Tinelli E, Iacovelli E, Lepre C, Di Lorenzo C, et al. Evidence for brain morphometric changes during the migraine cycle: a magnetic resonance-based morphometry study. Cephalalgia. 2015; 35:783–791. doi:10.1177/0333102414559732
- 8. Hougaard A, Amin FM, Ashina M. Migraine and structural abnormalities in the brain. Curr Opin Neurol. 2014; 27:309–314. doi:10.1097/WCO.0000000000000086
- 9. Hu W, Guo J, Chen N, Guo J, He L. A meta-analysis of voxel-based morphometric studies on migraine. Int J Clin Exp Med. 2015; 8:4311–4319.
- 10. Kim J, Suh SI, Seol H, Oh K, Seo WK, Yu SW, et al. Regional grey matter changes in patients with migraine: a voxel-based morphometry study. Cephalalgia. 2008; 28:598–604. doi:10.1111/j.1468-2982.2008.01550.x
- 11. Schwedt TJ, Chong CD, Wu T, Gaw N, Fu Y, Li J. Accurate classification of chronic migraine via brain magnetic resonance imaging. Headache. 2015; 55:762–777. doi:10.1111/head.12584
- 12. Xue T, Yuan K, Cheng P, Zhao L, Zhao L, Yu D, et al. Alterations of regional spontaneous neuronal activity and corresponding brain circuit changes during resting state in migraine without aura. NMR Biomed. 2013; 26:1051–1058. doi:10.1002/nbm.2917
- 13. Yu D, Yuan K, Zhao L, Zhao L, Dong M, Liu P, et al. Regional homogeneity abnormalities in patients with interictal migraine without aura: A resting-state study. NMR Biomed. 2012; 25:806–812. doi:10.1002/nbm.1796
- 14. Zhao L, Liu J, Dong X, Peng Y, Yuan K, Wu F, et al. Alterations in regional homogeneity assessed by fMRI in patients with migraine without aura stratified by disease duration. J Headache Pain. 2013; 14:e85. doi:10.1186/1129-2377-14-85
- 15. Yu D, Yuan K, Zhao L, Liang F, Qin W. Regional Homogeneity Abnormalities Affected by Depressive Symptoms in Migraine Patients without Aura: A Resting State Study. PLoS One. 2013; 8:e77933. doi:10.1371/journal.pone.0077933
- 16. Jin C, Yuan K, Zhao L, Zhao L, Yu D, Deneen KM, et al. Structural and functional abnormalities in migraine patients without aura. NMR Biomed. 2013; 26:58–64. doi:10.1002/nbm.2819
- 17. Hadjikhani N, Ward N, Boshyan J, Napadow V, Maeda Y, Truini A, et al. The missing link: Enhanced functional connectivity between amygdala and visceroceptive cortex in migraine. Cephalalgia. 2013; 33:1264–1268. doi:10.1177/0333102413490344
- 18. Tessitore A, Russo A, Giordano A, Conte F, Corbo D, De Stefano M, et al. Disrupted default mode network connectivity in migraine without aura. J Headache Pain. 2013; 14:89. doi:10.1186/1129-2377-14-89
- 19. Davatzikos C. Why voxel-based morphometric analysis should be used with great caution when characterizing group differences. Neuroimage. 2004; 23:17–20. doi:10.1016/j.neuroimage.2004.05.010
- 20. Haller S, Lovblad KO, Giannakopoulos P, Van De Ville D. Multivariate pattern recognition for diagnosis and prognosis in clinical neuroimaging: state of the art, current challenges and future trends. Brain Topogr. 2014; 27:329–337. doi:10.1007/s10548-014-0360-z
- 21. Orrù G, Pettersson-Yeo W, Marquand AF, Sartori G, Mechelli A. Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci Biobehav Rev. 2012; 36:1140–1152. doi:10.1016/j.neubiorev.2012.01.004
- 22. Abdulkadir A, Mortamet B, Vemuri P, Jack CR, Krueger G, Klöppel S, et al. Effects of hardware heterogeneity on the performance of SVM Alzheimer's disease classifier. Neuroimage. 2011; 58:785–792. doi:10.1016/j.neuroimage.2011.06.029
- 23. Ecker C, Rocha-Rego V, Johnston P, Mourao-Miranda J, Marquand A, Daly EM, et al. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. Neuroimage. 2010; 49:44–56. doi:10.1016/j.neuroimage.2009.08.024
- 24. Frick A, Gingnell M, Marquand AF, Howner K, Fischer H, Kristiansson M, et al. Classifying social anxiety disorder using multivoxel pattern analyses of brain function and structure. Behav Brain Res. 2014; 259:330–335. doi:10.1016/j.bbr.2013.11.003
- 25. Gong Q, Wu Q, Scarpazza C, Lui S, Jia Z, Marquand A, et al. Prognostic prediction of therapeutic response in depression using high-field MR imaging. Neuroimage. 2011; 55:1497–1503. doi:10.1016/j.neuroimage.2010.11.079
- 26. Iwabuchi SJ, Liddle PF, Palaniyappan L. Clinical utility of machine-learning approaches in schizophrenia: improving diagnostic confidence for translational neuroimaging. Front Psychiatry. 2013; 4:95. doi:10.3389/fpsyt.2013.00095
- 27. Guo WB, Liu F, Xue ZM, Xu XJ, Wu RR, Ma CQ, et al. Alterations of the amplitude of low-frequency fluctuations in treatment-resistant and treatment-response depression: a resting-state fMRI study. Prog Neuropsychopharmacol Biol Psychiatry. 2012; 37:153–160. doi:10.1016/j.pnpbp.2012.01.011
- 28. Zang Y, Jiang T, Lu Y, He Y, Tian L. Regional homogeneity approach to fMRI data analysis. Neuroimage. 2004; 22:394–400. doi:10.1016/j.neuroimage.2003.12.030
- 29. Biswal BB, Kylen JV, Hyde JS. Simultaneous assessment of flow and BOLD signals in resting-state functional connectivity maps. NMR Biomed. 1997; 10:165–170.
- 30. Sporns O. The human connectome: a complex network. Annals of the New York Academy of Sciences. 2011; 1224:109–125. doi:10.1111/j.1749-6632.2010.05888.x
- 31. Sato JR, Hoexter MQ, Fujita A, Rohde LA. Evaluation of pattern recognition and feature extraction methods in ADHD prediction. Front Syst Neurosci. 2012; 6:23015782. doi:10.3389/fnsys.2012.00068
- 32. Dai Z, Yan C, Wang Z, Wang J, Xia M, Li K, et al. Discriminative analysis of early Alzheimer's disease using multi-modal imaging and multi-level characterization with multi-classifier (M3). Neuroimage. 2012; 59:2187–2195. doi:10.1016/j.neuroimage.2011.10.003
- 33. Liu F, Zhou L, Shen C, Yin J. Multiple Kernel Learning in the Primal for Multimodal Alzheimer’s Disease Classification. IEEE J Biomed Health. 2014; 18:984–990. doi:10.1109/JBHI.2013.2285378
- 34. Zhang D, Wang Y, Zhou L, Yuan H, Shen D, ADNI. Multimodal classification of Alzheimer's disease and mild cognitive impairment. Neuroimage. 2011; 55:856–867. doi:10.1016/j.neuroimage.2011.01.008
- 35. Long D, Wang J, Xuan M, Gu Q, Xu X, Kong D, et al. Automatic classification of early Parkinson's disease with multi-modal MR imaging. PLoS One. 2012; 7:e47714. doi:10.1371/journal.pone.0047714
- 36. Ulas A, Castellani U, Mirtuono P, Bicego M, Murino V, Cerruti S, et al. Multimodal Schizophrenia Detection by Multiclassification Analysis. In: Martin CS and Kim SW, editors. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. 2011. pp. 491–498.
- 37. Headache Classification Subcommittee of the IHS. The international classification of headache disorders: 2nd edition. Cephalalgia. 2004; 24:9–106.
- 38. Ashburner J. A fast diffeomorphic image registration algorithm. Neuroimage. 2007; 38:95–113. doi:10.1016/j.neuroimage.2007.07.007
- 39. Lowe M, Mock B, Sorenson J. Functional connectivity in single and multislice echoplanar imaging using resting-state fluctuations. Neuroimage. 1998; 7:119–132. doi:10.1006/nimg.1997.0315
- 40. Zang YF, He Y, Zhu CZ, Cao QJ, Sui MQ, Liang M, et al. Altered baseline brain activity in children with ADHD revealed by resting-state functional MRI. Brain Dev. 2007; 29:83–91. doi:10.1016/j.braindev.2006.07.002
- 41. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 2002; 15:273–289. doi:10.1006/nimg.2001.0978
- 42. Fox MD, Snyder AZ, Vincent JL, Corbetta M, Van Essen DC, Raichle ME. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc Natl Acad Sci U S A. 2005; 102:9673–9678. doi:10.1073/pnas.0504136102
- 43. Jiang T, He Y, Zang Y, Weng X. Modulation of functional connectivity during the resting state and the motor task. Hum Brain Mapp. 2004; 22:63–71. doi:10.1002/hbm.20012
- 44. Pereira F, Mitchell T, Botvinick M. Machine learning classifiers and fMRI: a tutorial overview. Neuroimage. 2009; 45:S199–S209. doi:10.1016/j.neuroimage.2008.11.007
- 45. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002; 46:389–422.
- 46. Liu F, Wee CY, Chen H, Shen D. Inter-modality relationship constrained multi-modality multi-task feature selection for Alzheimer's Disease and mild cognitive impairment identification. Neuroimage. 2014; 84:466–475. doi:10.1016/j.neuroimage.2013.09.015
- 47. Armijo L. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics. 1966; 16:1–3. doi:10.2140/pjm.1966.16.1
- 48. Rana B, Juneja A, Saxena M, Gudwani S, Kumaran SS, Agrawal R, et al. Regions-of-interest based automated diagnosis of Parkinson’s disease using T1-weighted MRI. Expert Systems with Applications. 2015; 42:4506–4516. doi:10.1016/j.eswa.2015.01.062
- 49. Frisoni G. Structural imaging in the clinical diagnosis of Alzheimer's disease: problems and tools. J Neurol Neurosurg Psychiatry. 2001; 70:711–718. doi:10.1136/jnnp.70.6.711
- 50. Hahn A, Stein P, Windischberger C, Weissenbacher A, Spindelegger C, Moser E, et al. Reduced resting-state functional connectivity between amygdala and orbitofrontal cortex in social anxiety disorder. Neuroimage. 2011; 56:881–889. doi:10.1016/j.neuroimage.2011.02.064
- 51. Hayasaka S, Laurienti PJ. Comparison of characteristics between region- and voxel-based network analyses in resting-state fMRI data. Neuroimage. 2010; 50:499–508. doi:10.1016/j.neuroimage.2009.12.051
- 52. Zalesky A, Fornito A, Harding IH, Cocchi L, Yuecel M, Pantelis C, et al. Whole-brain anatomical networks: Does the choice of nodes matter? Neuroimage. 2010; 50:970–983. doi:10.1016/j.neuroimage.2009.12.027
- 53. Wang J, Wang L, Zang Y, Yang H, Tang H, Gong Q, et al. Parcellation-Dependent Small-World Brain Functional Networks: A Resting-State fMRI Study. Hum Brain Mapp. 2009; 30:1511–1523. doi:10.1002/hbm.20623