Skip to main content
. 2021 Jan 13;12:353. doi: 10.1038/s41467-020-20655-6

Fig. 2. DL capacitates more discriminative features across multiple classification and regression tasks.

Fig. 2

a Age and gender-based classification task. We test six standard machine learning (SML) methods, including linear (red hues—LDA linear discriminant analysis, LR logistic regression, SVML support vector machine with a linear kernel) and nonlinear models (green hues—SVMP, SVMR, and SVMS, which are abbreviations for SVM models with a Polynomial, Radial-Basis Function, and Sigmoidal kernel, respectively) by reducing the high-dimensional whole-brain gray matter by three dimensionality reduction (DR) methods (GRP gaussian random projection, RFE recursive feature elimination and UFS univariate feature selection) and compared against two deep learning (DL) models (blue hues: DL1 and DL2) trained on 3D whole-brain gray matter. This task was performed with a repeated (n = 20) random sub-sampling cross-validation scheme on UK Biobank MRI data (n = 12,314; nvalidation = 1157; ntest = 1157) on a range of training sample sizes that varied between 100 and 10,000 samples. Both DL classification models significantly outperformed (evaluated using a two-tailed paired sample t test) all six SML classification models regardless of the DR method for all training sizes beyond the test sample size. In addition, superior feature extraction of the DL models was immediately evident as the SML models trained on the DL1 representations (DL1 Embeddings panel on top right) performed equally well. The error-bars highlight the mean values ±1 SE across the 20 cross-validation repetitions, whereas the horizontal line along the normalized accuracy level of 0.1 represents the chance probability for this 10-way classification task. b Gender classification task. The largest sample (ntrain = 10,000; nvalidation = 1157; ntest = 1157) was evaluated for gender classification using the same cross-validation procedure. For this task, the tested DL model significantly outperformed all six SML classification models for all DR methods. The label abbreviations on the x axis of this plot (except DL) refer to this task performed by implementing a combination of the listed SML model and the DR method, whereas DL represents this task performed with our DL1 model on 3D whole-brain gray matter. c Age regression and d Mini-mental state examination (MMSE) regression tasks. We evaluate three SML methods, including the elastic net (EN), Kernel Ridge (KR), and Random Forrest (RF) regression methods on features estimated by all three DR methods. The label abbreviations on the x axis of this plot (except DL) refer to these tasks performed by implementing a combination of the listed SML model and the DR method, whereas DL represents this task performed with our deep vanilla regressor model (DL3) on 3D whole-brain gray matter. The age regression task was evaluated on the largest training sample size (ntrain = 10,000; nvalidation = 1157; ntest = 1157), whereas the MMSE regression task was implemented on ADNI MRI data (n = 828; ntrain = 428; nvalidation = 200; ntest = 200) using the same cross-validation procedure. The mean absolute error (MAE) and the Pearson correlation coefficient (PCC) (between the true and predicted values) performance metrics were estimated to compare performance for the DL vanilla regressor model and the three SML regression models for both regression tasks. Both tasks reported statistically significant improvement in MAE (a decrease) and PCC (an increase) for the DL regressor as compared to the SML regression models as evaluated using a two-tailed paired sample t test. For the boxplots plotted in b, c, and d panels, the box shows the inter quartile range (IQR between Q1 and Q3) of the data set, the central mark (horizontal line) shows the median and the whiskers correspond to the rest of the distribution based on IQR [Q1-1.5*IQR, Q3 + 1.5*IQR]. Beyond the whiskers, data are considered outliers and represented by red circles. Source data are provided as a Source Data file.