Abstract
We set out to map the combined and unique contributions of five different biomarkers for two cognitive outcomes in cognitively healthy adults. While probing the association of biomarkers with cognition in the full experimental sample is essential, we focused on how well any such associations would persist in held-out data.
A total of 335 cognitively normal participants, aged 20-80 years, with a mean of 16 years of education were included in the study. Z-scores were computed for fluid reasoning and vocabulary. The following imaging data were included: brain volume and thickness, fractional anisotropy, deep grey matter volume, and white matter hyperintensity.
Volume accounted for most of the variance in both cognitive domains. In out-of-sample data, fluid reasoning was best predicted by volumes, but vocabulary by the combination of all modalities. While the predictive utility improves overall for older participants, the information gleaned relative to null-models becomes less for older participants.
An optimized set of brain biomarkers can predict cognition in out-of-sample data, to various degrees for both fluid and crystallized intelligence.
Keywords: aging, brain morphometry, out-of-sample prediction, fluid reasoning, vocabulary
Introduction
Although the cognitive consequences of aging are evident, less is known about the biological factors, and their synergies, that contribute to the cognitive changes. Specific in-vivo measured brain biomarkers (BMs), both biochemical and imaging, are indicators of specific brain changes that may accompany changes in cognitive status. Interestingly, high-dimensional pattern regression on medical images via machine learning has been used for higher estimation accuracy and better generalizability in quantifying and detecting spatially complex imaging patterns of pathology from medical scans (Wang et al. 2010). Studies have proposed multiple regression models to build linear models estimating changes in cognition from structural MRI scans (Duchesne et al. 2009; Duchesne et al. 2005). Thus, a thorough validation of the joint effect of BMs is required for understanding age-related cognitive differences.
There are some limitations in the existing studies examining the association of BMs with cognition. Most sample sizes were relatively small, or studies focused on specific age groups (e.g. only older adults), or used only a brief cognitive examination (e.g. the Mini Mental State Examination), hampering the generalizability of the results. Existing studies also mainly examined individual or pre-chosen sets of BMs, e.g. hippocampal volume only, based on prior correlations with both age and cognition. Thus, the association with cognition for a variety of biomarkers considered in conjunction has only rarely been examined. Some studies have explored how a specific set of BMs can explain variation in cognition in cognitively normal older adults. For example, Hedden et al. (2016) evaluated the predictive utility of a set of individual BMs, selected based on prior association with aging and cognition; a similar analysis has also been reported by our team (Tsapanou et al. 2019). However, these BMs were not specifically derived for optimized predictive utility. A natural next step for us was to attempt to derive an optimized set of BMs and test its account of cognitive variance in held-out data. Such an approach need not sacrifice inferential rigor and can further test predictive utility, something that is not routinely attempted in typical biomarker-research surveys.
In the current study, we initially examined the variance in cognition associated with specially derived predictive patterns from each brain modality separately. More precisely, the modalities we used were the following: brain volumes and thicknesses in 68 regions of interest, fractional anisotropy and mean diffusivity in 18 white-matter tracts, deep gray matter volume in 14 regions of interest, and total white matter hyperintensity burden (WMH). We then also considered pooled estimates or “votes” of all single-modality predictions that are more predictive than the best of their individual-modality constituents, in order to examine whether each factor and modality provides unique information. Our quasi out-of-sample prediction of cognition (explained in detail in the Methods section) marks the main conceptual and algorithmic centerpiece of the current study. While probing the association of biomarkers with cognition in the full experimental sample is a natural and important first step, we also wanted to examine how well any such associations would persist in held-out data. Standard statistical inference in a training sample might not fully inform about the predictive utility of a fitted model in held-out data. We checked such replication empirically in split-sample simulations.
For our simulations we were interested in several questions, which, to our knowledge, have not been systematically investigated before: (1) what are the differences in predictive utility for different biomarker modalities? (2) How well can cognitive abilities pertaining to fluid vs. crystallized intelligence be predicted from structural brain markers, and are there differences? (3) Is age associated with the predictability of cognition from brain, and are there differences between fluid and crystallized intelligence?
Our a-priori hypotheses for the predictive utility of biomarkers were broad: we expected gray-matter structural measures to show better utility than white matter hyperintensities. For the dependence of predictive utility on the type of cognition, we expected that fluid intelligence would show a stronger dependence on brain structure, reasoning that crystallized intelligence by its nature as a proxy for cognitive reserve would be less dependent on “brain real estate”. Similarly, we expected that age would be negatively associated with the success of predicting cognition from brain: higher age might be associated with greater prominence of individual differences and residual factors not captured reliably by brain-markers.
Methods
Participants were recruited for two studies: the Reference Ability Neural Network (RANN) study and the Cognitive Reserve (CR) study. The RANN study was designed to identify networks of brain activity uniquely associated with performance across adulthood for each of the following four reference abilities: memory, reasoning, speed of processing and vocabulary (Habeck et al. 2016). The CR study was designed to elucidate the neural underpinnings of cognitive reserve and the concept of brain reserve (Stern 2012). All participants were native English speakers, right-handed, with at least a fourth-grade reading level. To be included in the study, participants also had to be free of any major neurological or psychiatric conditions that could affect their cognition. Careful screening excluded participants with Mild Cognitive Impairment (MCI) or dementia. A score equal to or greater than 130 on the Mattis Dementia Rating Scale (Mattis 1988) was required for inclusion in the studies. Moreover, participants had to have no or minimal complaints on a questionnaire about their functionality (Blessed, Tomlinson, and Roth 1968). Both the RANN and CR studies were approved by the Institutional Review Board of Columbia University. More detailed information about the two studies can be found in previous publications (Habeck et al. 2016; Habeck et al. 2017; Stern 2012; Stern et al. 2014; Stern 2009; Razlighi et al. 2017). We selected participants who had complete data on the measures used in the current analyses.
Imaging data:
We used data from structural and resting state functional T1 Magnetic Resonance Imaging (MRI), diffusion tensor imaging (DTI), and fluid-attenuated inversion recovery (FLAIR). All scans were acquired on the same 3.0 Tesla Philips Achieva MRI scanner. We used regional brain volumes and thicknesses in 68 regions of interest, and fractional anisotropy (FA) and mean diffusivity (MD) in major tracts, derived from the FreeSurfer (v5.1.0) software for human brain imaging analysis (http://surfer.nmr.mgh.harvard.edu/). We also used 14 deep gray matter (GM) regions and log-transformed global white matter hyperintensity burden (log(WMH+1)) derived from the FMRIB software library (FSL).
Volume and cortical thickness: A T1-weighted (MPRAGE) scan was used. The scans were acquired with TE/TR of 3/6.5 ms and flip angle of 8 degrees, in-plane resolution of 256 × 256, field of view of 25.6 × 25.6 cm, and 165-180 slices in axial direction with slice-thickness/gap of 1/0 mm. The FreeSurfer software was used for the reconstruction of the T1 scans (Fischl et al. 2002; Fischl et al. 2004). We used the volumes (in mm³) and thicknesses (in mm) of 68 cortical regions of interest (ROIs), computed using the standard FreeSurfer parcellation (Desikan et al. 2006).
We also used the volumes of 14 deep gray matter regions: left and right cerebellar cortex, thalamus, caudate, putamen, pallidum, hippocampus, and amygdala.
Fractional Anisotropy (FA) and Mean Diffusivity (MD): Two sets of DTI images were acquired, each with 56 directions, using these parameters: b = 800 s/mm², TE = 69 ms, TR = 7645-7671 ms, FOV = 22.4 × 22.4 cm, flip angle = 90°, in-plane resolution 112 × 112 voxels, acquisition time 9 min 27 s, slice thickness = 2 mm (no gap), and 75 slices. The two data sets were then concatenated and processed with TRACULA (Tracts Constrained by Underlying Anatomy), distributed as part of the FreeSurfer library, which produces 18 major white matter tracts, as described in previous publications (Li et al. 2018; Yendiki et al. 2011). All tracts were used as individual input variables.
White matter hyperintensity (WMH) burden: FLAIR images for visualization of WMH were acquired with the following parameters: TR/TI = 11000/2800 ms, TE = 125 ms, in-plane resolution 256 × 189, FOV 23.0 × 17.96 cm, and 30 slices with slice-thickness/gap of 4/0.5 mm. For the extraction of the WMH we used the Lesion Segmentation Tool (LST), a toolbox for Statistical Parametric Mapping (SPM) that segments T2 hyperintense lesions in FLAIR images. Lesions were segmented by the lesion growth algorithm (Schmidt et al. 2012) as implemented in LST version 2.0.15 (www.statistical-modelling.de/lst.html) for SPM. Following standard convention, WMH voxel-counts were log-transformed according to log(WMH+1), resulting in a normal distribution, and treated as a continuous variable, with higher values indicating greater WMH burden.
Deep gray matter ROIs: The analysis resulted in 14 ROIs, and was performed by using FSL Sienax.
Neuropsychological evaluation:
Each participant underwent an extensive neuropsychological evaluation. From the neuropsychological battery, we derived two cognitive domains to use in our analysis, fluid reasoning and vocabulary, so that both fluid and crystallized intelligence were covered. The following tests were used. For fluid reasoning: Wechsler Adult Intelligence Scale (WAIS-III) matrix reasoning, letter-number sequencing, and block design test, total correct (Wechsler 1997). For vocabulary: WAIS-III vocabulary test, the Wechsler Test of Adult Reading (WTAR) (Wechsler 2001), and the American National Adult Reading Test (AMNART), total correct (Grober and Sliwinski 1991). Z-scores were computed for each cognitive task, and scores were transformed, and where necessary sign-reversed, such that a higher score indicates better performance.
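As a minimal sketch of the scoring step described above, the following Python function z-scores each task and flips the sign where a lower raw score means better performance. Averaging the task z-scores into a single domain score is an assumption made for illustration; the function name `zscore_composite` and the `reverse` argument are hypothetical and not taken from the in-house code.

```python
import numpy as np

def zscore_composite(scores, reverse=None):
    """Z-score each task (column) and average into one domain score.

    scores  : (n_participants, n_tasks) array of raw task scores
    reverse : optional boolean sequence, True where a lower raw score
              means better performance and the sign must be flipped
    Averaging across tasks is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    if reverse is not None:
        # Broadcast the per-task flags across participants
        z = np.where(np.asarray(reverse, dtype=bool), -z, z)
    return z.mean(axis=1)
```

After this transformation a higher composite value always indicates better performance, regardless of the scoring direction of the individual tasks.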
Statistical analysis
Statistical analyses were performed using in-house code written in Matlab R2017a. Analyses were performed across the whole group, aged 20 – 80.
We derived predictive patterns for each of the cognitive domains and general cognition using each of the brain measures: (1) volume, (2) thickness, (3) FA, (4) deep GM volume and (5) WMH burden, using PCA regression as described below. (6) We further produced a composite prediction scheme (= “vote”): combining the predictions of all brain measures by averaging. All single-modality predictions and the prediction scheme were applied to held-out data.
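The composite "vote" can be illustrated with a short Python sketch. It assumes the single-modality predictions for the same held-out participants are stacked row-wise in one array; the name `vote_prediction` is hypothetical.

```python
import numpy as np

def vote_prediction(predictions):
    """Average single-modality predictions into one composite prediction.

    predictions : (n_modalities, n_test) array, one row per brain measure,
                  all rows referring to the same held-out participants
    """
    return np.mean(np.asarray(predictions, dtype=float), axis=0)
```

Because the vote is an unweighted mean, every modality contributes equally, whatever its individual predictive utility.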
Quasi out-of-sample prediction of cognitive function
The main computational frame for generating predictions and assessing predictive utility in held-out data consisted of many iterations of split-sample assignments, also termed Monte-Carlo 5-fold cross-validation. The data were split randomly 1,000 times into training sets of 240 participants and test sets of 60 participants. Using Principal Components Analysis (PCA) (explained below), a prediction of a cognitive outcome measure in the test set was calculated on the basis of a model estimated in the training set, and the goodness of the prediction was recorded to track performance. We had 2 cognitive outcomes (fluid reasoning and vocabulary) and 5 data modalities (outlined above). In total, we thus ran 2 split-sample simulations in which all 5 input modalities were tested, together with a “vote predictor” which simply averaged the predictions of all 5 input modalities. For tracking prediction performance, we computed the Predicted Residual Sum of Squares (PRESS) in the held-out data. For each of the 2 simulations, we thus have 1,000 PRESS values for 6 predictions, i.e. 5 single-modality predictions and one vote of all 5 brain-modality predictions. The PRESS statistic is presented in box-plots showing medians and inter-quartile ranges.
For full disclosure, we speak of “quasi” rather than genuine replication; because data for both training and test sets were sampled from a limited pool, they were not independent across iterations.
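The repeated random splitting can be sketched as follows in Python. This is an illustration under stated assumptions, not the in-house Matlab code: the generator name `monte_carlo_splits`, the fixed seed, and the choice to leave any participants beyond the training and test sizes out of a given iteration are all assumptions.

```python
import numpy as np

def monte_carlo_splits(n_participants, n_train=240, n_test=60,
                       n_iter=1000, seed=0):
    """Yield (train_idx, test_idx) pairs of disjoint random index sets.

    Each iteration draws a fresh random permutation, so the same
    participant can appear in training sets of some iterations and in
    test sets of others; iterations are therefore not independent.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        perm = rng.permutation(n_participants)
        yield perm[:n_train], perm[n_train:n_train + n_test]
```

Within one iteration the two index sets never overlap, but across iterations they are drawn from the same pool, which is why the text speaks of "quasi" replication.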
PCA regression/SSM
PCA regression, in a brain-imaging context termed Scaled Subprofile Model (Habeck, Stern, and Alzheimer’s Disease Neuroimaging 2010; Habeck 2010; Moeller et al. 1987; Strother et al. 1995), was used to develop predictive models for the cognitive outcomes. This approach has been used extensively by our lab in the past and is an obvious choice for a continuous prediction problem $\mathbb{R}^N \to \mathbb{R}$, where N features are reduced to one prediction of a cognitive outcome.
While a strict implementation of PCA/SSM assumes multiplicative variability, and correspondingly imposes a logarithmic transform on all input data prior to the PCA, we did not perform this for the current paper. A log-transform did not enhance prediction performance and led to slightly worse result for white-matter tract integrity; for that reason, we omitted it.
Assuming the training and test data as Y and Z (with the same number of features as rows, but different numbers of observations as columns) and the set of Principal Components, derived via PCA only from Y, as V (with as many rows as Y and Z, and as many columns as components), the estimation step of the linear prediction model in the training set can be written as a linear regression according to:
$$\beta = \mathrm{pinv}\big([\mathbf{1}\;\; Y'V]\big)\,\mathrm{Cog}$$
where 1 denotes an intercept term, ′ denotes matrix transposition, and Cog denotes the cognitive outcome in question (either fluid reasoning or vocabulary) in the training set. This linear regression produces the regression weights β; pinv stands for the Moore-Penrose pseudoinverse.
Next, the estimated model is used in the held-out data to predict the cognitive outcome:
$$\mathrm{Cog}^{*} = [\mathbf{1}\;\; Z'V]\,\beta$$
In short, we projected the training data onto a set of Principal Components obtained from the training set to obtain subject scores of all PCs in the training set, and then performed a linear regression with the cognitive outcome of interest in the training set as the dependent variable. Next, the subject scores of the same PC-set were obtained in the test data set, and the prediction model (= regression weights) was applied to generate a prediction. This prediction can then be compared to the actual values of the cognitive outcome of interest in the test data set. As mentioned before, prediction performance was assessed with the PRESS statistic. The lower the PRESS, the better the prediction. PRESS is the mean squared error of the prediction in the held-out data:
$$\mathrm{PRESS} = \left\langle \left(\mathrm{Cog} - \mathrm{Cog}^{*}\right)^{2} \right\rangle$$
where Cog and Cog* denote the actual and the predicted values of the cognitive outcome in the replication sample, and ⟨ ⟩ denotes the sample average in the held-out data. There are as many PRESS values as iterations in the simulation, i.e. 1,000.
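The estimation and prediction steps described above can be sketched in Python with NumPy. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the components are obtained from a singular value decomposition of the training data, and centering each feature on its training-set mean is an assumption that the text does not spell out.

```python
import numpy as np

def fit_pca_regression(Y, cog, n_components):
    """Estimate PCA-regression weights in the training set.

    Y   : (n_features, n_train) training data, one column per participant
    cog : (n_train,) cognitive outcome in the training set
    """
    mu = Y.mean(axis=1, keepdims=True)         # feature means, training only (assumption)
    U, _, _ = np.linalg.svd(Y - mu, full_matrices=False)
    V = U[:, :n_components]                    # principal components (features x K)
    scores = (Y - mu).T @ V                    # subject scores in the training set
    X = np.column_stack([np.ones(len(cog)), scores])   # intercept plus scores
    beta = np.linalg.pinv(X) @ cog             # regression weights via pseudoinverse
    return mu, V, beta

def predict_pca_regression(Z, mu, V, beta):
    """Apply the training-set components and weights to held-out data Z."""
    scores = (Z - mu).T @ V
    X = np.column_stack([np.ones(Z.shape[1]), scores])
    return X @ beta

def press(cog, cog_pred):
    """Mean squared prediction error in the held-out data."""
    return float(np.mean((np.asarray(cog) - np.asarray(cog_pred)) ** 2))
```

Note that nothing from the test set enters the estimation: the feature means, the components, and the regression weights all come from the training data alone.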
The set of selected Principal Components in V was chosen via minimization of the Akaike information criterion (AIC). All contiguous sets 1, 1:2, 1:3, 1:4, …, 1:N were tried out in the training sample in N regressions, where N was the number of eigenvalues. The best-fitting set 1:K, for which AIC was minimal, was chosen.
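A minimal Python sketch of this selection loop is given below. The text does not state which form of AIC was used; the Gaussian least-squares form n·log(RSS/n) + 2·(number of regression weights) is assumed here, and the function name `select_k_by_aic` is hypothetical. The sketch also assumes fewer components than training participants, so that the residual sum of squares stays positive.

```python
import numpy as np

def select_k_by_aic(scores, cog):
    """Pick K so that the contiguous PC set 1..K minimizes AIC.

    scores : (n_train, N) subject scores on all N principal components
    cog    : (n_train,) cognitive outcome in the training set
    Returns the chosen K (1-based) and the AIC value for every K.
    """
    n, n_pcs = scores.shape
    aic = np.empty(n_pcs)
    for k in range(1, n_pcs + 1):
        X = np.column_stack([np.ones(n), scores[:, :k]])
        resid = cog - X @ (np.linalg.pinv(X) @ cog)
        rss = float(resid @ resid)
        # Assumed AIC form for a Gaussian linear model (constants dropped)
        aic[k - 1] = n * np.log(rss / n) + 2 * (k + 1)
    return int(np.argmin(aic)) + 1, aic
```

Only the training sample is used for this choice, so the number of components can differ from one split to the next.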
In contrast to group-level statistics like correlation, PRESS can be assessed on a single-participant level. However, PRESS is not independent of variance in the outcome measure and in the biomarker. Lower variance might result in lower PRESS values by default, i.e. even in null data. To correct for such an effect and estimate the true predictive utility of the biomarker, for each iteration we also ran a ‘null’ model, i.e. a model where the dependent cognitive variable in the training set was randomized. Such a null model’s PRESS in the held-out data can serve as a reference benchmark for how much meaningful information can be gleaned from the modality in question about the cognitive outcome of interest. We computed the normalized PRESS as the ratio
$$\mathrm{PRESS}_{\mathrm{norm}} = \frac{\mathrm{PRESS}_{\mathrm{actual}}}{\mathrm{PRESS}_{\mathrm{null}}}$$
which should be smaller than unity, and serves as an absolute measure of how much information the biomarker can provide about the outcome: a ratio close to unity indicates that the biomarker provides little information beyond the null prediction. Normalized PRESS can thus serve as a statistic to compare the predictive utility of different modalities or cognitive outcomes.
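The null-model comparison can be sketched as follows. Two points are assumptions of this illustration: "randomized" is implemented as a random permutation of the training outcome, and an ordinary least-squares model (`ols_fit_predict`) stands in for the PCA regression so that the block is self-contained. Data are arranged here as participants × features, and all names are hypothetical.

```python
import numpy as np

def normalized_press(fit_predict, X_train, cog_train, X_test, cog_test, rng):
    """PRESS of the actual model divided by PRESS of a permuted-outcome null.

    fit_predict(X_train, cog_train, X_test) must return predictions
    for the rows of X_test.
    """
    pred = fit_predict(X_train, cog_train, X_test)
    press_actual = np.mean((cog_test - pred) ** 2)

    shuffled = rng.permutation(cog_train)      # break the brain-cognition link
    pred_null = fit_predict(X_train, shuffled, X_test)
    press_null = np.mean((cog_test - pred_null) ** 2)
    return float(press_actual / press_null)

def ols_fit_predict(X_train, cog_train, X_test):
    """Stand-in model for the sketch: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta = np.linalg.pinv(A) @ cog_train
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta
```

Because actual and null models are evaluated on the same held-out participants, any reduction in outcome variance affects numerator and denominator alike, which is the purpose of the normalization.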
Figure 1 explains PCA/SSM and our split-sample framework in more detail.
Figure 1:

Computational recipe for Scaled Subprofile Modeling, and the split-sample framework used in the current article.
Composition of cognition-related patterns
Prediction in independent data is a stringent and ecologically valid test for the diagnostic power of cognitive markers, but the composition of the associated biomarker pattern is of significant interest too. We checked whether the patterns were stable across different resampling iterations, i.e. whether there were regions or tracts consistently featured in the derived patterns with a consistent sign. For visualization of pattern loadings, for every region or tract, we computed the empirical [1%, 99%] coverage interval. If the interval fell below/above zero, we assigned a negative/positive sign. The issue of pattern composition only applied to 4 modalities: (1) regional volume ROIs, (2) regional thickness ROIs, (3) tract-based measures, and (4) deep gray-matter ROIs. For WMH and the composite “all brain” vote, no pattern is available.
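The sign-assignment rule for pattern loadings can be written compactly. The following Python sketch assumes the loadings of all resampling iterations are collected in one array; the function name `robust_loading_signs` is hypothetical.

```python
import numpy as np

def robust_loading_signs(loadings, lower=1.0, upper=99.0):
    """Sign of each loading if its coverage interval excludes zero, else 0.

    loadings : (n_iterations, n_features) pattern loadings across resamples
    """
    lo, hi = np.percentile(loadings, [lower, upper], axis=0)
    signs = np.zeros(loadings.shape[1], dtype=int)
    signs[lo > 0] = 1      # whole interval lies above zero
    signs[hi < 0] = -1     # whole interval lies below zero
    return signs
```

Regions or tracts with a result of 0 have an interval that straddles zero and would not be reported as consistent contributors.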
Relation of prediction success to age
A question of great interest to us was whether predictive utility shows differential associations with age. To avoid any loss of statistical power, we refrained from age stratification (which is in any case somewhat arbitrary) and did not subset our data; instead, we computed the mean age for every replication sample. A simple bivariate correlation was run across all 1,000 iterations to test whether the mean age in the replication sample was related to the normalized PRESS value.
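As a small illustration, the per-iteration correlation can be computed as below. The sketch assumes a Pearson correlation, which the text does not name explicitly, and the function name is hypothetical; significance testing is left out.

```python
import numpy as np

def age_press_correlation(mean_age, norm_press):
    """Pearson correlation between the mean age of each replication
    sample and the normalized PRESS obtained in that iteration.

    mean_age, norm_press : sequences with one entry per iteration
    """
    return float(np.corrcoef(mean_age, norm_press)[0, 1])
```

A positive value would mean that relative predictive utility worsens (normalized PRESS rises) as the replication sample gets older.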
Distribution of PC-sets in split-sample simulations
We also explored the distribution of obtained PC-sets for all 4 modalities on which PCA/SSM was performed. The maximum PC-index, K, of the set 1: K was chosen according to the AIC criterion. We plotted all distributions for all 2 × 4 predictions in Figure 4.
Figure 4:

Distribution of the maximum PC index K for the PC-sets 1: K that were chosen in the training-sample pattern derivation for 4 modalities, with fluid reasoning as the cognitive outcome in the top row, and vocabulary in the bottom row. Shown are the distributions for all 1,000 iterations.
Results
Demographic characteristics/ Correlations:
The sample consisted of 335 participants. There were slightly more women than men (57.6% women); mean age was 51 years (SD: 16.5), and mean education was 16 years (SD: 2.3), with a range of 12 to 22 years. The PCA-based derivation of cognitive markers revealed significant associations between the two cognitive domains and each of the brain modalities. In Table 1 we present the significant loadings across all 1,000 patterns for each brain modality and cognitive outcome. Positive loadings were observed for all 4 modalities for fluid reasoning, while only tract-based mean-diffusivity measures showed negative loadings. For vocabulary, most modalities showed negative as well as positive loadings.
Table 1:
Significant loadings across all 1,000 patterns for each brain modality and cognitive outcome. Listed are the variables whose loadings have a [1%, 99%] coverage interval that excludes zero.
| Cortical Volume | Cortical Thickness | Tract-based measures (FA, MD) | Deep GM ROIs |
|---|---|---|---|
| Fluid reasoning: positive loadings | | | |
| lh-caudalanteriorcingulate lh-cuneus lh-entorhinal lh-fusiform lh-isthmuscingulate lh-lateraloccipital lh-lateralorbitofrontal lh-lingual lh-medialorbitofrontal lh-parahippocampal lh-parsopercularis lh-parsorbitalis lh-parstriangularis lh-pericalcarine lh-postcentral lh-precuneus lh-rostralanteriorcingulate lh-superiorparietal lh-supramarginal lh-frontalpole lh-temporalpole rh-cuneus rh-entorhinal rh-fusiform rh-isthmuscingulate rh-lateraloccipital rh-lateralorbitofrontal rh-lingual rh-medialorbitofrontal rh-parahippocampal rh-parsopercularis rh-parsorbitalis rh-pericalcarine rh-postcentral rh-precuneus rh-superiorparietal rh-frontalpole rh-temporalpole rh-insula | lh-cuneus lh-isthmuscingulate lh-lingual lh-parstriangularis lh-pericalcarine lh-posteriorcingulate lh-insula rh-bankssts rh-cuneus rh-isthmuscingulate rh-lingual rh-pericalcarine rh-rostralanteriorcingulate rh-superiortemporal rh-transversetemporal rh-insula | FA_fmajor FA_fminor FA_lh.atr FA_lh.ccg FA_lh.cst FA_lh.ilf FA_lh.slfp FA_lh.slft FA_lh.unc FA_rh.atr FA_rh.ccg FA_rh.cst FA_rh.ilf FA_rh.unc | Left-Caudate Left-Putamen Left-Pallidum Left-Hippocampus Left-Amygdala Right-Thalamus-Proper Right-Caudate Right-Putamen Right-Pallidum Right-Hippocampus Right-Amygdala |
| Fluid reasoning: negative loadings | | | |
| None | None | MD_fmajor MD_fminor MD_lh.atr MD_lh.ccg MD_lh.cst MD_lh.ilf MD_lh.slfp MD_lh.slft MD_lh.unc MD_rh.atr MD_rh.ccg MD_rh.cst MD_rh.ilf MD_rh.slfp MD_rh.slft MD_rh.unc | None |
| Vocabulary: positive loadings | | | |
| lh-cuneus lh-entorhinal lh-lateraloccipital lh-superiorparietal rh-entorhinal | lh-cuneus lh-medialorbitofrontal rh-cuneus rh-medialorbitofrontal | FA_lh.cab | Right-Cerebellum-Cortex |
| Vocabulary: negative loadings | | | |
| rh-rostralmiddlefrontal | lh-caudalmiddlefrontal lh-parsopercularis lh-superiorfrontal lh-superiortemporal rh-caudalmiddlefrontal rh-parsopercularis rh-superiorfrontal | MD_lh.cab FA_fminor | None |
Direct relationship of brain markers to cognitive domains.
The variance in fluid reasoning and vocabulary accounted for by each brain modality is tabulated in Table 2.
Table 2:
Summary of fit statistics for each modality and cognitive domain. The variance accounted for (VAF = R²) in each cognitive outcome is tabulated. The models were run in the full group of participants: no replication in held-out data was performed. PCA/SSM was run for all modalities but WMH, for which simple linear regression was used.
| Modality | Fluid reasoning | Vocabulary |
|---|---|---|
| Vol | 0.2223 | 0.1085 |
| Thx | 0.1148 | 0.1074 |
| Tracts | 0.0827 | 0.0517 |
| Deep GM | 0.1763 | 0.0285 |
| log (WMH+1) | 0.0312 | 0.0182 |
| All brain | 0.2495 | 0.2085 |
Prediction success
In Figures 2 and 3 we present the prediction success for both cognitive outcomes as quantified by the median and interquartile ranges of the PRESS and normalized PRESS statistics. In general, the actual prediction was better than the null prediction. Only for WMH was there no real difference between actual and null prediction: for both cognitive outcomes, more than 26% of normalized PRESS values for WMH lie above unity.
Figure 2:

Out-of-sample prediction success of different brain structural modalities. Upper panel: prediction success quantified by the median and interquartile ranges for the PRESS statistic. For each modality, the prediction success of a model with the correct neural-cognitive assignment in the training sample (actual) is contrasted with a null model where the neural-cognitive assignment was randomized to create null conditions. Lower panel: relative prediction success, computed by dividing the actual PRESS value by the null PRESS value for each iteration. Values of normalized PRESS lower than unity indicate that the modality in question offers information about the to-be-predicted cognitive outcome in excess of the null model. For both panels, interquartile ranges of the distribution of 1,000 PRESS values are shown.
Figure 3:

Similar to Figure 2, but with Vocabulary as the cognitive outcome.
Fluid reasoning was best predicted by volumes, followed by the combination of all brain modalities. Vocabulary was best predicted by the combination of all brain modalities, followed by volumes. Overall, fluid reasoning was much better predicted than vocabulary.
Relation of age to prediction success
We also computed bivariate correlations of the PRESS and normalized PRESS values with the mean age in the replication samples for both simulations (Table 3). While vocabulary showed almost no significant correlations, fluid reasoning presented significant correlations for most marker modalities: PRESS was negatively associated with the mean sample age for all 6 predictions. However, for a majority of the predictions, the normalized PRESS value was associated positively with mean sample age. The null models themselves become better as the replication sample age increases. Thus, while the apparent predictive utility improves overall for older participants, the information gleaned relative to null models decreases for older participants.
Distribution of PC-sets in split sample simulations
Figure 4 shows the distribution of the maximum PC-index, K, of the chosen PC-sets 1:K in our split sample simulations.
Consistent with the better prediction success for fluid reasoning, the figure shows better variance concentration with lower K values for fluid reasoning compared to vocabulary for all modalities but the deep gray-matter ROIs. Performing PCA/SSM in the training sample often involves a larger set of PCs for vocabulary compared to fluid reasoning, resulting in fewer robust loadings, and worse out-of-sample replication.
Discussion
We aimed to examine the association of an optimized set of brain-structural modalities with cognition in cognitively healthy adults across a wide age range. We took a step further and examined how well these associations predicted performance in out-of-sample data. Our paper put the practical aim of prediction in held-out data, i.e. inference of unknown from known information with models estimated in prior data, directly to the test. Surface and deep gray matter volumes accounted for most of the variance in both cognitive domains. In the out-of-sample analyses, fluid reasoning was best predicted by volumes, and vocabulary by the combination of all brain modalities.
For both cognitive domains, it is noteworthy that out of all the individual brain modalities, volume had the highest predictive utility. Existing longitudinal studies highlight the importance of the brain volume to predict cognition, MCI or Dementia in older adults (Csernansky et al. 2005). Regional brain atrophy in MCI patients has been also reported to predict subsequent cognitive decline (Fan et al. 2008). Thus, our results are in accordance with prior investigations regarding the strong association between brain volume and cognition.
Somewhat surprising was the finding that predictive success (implying low PRESS) in the held-out data was positively associated with the mean age in the replication sample. We had anticipated the exact opposite: for example, younger adults have more robust networks and more efficient connectivity than older ones (Bishop, Lu, and Yankner 2010; Stern et al. 2005). Moreover, we hypothesized that older adults may be using unique strategies honed over a lifetime, in contrast to the more anatomically specific younger brains. This finding stresses the importance of normalizing PRESS using null prediction data. The information provided above and beyond the null prediction, as quantified by normalized PRESS, was lower for higher mean sample age. Pragmatically, predicting cognition for older participants might work better than for younger participants, although not because of better mechanistic information provided by the biomarkers, but by virtue of generically reduced variance that would even be true for null data. The same consideration might recommend always forming multimodal averages of predictions as in the “all brain” vote: from Figures 2 and 3, the reader can appreciate that both PRESS values (i.e. for the correct and randomized neuro-cognitive assignment in the training sample) are the lowest for the vote predictor, hinting that averaging null predictions might appear to be advantageous if it results in reduced variance.
Concerning prediction success by cognitive outcome, we observed that crystallized cognition was less well determined by structural brain markers than fluid cognition. Cognitive reserve, which has been significantly associated with ageing (Stern 2009, 2012; Tucker and Stern 2011), could play a significant role in this association. We could hypothesize that cognitive reserve is more active in cognitively normal older adults than in younger ones, “suppressing” the cognitive predictive ability of brain factors. Thus, we would expect that factors such as education, IQ, and occupational attainment would have greater cognitive predictive utility in the older population. Finally, although our sample consisted of cognitively normal participants without any major neurological or psychiatric diseases, it is possible that non-pathological or pathological brain changes related to normal ageing could increase brain and cognitive variability in older brains, limiting the out-of-sample predictability. Our analytic framework assumes group-invariant patterns. Increased inter-individual variability makes this assumption, and thus successful prediction, less tenable.
The cross-sectional design is a main limitation of the study. Furthermore, the choice of the specific predictors neglected other, potentially also important factors (e.g. personality or genetics) with likely contributions to cognitive variation. Our approach, while aimed at out-of-sample replication with a simple-enough multivariate approach, is not very common in the literature. It is conceivable that further Machine-Learning sophistication could achieve better prediction in out-of-sample data, using combinations and whole ensembles of possibly deeper learning architectures, rather than employing a flat ‘one shot’ PCA-regression. For the current paper, we wanted to lay the ground with this simple non-iterative analytic framework, using a technique familiar to our lab.
Lastly, the sample size for each analytical group was relatively small, limiting the statistical power of the analyses.
To our knowledge, this is the first study to examine a variety of brain modalities and derive patterns that explain maximal variance in cognition. Rather than focusing on predefined brain features, we examined the association between BMs and cognition by tailoring best-predicting patterns from an expanded number of BMs in a flexible, yet inferentially rigorous, manner. We are also not aware of another study that included tests of predictive utility in held-out data. We also deliberately used two cognitive domains instead of one general cognitive score, so as to better reflect both fluid and crystallized intelligence.
Both fluid and crystallized intelligence can be predicted in held-out data by an optimized set of brain morphometric markers in cognitively healthy adults. The achieved prediction success is compellingly larger than what would be achieved by a random “null” prediction. The added diagnostic potential of such multi-modal markers for upcoming cognitive decline or neurodegeneration will have to be assessed rigorously in longitudinal data sets. Our study marks a first step in this regard.
Table 3: Bivariate correlations between PRESS, normalized PRESS and mean age in the replication sample.
| | PRESS | Normalized PRESS |
| Fluid Reasoning prediction | | |
| Volumes | −0.1287 *** | 0.1092 ** |
| Thickness | −0.1308 *** | 0.1450 *** |
| Tract-based measures | −0.1941 *** | 0.0295 |
| Deep Gray-Matter ROIs | −0.1310 *** | 0.1274 ** |
| WMH | −0.2090 *** | 0.0702 ** |
| All brain | −0.1936 *** | 0.1301 *** |
| Vocabulary prediction | | |
| Volumes | 0.0207 | −0.0317 |
| Thickness | 0.0547 | −0.0059 |
| Tract-based measures | 0.0168 | −0.0396 |
| Deep Gray-Matter ROIs | 0.0413 | 0.0428 |
| WMH | 0.0172 | −0.0661 * |
| All brain | 0.0296 | −0.0277 |
*p<0.05, **p<0.001, ***p<0.0001
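As an illustration of how correlations of this kind can be computed, the sketch below correlates PRESS and normalized PRESS with mean age across a handful of replication samples. The numbers are made up for illustration and are not study data. Defining normalized PRESS as the ratio of model PRESS to null-model PRESS is an assumption, as are the variable and function names.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical values, one entry per replication sample (NOT study data).
mean_age = np.array([40.0, 45.0, 50.0, 55.0, 60.0])
press_model = np.array([30.0, 28.0, 27.0, 25.0, 22.0])  # error of the brain-based prediction
press_null = np.array([40.0, 35.0, 31.0, 27.0, 23.0])   # error of the null prediction

# Assumed definition: values near 1 mean the model adds little beyond the null.
normalized_press = press_model / press_null

r_press = pearson_r(mean_age, press_model)
r_normalized = pearson_r(mean_age, normalized_press)

print(f"r(mean age, PRESS) = {r_press:.3f}")
print(f"r(mean age, normalized PRESS) = {r_normalized:.3f}")
```

In this toy example the raw prediction error falls with mean age while the error relative to the null model rises. This is the same sign pattern as in the fluid reasoning rows of Table 3: an older sample can be predicted with less absolute error while the brain measures add less information beyond the null prediction.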
Highlights
Specific brain measurements account for a large share of the variance in cognition across the adult age range
Cognition can be predicted in out-of-sample data in cognitively normal adults
This prediction can be achieved to various degrees for both fluid and crystallized intelligence
Acknowledgments
Funding
This work was supported by the National Institutes of Health (NIH)/National Institute on Aging (NIA) [grant numbers R01 AG026158 and RF1 AG038465].
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Disclosure statement
The authors have nothing to disclose.
Credit author statement
Angeliki Tsapanou: conceptualization, methodology, writing
Yaakov Stern: conceptualization, methodology, writing
Christian Habeck: conceptualization, methodology, writing, statistical analyses, overall supervision
References
- Bishop NA, Lu T, and Yankner BA. 2010. ‘Neural mechanisms of ageing and cognitive decline’, Nature, 464: 529–35.
- Blessed G, Tomlinson BE, and Roth M. 1968. ‘The association between quantitative measures of dementia and of senile change in the cerebral grey matter of elderly subjects’, Br J Psychiatry, 114: 797–811.
- Wechsler D. 1997. ‘Wechsler Adult Intelligence Scale, 3rd edn.’, San Antonio, TX: Harcourt Assessment, pp. 684–690.
- Wechsler D. 2001. ‘The Wechsler Test of Adult Reading (WTAR): Test Manual’, San Antonio, TX: Psychological Corporation.
- Desikan RS, Segonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, Albert MS, and Killiany RJ. 2006. ‘An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest’, Neuroimage, 31: 968–80.
- Duchesne S, Caroli A, Geroldi C, Collins DL, and Frisoni GB. 2009. ‘Relating one-year cognitive change in mild cognitive impairment to baseline MRI features’, Neuroimage, 47: 1363–70.
- Duchesne S, Caroli A, Geroldi C, Frisoni GB, and Collins DL. 2005. ‘Predicting clinical variable from MRI features: application to MMSE in MCI’, Med Image Comput Comput Assist Interv, 8: 392–9.
- Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, and Dale AM. 2002. ‘Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain’, Neuron, 33: 341–55.
- Fischl B, Salat DH, van der Kouwe AJ, Makris N, Segonne F, Quinn BT, and Dale AM. 2004. ‘Sequence-independent segmentation of magnetic resonance images’, Neuroimage, 23 Suppl 1: S69–84.
- Grober E, and Sliwinski M. 1991. ‘Development and validation of a model for estimating premorbid verbal intelligence in the elderly’, J Clin Exp Neuropsychol, 13: 933–49.
- Habeck CG. 2010. ‘Basics of multivariate analysis in neuroimaging data’, J Vis Exp.
- Habeck C, Gazes Y, Razlighi Q, Steffener J, Brickman A, Barulli D, Salthouse T, and Stern Y. 2016. ‘The Reference Ability Neural Network Study: Life-time stability of reference-ability neural networks derived from task maps of young adults’, Neuroimage, 125: 693–704.
- Habeck C, Razlighi Q, Gazes Y, Barulli D, Steffener J, and Stern Y. 2017. ‘Cognitive Reserve and Brain Maintenance: Orthogonal Concepts in Theory and Practice’, Cereb Cortex, 27: 3962–69.
- Habeck C, Stern Y, and Alzheimer’s Disease Neuroimaging Initiative. 2010. ‘Multivariate data analysis for neuroimaging data: overview and application to Alzheimer’s disease’, Cell Biochem Biophys, 58: 53–67.
- Hedden T, Schultz AP, Rieckmann A, Mormino EC, Johnson KA, Sperling RA, and Buckner RL. 2016. ‘Multiple Brain Markers are Linked to Age-Related Variation in Cognition’, Cereb Cortex, 26: 1388–400.
- Li P, Tsapanou A, Qolamreza RR, and Gazes Y. 2018. ‘White matter integrity mediates decline in age-related inhibitory control’, Behav Brain Res, 339: 249–54.
- Mattis S. 1988. ‘Dementia Rating Scale (DRS)’, Psychological Assessment Resources, Odessa, FL.
- Moeller JR, Strother SC, Sidtis JJ, and Rottenberg DA. 1987. ‘Scaled subprofile model: a statistical approach to the analysis of functional patterns in positron emission tomographic data’, J Cereb Blood Flow Metab, 7: 649–58.
- Razlighi QR, Habeck C, Barulli D, and Stern Y. 2017. ‘Cognitive neuroscience neuroimaging repository for the adult lifespan’, Neuroimage, 144: 294–98.
- Schmidt P, Gaser C, Arsic M, Buck D, Forschler A, Berthele A, Hoshi M, Ilg R, Schmid VJ, Zimmer C, Hemmer B, and Muhlau M. 2012. ‘An automated tool for detection of FLAIR-hyperintense white-matter lesions in Multiple Sclerosis’, Neuroimage, 59: 3774–83.
- Stern Y. 2009. ‘Cognitive reserve’, Neuropsychologia, 47: 2015–28.
- Stern Y. 2012. ‘Cognitive reserve in ageing and Alzheimer’s disease’, Lancet Neurol, 11: 1006–12.
- Stern Y, Habeck C, Moeller J, Scarmeas N, Anderson KE, Hilton HJ, Flynn J, Sackeim H, and van Heertum R. 2005. ‘Brain networks associated with cognitive reserve in healthy young and old adults’, Cereb Cortex, 15: 394–402.
- Stern Y, Habeck C, Steffener J, Barulli D, Gazes Y, Razlighi Q, Shaked D, and Salthouse T. 2014. ‘The Reference Ability Neural Network Study: motivation, design, and initial feasibility analyses’, Neuroimage, 103: 139–51.
- Strother SC, Anderson JR, Schaper KA, Sidtis JJ, Liow JS, Woods RP, and Rottenberg DA. 1995. ‘Principal component analysis and the scaled subprofile model compared to intersubject averaging and statistical parametric mapping: I. “Functional connectivity” of the human motor system studied with [15O]water PET’, J Cereb Blood Flow Metab, 15: 738–53.
- Tsapanou A, Habeck C, Gazes Y, Razlighi Q, Sakhardande J, Stern Y, and Salthouse TA. 2019. ‘Brain biomarkers and cognition across adulthood’, Hum Brain Mapp.
- Tucker AM, and Stern Y. 2011. ‘Cognitive reserve in aging’, Curr Alzheimer Res, 8: 354–60.
- Wang Y, Fan Y, Bhatt P, and Davatzikos C. 2010. ‘High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables’, Neuroimage, 50: 1519–35.
- Yendiki A, Panneck P, Srinivasan P, Stevens A, Zollei L, Augustinack J, Wang R, Salat D, Ehrlich S, Behrens T, Jbabdi S, Gollub R, and Fischl B. 2011. ‘Automated probabilistic reconstruction of white-matter pathways in health and disease using an atlas of the underlying anatomy’, Front Neuroinform, 5: 23.
