Human Brain Mapping. 2025 Aug 18;46(12):e70319. doi: 10.1002/hbm.70319

Predicting Real‐Life Cognitive Scores From Functional Connectivity

Maya Kadushin 1,2, Asaf Madar 1,2, Niv Tik 1,2, Michal Bernstein‐Eliav 1,2, Ido Tavor 1,2
PMCID: PMC12358942  PMID: 40820961

ABSTRACT

Over the past decade, functional connectivity patterns, derived from functional magnetic resonance imaging (fMRI), have been widely used to predict various cognitive traits. However, most studies have focused on measures assessed under controlled laboratory conditions, which may not fully reflect the complexity of these traits in real‐world environments. In this study, we investigated connectome‐based predictions of cognitive performance in ecologically valid, real‐world settings. Participants (n = 194) performed the Psychometric Entrance Test, a standardized exam used for admission to higher education institutions in Israel and a strong predictor of undergraduate academic success. Using functional connectivity patterns, we significantly predicted overall test performance, as well as its three cognitive‐specific domains: quantitative reasoning, verbal reasoning, and proficiency in a foreign language. Significant predictions were consistent across four different prediction approaches, demonstrating the robustness of the relations between functional connectivity and cognition. Additionally, we examined which connectivity features contributed most to the predictions, analyzing both edge‐ and node‐level contributions. We found that different cognitive abilities (i.e., quantitative skills vs. language‐related skills) were primarily predicted by unique connectivity patterns. Yet, predictive features were more similar for scores that were more strongly correlated at the behavioral level. Lastly, we implemented a transfer learning approach in which the predicted cognitive‐specific scores were used as features for the prediction of the global score, resulting in improved prediction compared to that derived directly from functional connectivity.
Overall, our results demonstrate that the functional connectome captures real‐world variability in both global and domain‐specific cognitive abilities, emphasizing its potential to serve as an objective marker of real‐world cognitive performance.

Keywords: cognitive abilities, ecological validity, functional connectivity, language skills, predictive modeling, quantitative skills


Resting‐state functional connectivity can be used to predict real‐world performance, including global and domain‐specific scores. Predictions were consistent across prediction approaches. Distinct connectivity patterns primarily supported different cognitive abilities, yet more behaviorally correlated scores showed more similar features, suggesting both unique and shared neural mechanisms underlying diverse cognitive functions.



Summary.

  • Real‐life cognitive abilities were significantly predicted from functional connectivity. Successful predictions were achieved for global cognitive performance, as well as cognitive‐specific skills.

  • Distinct cognitive abilities (e.g., quantitative vs. language‐related skills) were primarily predicted by unique connectivity patterns. Yet, predictive features were more similar for scores that were more strongly correlated behaviorally, indicating both domain‐specific and shared neural mechanisms.

  • Using a transfer learning approach in which the predicted cognitive‐specific scores were used as features for prediction of the global score improved prediction accuracy compared to direct predictions from functional connectivity.

1. Introduction

One of the key challenges in cognitive neuroscience is understanding how individual differences in brain function drive variability in real‐world behavior. Over the past decade, functional connectivity patterns, derived from functional magnetic resonance imaging (fMRI), have emerged as a central player in the study of brain‐behavior relationships. These patterns have been used to predict a wide range of cognitive traits, including fluid intelligence (Dubois et al. 2018; Finn et al. 2015; Gal et al. 2022), sustained attention (Rosenberg et al. 2015, 2020), creative thinking (Beaty et al. 2018), decision‐making qualities (Cai et al. 2020; Madar et al. 2024), and specific educational skills, such as literacy and numeracy (Nakai et al. 2024).

A gold standard for such brain‐based predictions is the ability to predict real‐world outcomes, offering potential for practical applications, such as predicting learning trajectories and informing personalized interventions or clinical practices (Gabrieli et al. 2015; Rosenberg et al. 2018; Nakai et al. 2024). Achieving this objective requires careful consideration of the behavioral targets of prediction, a step that is sometimes underappreciated (Rosenberg and Finn 2022). To date, most studies have focused on predicting cognitive traits that have been assessed under constrained laboratory settings. While such measures are valuable for isolating specific cognitive processes, they may fall short when predicting outcomes in the complex and variable contexts of everyday life. Moreover, some laboratory‐based tasks have been shown to yield relatively low inter‐individual variability, limiting their utility for studying individual differences in cognitive performance (Hedge et al. 2018). Consequently, optimal targets for connectivity‐based prediction should be both ecologically valid and sensitive to individual variability, for example, school grades, university entrance exam scores, or other real‐world indicators of cognitive performance.

Previous attempts to predict real‐life cognitive abilities from brain imaging‐derived measures have mainly focused on task‐based fMRI, where brain activity was measured in response to tasks directly related to the evaluated skills (Cantlon and Li 2013; Cetron et al. 2019; Meshulam et al. 2021). For example, predictions of grades in a computer science course were based on the alignment between an individual's neural activity while watching course lectures and the activity of other learners or computer science experts (Meshulam et al. 2021). Although task‐based fMRI has been shown to be useful for predicting behavioral traits (Finn 2021; Greene et al. 2018; Jiang et al. 2020), it is not always feasible, particularly when working with challenging populations, such as children or patients. Resting‐state fMRI, by contrast, is easier to acquire and to standardize across sites and populations, and it is less susceptible to confounds related to task performance or participant motivation. These advantages position resting‐state fMRI as a promising approach for investigating the real‐life applicability of brain‐based predictions of learning outcomes and academic predispositions. However, while strong links between connectivity patterns and laboratory‐evaluated cognitive performance have been demonstrated (Dubois et al. 2018; Finn et al. 2015; Gal et al. 2022), the predictability of real‐life cognitive abilities from resting‐state connectivity has been relatively underexplored.

In this study, we used resting‐state functional connectivity from 194 participants to predict real‐world performance on the Psychometric Entrance Test, a standardized exam used for admission to higher education institutions in Israel (Allalouf et al. 2020), taken by ~0.75% of Israel's population each year. We predicted the test's global score as well as its three domain‐specific scores: quantitative reasoning, verbal reasoning, and proficiency in English as a foreign language. Predictions were consistent across four different prediction approaches, demonstrating their robustness. Additionally, we explored the contribution of connectivity patterns to the predictions and compared them across the different scores. While common predictive features were found for the test's global score and each of its cognitive‐specific domains, performance in each cognitive domain was mostly predicted by distinct connectivity patterns. Yet, we found higher similarity in predictive features for scores that were more strongly correlated behaviorally, suggesting a common underlying mechanism. Finally, by applying a transfer learning approach on our predicted cognitive‐specific scores, we further improved the prediction accuracy of the global test score, strengthening the validity of our results. Taken together, our findings establish that connectivity‐based predictions generalize to real‐world contexts, emphasizing the robustness of the relationship between the functional connectome and diverse cognitive abilities.

2. Materials & Methods

2.1. Participants

We used the data of 223 participants (96 female, M age = 27.22, SD = 5.31) from the Tel Aviv University Strauss Neuroplasticity Brain Bank (SNBB; link to website). Participants were included if they completed a resting‐state fMRI scan and all additional scans required for preprocessing (see neuroimaging data acquisition and preprocessing), and if their Psychometric Entrance Test scores were available, including domain‐specific scores (see behavioral data). Data of 29 participants were excluded due to excessive head motion (> 0.2 mm mean framewise displacement (FD)), resulting in a final cohort of 194 participants (78 female, M age = 27.10, SD = 5.08; see Figure S1 for age distribution).

2.2. Behavioral Data

To predict participants' real‐life cognitive abilities, we collected their Psychometric Entrance Test scores. The Psychometric Entrance Test is used for admission to higher education institutions in Israel. It evaluates applicants' performance in three cognitive domains required for academic studies: quantitative reasoning, verbal reasoning, and proficiency in English as a foreign language (hereafter referred to as the foreign language score) (Allalouf et al. 2020). The global Psychometric Entrance Test score is a weighted combination of these cognitive‐specific scores: 40% quantitative reasoning, 40% verbal reasoning, and 20% foreign language. All participants reported their global score as well as the three domain‐specific scores.

The quantitative reasoning domain estimates the ability to solve mathematical problems and analyze information presented in various forms, such as graphs, tables, and charts. The verbal reasoning score evaluates the ability to analyze complex written material, draw logical conclusions, and discern fine variations in meaning among words and concepts. The foreign language domain assesses comprehension of academic‐level texts in English (for non‐English speakers).

Applicants' performance in each domain is graded on a scale from 50 to 150. The global score is scaled and graded on a range from 200 to 800. Importantly, grading is performed relative to all examinees since the test was first administered in 1983. This ensures that the test date does not affect the score, allowing comparison of examinees' performance across test sessions.

The Psychometric Entrance Test has demonstrated high test–retest reliability, with estimates around 0.9 when the retest occurs within 1 year of the initial assessment. The predictive validity of the test is also noteworthy, with a correlation coefficient of 0.43 with first‐year grade point average (GPA) and 0.41 with GPA at the end of undergraduate studies (Allalouf et al. 2020). Additionally, approximately 90% of examinees complete a preparation course before taking the test (Allalouf et al. 2020), suggesting that Psychometric Entrance Test scores may reflect learning outcomes in real‐life settings.

To evaluate potential associations between test performance and demographic factors in our sample, we conducted the following analyses: first, sex‐related differences in performance were assessed using independent‐samples t‐tests for each of the four psychometric scores, with p‐values FDR‐corrected for four comparisons. In addition, we tested whether performance was correlated with participants' age at the time of the exam or with the time difference between the imaging session and the exam. Correlations' p‐values were also FDR‐corrected for four comparisons.

Outliers in age and psychometric scores were defined as values more than three standard deviations above or below the mean. To ensure that prediction results were not driven by extreme values, we repeated the prediction of each psychometric score after excluding age outliers and outliers of the target score. For the global score, we additionally excluded participants who were outliers in any of the domain‐specific scores, given the dependence between global and domain‐specific performance. We tested whether outlier exclusion significantly affected prediction accuracy using a permutation test (see comparison of model performance). p‐values were FDR‐corrected for 16 comparisons (four target scores × four prediction models).
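The three‐standard‐deviation criterion can be sketched as follows (a minimal illustration; the function and variable names are ours, not the authors'):

```python
import numpy as np

def outlier_mask(scores, n_sd=3.0):
    """Boolean mask: True for values within n_sd standard deviations of the mean."""
    scores = np.asarray(scores, dtype=float)
    z = np.abs(scores - scores.mean()) / scores.std()
    return z <= n_sd
```

Predictions would then be re‐run on the masked sample; for the global score, masks derived from the domain‐specific scores would be combined as well.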

2.3. Neuroimaging Data Acquisition and Preprocessing

Imaging data was acquired at Tel Aviv University's Strauss Center for Computational Neuroimaging using a 3 T Magnetom Siemens Prisma (Siemens, Erlangen, Germany) scanner with a 64‐channel head coil. All participants underwent a single 10 min (800 timepoints) run of resting‐state fMRI (eyes open, instructed to view a fixation cross and avoid falling asleep). fMRI images were acquired using a T2*‐weighted gradient‐echo echo‐planar protocol (GE‐EPI) with the following parameters: TR/TE = 750/30.8 ms, multiband acceleration factor of 8, 2 mm isotropic voxels, 72 contiguous axial slices covering the whole brain. T1‐weighted (T1w) anatomical images were acquired using a 3D magnetization‐prepared rapid acquisition gradient echo (MPRAGE) sequence, with the following parameters: TR/TE = 2400/2.78 ms, 0.9 mm isotropic voxels. T2‐weighted (T2w) anatomical images were acquired using a SPACE (SPC) sequence, with the following parameters: TR/TE = 3200/554 ms, 0.9 mm isotropic voxels.

The functional data were preprocessed using the Human Connectome Project (HCP) minimal preprocessing pipeline (detailed in Glasser et al. 2013). Briefly, the data underwent motion and distortion corrections and nonlinear alignment to the MNI standard space using a combination of FMRIB Software Library (FSL) (Smith et al. 2004), FreeSurfer (Fischl 2012) and additional customized HCP steps. The data were denoised using FMRIB's ICA‐based Xnoiseifier (FIX) (Griffanti et al. 2014; Salimi‐Khorshidi et al. 2014), and then “projected” onto a surface representation of 91,282 vertices (“grayordinates”) in standard space. Surface‐based registration was performed using Multimodal Surface Matching (MSMAll) (Robinson et al. 2014, 2018).

2.4. Functional Connectivity Analysis

To calculate participants' functional connectivity patterns, we first applied a whole‐brain parcellation (Schaefer et al. 2018) to the preprocessed data to divide the cortex into 100 parcels (or 419 parcels; see below). We averaged vertices' time series within each parcel and computed the pairwise Pearson's correlation coefficients between the averaged time series, resulting in a 100 × 100 connectivity matrix for each participant. We then normalized the correlation coefficients using Fisher's z‐transformation. As the connectivity matrices are symmetrical, this analysis resulted in a total of 4950 unique entries (edges).
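The edge‐extraction step can be sketched as follows (a minimal illustration assuming parcel‐averaged time series are already available; not the authors' code):

```python
import numpy as np

def connectivity_features(parcel_ts):
    """parcel_ts: (timepoints, n_parcels) array of parcel-averaged time series.
    Returns the Fisher z-transformed unique (upper-triangular) edges."""
    r = np.corrcoef(parcel_ts.T)        # n_parcels x n_parcels Pearson correlations
    iu = np.triu_indices_from(r, k=1)   # edges above the diagonal only
    return np.arctanh(r[iu])            # Fisher z-transformation
```

For a 100‐parcel atlas this yields 100 × 99 / 2 = 4950 edges per participant.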

We chose a relatively sparse parcellation (100 regions) for the main analysis, as a higher number of regions substantially raises the number of edges, which are subsequently used as features for predictions. High feature dimensionality increases the risk of overfitting and also imposes greater computational demands. Yet, to evaluate the impact of atlas granularity on our results and to ensure consistency with the Meta‐Matching model (see Meta‐Matching algorithm in prediction pipeline and statistical analyses), we repeated the analyses with a parcellation of 400 cortical (Schaefer et al. 2018) and 19 subcortical (Fischl et al. 2002) regions.

2.5. Prediction Pipeline and Statistical Analyses

We aimed to predict participants' global psychometric scores, as well as their quantitative reasoning, verbal reasoning, and foreign language scores from resting‐state functional connectivity patterns. To ensure robustness across prediction approaches, we predicted each cognitive score using four prediction models: Linear Ridge Regression (LRR), Kernel Ridge Regression (KRR) (Chen et al. 2022; He et al. 2020), the Brain Basis Set algorithm (BBS) (Sripada et al. 2019, 2020), and the Meta‐Matching algorithm (He et al. 2022). Overall, we trained 16 models (four prediction models × four cognitive scores). All models were implemented using Python's scikit‐learn (Pedregosa et al. 2011).

2.5.1. Linear Ridge Regression and Kernel Ridge Regression

Ridge regression is a linear modeling technique that adds L2 regularization to the cost function to prevent overfitting (Hoerl and Kennard 1970). The strength of L2 regularization is controlled by the hyperparameter λ.

Kernel ridge regression extends ridge regression by incorporating a kernel function, enabling the model to capture more complex, non‐linear relationships between predictor variables and the target variable. Here, we used a radial basis function (RBF) kernel.
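Both models are available in scikit‐learn, which the study used; a minimal sketch (the data are synthetic stand‐ins, and the regularization strength, called `alpha` in scikit‐learn, is an assumed placeholder rather than a tuned value):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))   # stand-in for selected connectivity edges
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(100)  # synthetic score

lrr = Ridge(alpha=1.0).fit(X, y)                      # L2-regularized linear model
krr = KernelRidge(alpha=1.0, kernel="rbf").fit(X, y)  # non-linear RBF variant
```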

2.5.2. Brain Basis Set Algorithm

BBS is a multivariate method developed by Sripada et al. (2019, 2020) to predict behavioral phenotypes from fMRI data. First, it reduces data dimensionality with principal component analysis (PCA) to a predetermined number of components. Second, these components are regressed against the individual data of each participant to generate a set of predictive features called “expression scores”. Expression scores are then used to fit a linear regression model to predict the target score. To prevent overfitting, we tuned the number of features (equivalent to the number of PCA components) using cross‐validation, as described below in the model evaluation and hyperparameter tuning section.
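A sketch of the BBS idea (here the expression scores are computed by projecting subjects onto the PCA components, which is equivalent to the regression step up to scaling when components are orthonormal; function names are our own):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def bbs_fit_predict(X_train, y_train, X_test, n_components=10):
    """Brain Basis Set sketch: learn components on training edges, compute
    per-subject expression scores, then fit a linear model on them."""
    pca = PCA(n_components=n_components).fit(X_train)
    expr_train = pca.transform(X_train)   # expression scores per subject
    expr_test = pca.transform(X_test)
    return LinearRegression().fit(expr_train, y_train).predict(expr_test)
```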

2.5.3. Meta‐Matching Algorithm

Meta‐Matching is a transfer‐learning framework that exploits large‐scale datasets for boosting connectivity‐based predictions of new traits in small‐scale datasets (He et al. 2022). In their original work, He and colleagues introduced multiple strategies for implementing the Meta‐Matching framework. We employed their “stacking” approach as it achieved the highest predictive accuracy.

In the first step of the Meta‐Matching procedure, He and colleagues trained a deep neural network (DNN) on the UK Biobank dataset to predict non‐brain‐imaging phenotypes from functional connectivity. The DNN received as input a functional connectivity matrix, derived from a parcellation to 400 cortical (Schaefer et al. 2018) and 19 subcortical (Fischl et al. 2002) regions. The DNN generated predictions for 67 behavioral and physiological phenotypes, which were then used as features in a KRR model to predict new behavioral traits.

Applying the pre‐trained DNN to new data requires using the same input structure as in the original work. Therefore, we could not use the 100‐region connectivity data for the Meta‐Matching algorithm. We applied the pre‐trained DNN on our participants' 419 × 419 connectivity matrices and used the resulting predicted phenotypes to train a KRR model to predict the psychometric scores.

2.5.4. Model Evaluation and Hyperparameter Tuning

We evaluated each prediction model using 10‐fold cross‐validation. Within each fold, we performed hyperparameter tuning using 5‐fold cross‐validation (regularization parameter λ for LRR and KRR, including the KRR of the Meta‐Matching model; number of PCA components in the BBS model). Importantly, hyperparameter tuning (inner cross‐validation) was performed exclusively on the training set data to prevent overfitting. The best hyperparameter was then used to train the model on the entire training set in each fold, and the trained model was applied on the test set to predict the target score. Prior to fitting each prediction model, we scaled connectivity features and behavioral scores across participants, within each fold, using the mean and standard deviation of the training set.
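For the LRR case, the nested cross‐validation logic might look like this (a sketch with an assumed lambda grid; scaling of the behavioral target is omitted for brevity):

```python
import numpy as np
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

def nested_cv_predict(X, y, lambdas=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """10-fold outer CV; a 5-fold inner CV tunes lambda on training data only.
    Feature scaling sits inside the pipeline, so it is fit per training fold."""
    y_pred = np.empty_like(y, dtype=float)
    pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])
    grid = {"ridge__alpha": list(lambdas)}
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        search = GridSearchCV(pipe, grid, cv=5).fit(X[tr], y[tr])
        y_pred[te] = search.predict(X[te])
    return y_pred
```

Placing the scaler inside the pipeline ensures the training‐set mean and standard deviation are reused on the test fold, matching the leakage‐prevention logic described above.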

2.5.5. Feature Selection

We included a feature selection step before fitting LRR and KRR. This was unnecessary for the BBS and Meta‐Matching algorithms as they include an inherent dimensionality reduction step. Feature selection was performed by training a Random Forest Regression model (Breiman 2001) and selecting the top 20% features (or 5% when using the parcellation of 419 regions to account for the larger number of features) with the highest importance scores determined by the model. Random forest was chosen because it can capture complex, non‐linear relationships between features and the target variable, which is particularly important for non‐linear models such as KRR. For consistency, the same feature selection method was applied to the LRR model. Importantly, to prevent data leakage, the model was fitted on the training set and the same features were selected for the test set in each fold. Connectivity features were standardized, as described above, both before and after feature selection. In order to ensure the stability of our findings, we repeated the prediction pipeline without feature selection and with different thresholds (10%, 15%, 25%, 30%) and tested whether performance significantly differed from the 20% threshold using a two‐sided paired‐samples permutation test (see comparison of model performance). p‐values were FDR‐corrected for 40 comparisons (five thresholds × four target scores × two prediction models).
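The selection step can be sketched as follows (a minimal illustration; the forest size is an assumption, as it is not reported above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_top_edges(X_train, y_train, frac=0.20, seed=0):
    """Fit a random forest on the training fold only and return indices of
    the top `frac` edges ranked by impurity-based feature importance."""
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(X_train, y_train)
    n_keep = max(1, int(round(frac * X_train.shape[1])))
    return np.argsort(rf.feature_importances_)[::-1][:n_keep]
```

The returned indices would then be applied unchanged to the test fold, preventing data leakage.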

2.5.6. Prediction Accuracy and Statistical Significance

We estimated prediction accuracy using two measures: the Pearson's correlation coefficient between actual and predicted scores and the mean squared error (MSE). These measures were calculated on the test data and averaged across folds. Statistical significance of each accuracy measure was evaluated using a one‐sided permutation test: within each fold, we shuffled the predicted scores 10,000 times and calculated the Pearson's correlation coefficient/MSE using the permuted predicted scores, resulting in 10,000 permuted correlation/MSE values for each fold. The permuted correlation/MSE values were then averaged across folds to create a null distribution, against which we then compared the actual correlation/MSE. p‐values were FDR‐corrected for 16 comparisons (four target scores × four prediction models).
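For a single fold, the correlation variant of this permutation test might look like the following sketch (the study averages permuted values across folds before comparison; the MSE variant is analogous):

```python
import numpy as np

def permutation_pvalue(y_true, y_pred, n_perm=10_000, seed=0):
    """One-sided permutation test: shuffle the predicted scores to build a
    null distribution of Pearson r and compare the observed r against it."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(y_true, y_pred)[0, 1]
    null = np.array([np.corrcoef(y_true, rng.permutation(y_pred))[0, 1]
                     for _ in range(n_perm)])
    return (np.sum(null >= r_obs) + 1) / (n_perm + 1)
```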

2.5.7. Comparison of Model Performance

To assess whether two predictive models significantly differ in their prediction accuracy of the same behavioral target, we performed a two‐sided paired‐samples permutation test. The test was conducted on the prediction accuracies (Pearson's correlation between actual and predicted scores) calculated in the cross‐validation procedure for each model. Since both models were trained and tested on the exact same participants within each fold, the prediction accuracies from corresponding folds were treated as paired samples. We first computed the difference in prediction accuracy between the two models for each fold and averaged them across folds. To generate a null distribution, we randomly flipped the sign of the difference in prediction accuracy within each fold and recalculated the mean difference. This procedure was repeated 5000 times, and the p‐value was computed as the proportion of permuted differences that were greater than or equal to the observed difference in absolute value. p‐values were FDR‐corrected for six comparisons, corresponding to the number of unique pairwise comparisons among the four prediction models.
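The sign‐flipping procedure on per‐fold accuracies can be sketched as follows (argument names are our own):

```python
import numpy as np

def signflip_pvalue(acc_a, acc_b, n_perm=5000, seed=0):
    """Two-sided paired-samples permutation test: randomly flip the sign of
    each fold's accuracy difference and compare the permuted mean differences
    (in absolute value) against the observed one."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(acc_a) - np.asarray(acc_b)
    obs = abs(diff.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = np.abs((signs * diff).mean(axis=1))
    return (np.sum(null >= obs) + 1) / (n_perm + 1)
```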

2.5.8. Feature Importance Analysis

We wished to explore the contribution of different connectivity features to the prediction of each psychometric score. Given the use of four distinct prediction models, whose algorithms vary extensively, it was essential to employ a method that could be robustly applied across approaches. Therefore, we implemented a permutation‐based feature importance technique. The rationale behind this method is that if a given feature substantially contributes to the prediction, then breaking its relationship with the target variable—by permuting its values across participants—would lead to a decrease in the model's performance. This decrease reflects the feature's importance, such that a larger decrease indicates higher feature importance. Model performance was estimated as the negative mean squared error (NMSE), where higher values indicate better prediction performance.

The importance score of each feature i (FI_i) was quantified as the difference between the baseline NMSE and the mean permuted NMSE across K permutations:

FI_i = NMSE_orig − (1/K) · Σ_{k=1}^{K} NMSE_{k,i}

where NMSE_orig is the baseline NMSE (computed on the original dataset), K is the number of permutations, and NMSE_{k,i} is the NMSE after the kth permutation of feature i.

The feature importance scores were calculated within each fold and then summed across folds, resulting in a single importance score for each connectivity feature. Critically, this analysis was performed using test set data only, so we could examine which features contributed to the generalization power of the model.
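The formula above corresponds to the following sketch for a single fold (any fitted model exposing a `predict` method would do; the number of permutations per feature is an assumption):

```python
import numpy as np

def permutation_importance_nmse(model, X_test, y_test, n_perm=10, seed=0):
    """FI_i = NMSE_orig - mean_k NMSE_{k,i}: the drop in negative mean squared
    error when feature i is shuffled across participants in the test set."""
    rng = np.random.default_rng(seed)
    nmse_orig = -np.mean((model.predict(X_test) - y_test) ** 2)
    fi = np.zeros(X_test.shape[1])
    for i in range(X_test.shape[1]):
        permuted = []
        for _ in range(n_perm):
            Xp = X_test.copy()
            Xp[:, i] = rng.permutation(Xp[:, i])
            permuted.append(-np.mean((model.predict(Xp) - y_test) ** 2))
        fi[i] = nmse_orig - np.mean(permuted)
    return fi
```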

Next, we wished to gain a broader perspective on the functional features driving predictions by shifting the focus from individual edges to entire brain regions (nodes). To achieve this, we calculated the weighted node degree of each brain region, i.e., the sum of the edge importance scores of all edges that include that node, such that nodes with more important edges also get higher importance scores. Feature importance scores were scaled between 0 and 1 for each psychometric score, prior to summation. The resulting weighted node degrees were further scaled between 0 and 1 for each psychometric score for visualization purposes.
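The node‐degree computation can be sketched as follows (the per‐score pre‐scaling of edge importances mentioned above is omitted for brevity):

```python
import numpy as np

def weighted_node_degree(edge_importance, n_nodes=100):
    """Map upper-triangular edge importance scores back onto a symmetric
    matrix, sum each node's edges, and rescale the degrees to [0, 1]."""
    mat = np.zeros((n_nodes, n_nodes))
    mat[np.triu_indices(n_nodes, k=1)] = edge_importance
    mat += mat.T
    degree = mat.sum(axis=1)
    return (degree - degree.min()) / (degree.max() - degree.min())
```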

Demonstrating the reliability of feature importance scores is vital for their meaningful interpretation. Feature reliability has previously been assessed by examining consistency across bootstrapped samples (Wei et al. 2020). However, implementing this approach was not computationally feasible in our case, as the permutation‐based feature importance method we employed is itself computationally demanding; repeating it across numerous bootstrap samples would have exceeded our computational resources. Instead, following the core idea of assessing agreement across samples, we examined the stability of feature rankings across the ten cross‐validation folds. For each fold, we identified the top 25% of features with the highest importance scores and then calculated how many features consistently appeared in this top tier in at least 80% of the folds. We then calculated the proportion of these features relative to the number expected under perfect reliability (i.e., the same top 25% features appearing across all folds; 1237 features). This analysis was conducted on the feature importance scores calculated using the KRR model for each psychometric score.
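This stability check can be sketched as follows (a minimal illustration under the stated thresholds; names are our own):

```python
import numpy as np

def ranking_stability(fold_importances, top_frac=0.25, min_fold_frac=0.8):
    """fold_importances: (n_folds, n_features) importance scores per fold.
    Returns the number of features ranked in the top `top_frac` in at least
    `min_fold_frac` of folds, as a fraction of the count expected under
    perfect reliability (the full top tier recurring in every fold)."""
    n_folds, n_features = fold_importances.shape
    n_top = int(round(top_frac * n_features))
    top_sets = np.argsort(fold_importances, axis=1)[:, ::-1][:, :n_top]
    counts = np.zeros(n_features)
    for fold_top in top_sets:
        counts[fold_top] += 1  # how often each feature reaches the top tier
    return np.sum(counts >= min_fold_frac * n_folds) / n_top
```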

2.5.9. Similarity in Feature Importance Across Psychometric Scores

To complement our feature importance analysis, we wished to examine similarity in feature importance across psychometric scores. Specifically, we were interested in whether the different psychometric scores were predicted by similar features, and if so, whether this similarity is higher for scores that are more strongly correlated at the behavioral level. Therefore, we calculated the Pearson's correlation between the feature importance scores of each pair of psychometric scores (i.e., feature importance similarity), as well as between the psychometric scores themselves (i.e., behavioral similarity). Last, we examined the relationship between feature importance similarity and behavioral similarity by calculating the Pearson's correlation between these two measures.

2.5.10. Stacking the Predicted Domain‐Specific Scores to Predict Global Psychometric Scores

As an exploratory analysis, we aimed to examine the relationship between the global psychometric score and the predicted domain‐specific scores. The global psychometric score, as previously described, is a weighted combination of the domain‐specific scores. Thus, a strong association with the predicted domain‐specific scores could indicate the validity of these predictions. Moreover, previous work by Gal et al. (2022) and He et al. (2022) has shown that a transfer learning approach, in which an individual trait is predicted from other brain‐based predictions, significantly improves accuracy compared to direct predictions from the neural data. Thus, representing the connectome through intermediate predicted scores may capture enriched information that goes beyond the individual's connectivity profile, since predicted scores are calculated based on data from a cohort of participants (the model's training set). As such, they may be more sensitive to inter‐individual variability when used as features in a transfer learning approach.

Inspired by their work, we tested whether stacking the predicted domain‐specific scores would improve the prediction of the global psychometric score compared to predictions derived directly from functional connectivity. Each domain‐specific score was predicted from resting‐state functional connectivity using the same prediction pipeline described above. The predicted domain‐specific scores were then used as input for a linear regression model to predict participants' global psychometric scores. We performed this analysis with each of the prediction models; i.e., each model was used to generate predictions of the domain‐specific scores, which were stacked to predict the global score. The performance of the stacking model was then compared to that of the original model predicting the global score directly from functional connectivity. Statistical significance of the difference in performance was assessed using a two‐sided paired‐samples permutation test (see comparison of model performance). p‐values were FDR‐corrected for the number of models tested.
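The stacking step itself is simple; a sketch (synthetic inputs stand in for the cross‐validated domain‐specific predictions):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

def stacked_global_prediction(domain_preds, y_global, n_splits=10):
    """domain_preds: (n_subjects, 3) predicted quantitative, verbal, and
    foreign-language scores, used as features to predict the global score."""
    y_hat = np.empty_like(y_global, dtype=float)
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(domain_preds):
        model = LinearRegression().fit(domain_preds[tr], y_global[tr])
        y_hat[te] = model.predict(domain_preds[te])
    return y_hat
```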

While the stacking strategy allows leveraging meaningful information from the predicted scores, it may also sum prediction errors. To assess this, we examined the association between the accuracy of the stacking model and that of the domain‐specific models. Specifically, we calculated the prediction error for each participant (i.e., the absolute difference between the actual and predicted score) for both the stacking model and each domain‐specific model. We then computed the Pearson's correlation coefficient between the stacking model's prediction errors and those of each domain‐specific model. Positive correlations would suggest that the performance of the stacking model depends on the accuracy of the intermediate predictions, indicating that it may inherit not only signal, but also bias from these predictions.

2.5.11. Examining Covariates' Impact on Prediction Accuracy

We aimed to ensure that connectivity‐based predictions captured variance in cognitive scores beyond that explained by demographic variables. Specifically, we controlled for participants' sex, age at the time of the exam, and the time difference between the imaging session and the exam. To do so, we regressed out each covariate from the target scores prior to prediction and re‐ran the prediction pipeline using these adjusted scores. Sex was dummy coded for this analysis. Importantly, to prevent data leakage, this procedure was performed within each cross‐validation fold by fitting the regression model on the training data and applying it to the test data.
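The fold‐wise residualization might look like this (covariate columns, e.g., dummy‐coded sex and age, are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def residualize_scores(cov_train, y_train, cov_test, y_test):
    """Fit the covariate-to-score regression on the training fold only, then
    remove the fitted covariate effect from both training and test scores."""
    reg = LinearRegression().fit(cov_train, y_train)
    return y_train - reg.predict(cov_train), y_test - reg.predict(cov_test)
```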

As a complementary analysis, we also examined whether prediction errors from the original models (i.e., without regressing out covariates) were associated with age at the time of the exam or with the time difference between the imaging session and the exam. Prediction error was defined as the absolute difference between actual and predicted scores for each participant. We then computed Pearson's correlation coefficients between these error values and each covariate. p‐values were FDR‐corrected for 32 comparisons (two covariates × four target scores × four prediction models).

3. Results

3.1. Psychometric Entrance Test Scores

All participants reported their global psychometric scores as well as their quantitative reasoning, verbal reasoning, and foreign language scores. The mean and standard deviation of each score are presented in Table 1 (see Figure S1 for the distribution of each score). For reference, we have included the mean and standard deviation of Israel's general population between the years 2014–2023, as reported by Israel's National Institute for Testing and Evaluation.

TABLE 1.

Mean and Standard Deviation of Psychometric Scores.

Score Our sample General population
Global 686.51 (60.02) 548.27 (108.15)
Quantitative reasoning 132.01 (14.79) 110.00 (21.03)
Verbal reasoning 131.67 (12.22) 107.34 (20.46)
Foreign language 136.45 (14.60) 107.52 (24.42)

Note: Mean and standard deviation of global psychometric scores as well as the quantitative reasoning, verbal reasoning and foreign language scores in our study sample and Israel's general population between the years 2014–2023.

We next tested whether psychometric performance was associated with covariates. We found that males scored significantly higher than females on the global psychometric score (P FDR = 0.0364), as well as in the quantitative reasoning (P FDR = 0.0003) and foreign language domains (P FDR = 0.0381). While females had higher scores in verbal reasoning, the difference was not statistically significant (P FDR > 0.05). The mean and standard deviation of each score are reported separately for each sex in Table S1. No significant correlations were found between the psychometric scores, whether global or domain‐specific, and participants' age at the time of the exam or the time difference between the imaging session and the exam (P FDR > 0.05). To further account for these covariates, we re‐ran the prediction pipeline after regressing out each of these covariates from the target scores (see examining covariates' impact on prediction accuracy).

We note that our sample consists of participants with substantially higher scores and lower variance compared to the general population. This may limit our ability to predict individual differences. Nevertheless, achieving accurate predictions despite this challenge suggests that functional connectivity is closely linked to real‐life cognitive abilities and may discriminate between individuals even at the higher end of the cognitive ability spectrum.

3.2. Connectome‐Based Predictions of Real‐Life Cognitive Scores

Our main goal was to examine the ecological validity of connectome‐based predictions by predicting participants' real‐world performance in the Psychometric Entrance Test from resting‐state functional connectivity. We aimed to predict the test's global score as well as its domain‐specific scores (quantitative reasoning, verbal reasoning, and foreign language). To ensure robustness, we trained four different prediction models to predict each score. Prediction accuracy was evaluated using two measures: Pearson's correlation coefficient between actual and predicted scores (Figure 1 and Figure S2) and MSE (Figure S3). Statistical significance was assessed for each measure using a permutation test.
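A permutation test of this kind can be sketched as follows, assuming significance is assessed by shuffling the actual scores relative to the predictions (an illustrative version; the exact permutation scheme used in the study may differ):

```python
import numpy as np

def permutation_pvalue(y_true, y_pred, n_perm=1000, seed=0):
    """One-sided permutation p-value for the observed Pearson correlation.

    The observed correlation between actual and predicted scores is compared
    to a null distribution obtained by shuffling the actual scores.
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(y_true, y_pred)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(y_true), y_pred)[0, 1]
                     for _ in range(n_perm)])
    # Add-one correction so the p-value is never exactly zero
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```

The same scheme applies to MSE by counting null errors at least as small as the observed one.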

FIGURE 1.

FIGURE 1

Prediction accuracy of Psychometric Entrance Test scores from resting‐state functional connectivity. Pearson's correlation between actual and predicted scores across four prediction models: LRR, KRR, BBS and Meta‐Matching algorithm. Results are shown for the global (blue), quantitative reasoning (orange), verbal reasoning (yellow) and foreign language (green) scores. Error bars represent the standard error across cross‐validation folds. p‐values are FDR‐corrected for 16 comparisons. *P FDR < 0.05, **P FDR < 0.005, ***P FDR < 0.0005.

All psychometric scores were significantly predicted across prediction approaches. The predictions of the global psychometric score, as well as the quantitative reasoning and foreign language scores, were robust across all four prediction models (global: P FDR ≤ 0.002, quantitative: P FDR < 0.0001, foreign language: P FDR ≤ 0.015). Verbal reasoning scores were significantly predicted by three out of the four models (LRR, KRR and Meta‐Matching; P FDR ≤ 0.043). No significant differences in prediction accuracy were found between models for any of the scores (P FDR > 0.05).

Consistent results were obtained when using the 419‐region parcellation (Figure S4), with all psychometric scores significantly predicted across all four models (global: P FDR ≤ 0.025, quantitative: P FDR ≤ 0.0008, verbal reasoning: P FDR ≤ 0.047, foreign language: P FDR ≤ 0.031). Additionally, prediction accuracy remained stable (no significant differences) across different feature selection thresholds (Figure S5). Taken together, these findings demonstrate the robustness of our results to analytical variability.

Most predictions also remained significant after controlling for participants' sex, age at the time of the exam, and the time difference between the imaging session and the exam (Figures S6–S8). Predictions that did not remain significant were primarily those of the verbal reasoning score, which had the lowest prediction accuracy of the four scores. Nevertheless, all cognitive scores were significantly predicted by at least one of the models after controlling for each covariate. Moreover, no correlation was found between prediction error (the absolute difference between the actual and predicted scores) and participants' age at the time of the exam or the time difference between the imaging session and the exam (P FDR > 0.05).

Due to the limited behavioral variance in our sample (see Psychometric Entrance Test scores), outliers were not excluded from the main analysis. Although excluding outliers resulted in higher or lower prediction accuracies for some scores (see Figure S9), removing them did not significantly change prediction accuracies compared to those reported in the main text (P FDR > 0.05).

Having established the robustness of our predictions, we turned to compare model performance across the different cognitive domains. Interestingly, we found that the quantitative reasoning scores were most accurately predicted across all models, whereas the verbal reasoning scores were consistently the least accurately predicted. We hypothesize that this pattern may reflect differences in the cognitive demands of each domain as assessed in the test. For instance, the verbal reasoning domain might involve more diverse and distributed cognitive processes than quantitative reasoning, making it harder to predict. To better understand these differences, we investigated the connectivity patterns underlying the prediction of each cognitive score.

3.3. Feature Importance

We wished to identify the connectivity patterns that contributed the most to the successful prediction of each psychometric score and to determine whether the predictions of different scores relied on common or distinct connectivity patterns. First, we examined whether the global test score shared predictive features with its component scores (i.e., domain‐specific scores) to assess whether the behavioral association between scores is reflected by their predictive features. Second, we examined the similarity between the predictive features of the different cognitive‐specific scores, which may explain their variability in prediction success and provide insights into the shared and unique connectivity patterns underlying different cognitive abilities.

Feature importance scores were calculated using a permutation‐based technique (see feature importance analysis) and were examined at both the edge and node levels. To maintain figure clarity, the top 10 most contributing edges are presented in Figure 2 and the top 50 edges (approximately 1% of all edges) in Figure S10. The most contributing brain regions (nodes) are shown in Figure 3. Results presented in the main text are based on the KRR model, as it achieved the best or second‐best prediction accuracy for most psychometric scores. While Meta‐Matching performed best for the verbal reasoning and foreign language scores, applying the permutation‐based feature importance technique to this model was computationally infeasible due to its high complexity and the large number of features (87,571 edges). Feature importance results for the LRR and BBS models are provided in Figures S10–S13.
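The permutation-based importance technique can be sketched as follows; the `model_predict` callable, the repeat count, and the drop-in-correlation definition of importance are our assumptions for illustration, not the study's published code:

```python
import numpy as np

def permutation_importance(model_predict, X, y, n_repeats=5, seed=0):
    """Permutation-based importance for each connectivity edge (column of X).

    Importance is the mean drop in Pearson correlation between actual and
    predicted scores after shuffling one feature at a time, so features the
    model relies on produce large drops.
    """
    rng = np.random.default_rng(seed)
    baseline = np.corrcoef(y, model_predict(X))[0, 1]
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature-target link
            drops.append(baseline - np.corrcoef(y, model_predict(X_perm))[0, 1])
        importance[j] = np.mean(drops)
    return importance
```

Because every feature requires `n_repeats` full prediction passes, the cost scales with the number of edges, which is why applying it to the 87,571-edge Meta-Matching model was infeasible.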

FIGURE 2.

FIGURE 2

Edge‐level feature importance. Feature importance scores were calculated using a permutation‐based feature importance method. Results are shown here for the KRR model. Black lines represent the ten most contributing edges to the prediction of the global, quantitative reasoning, verbal reasoning and foreign language scores. Brain regions are colored according to the seven canonical resting‐state networks (Thomas Yeo et al. 2011). LH, left hemisphere; RH, right hemisphere.

FIGURE 3.

FIGURE 3

Node‐level feature importance. Node‐level feature importance was defined as the weighted node degree (sum of the edge‐level importance scores of all edges that include that node). Results are shown here for the KRR model. Importance scores are projected on a brain surface for each psychometric score. Darker colors indicate higher contribution. A, anterior; LH, left hemisphere; RH, right hemisphere; P, posterior.
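The weighted-node-degree aggregation defined in the caption can be written as a short function; the upper-triangle edge ordering is our assumption (with 100 regions this yields the 4950 edges analyzed in the text):

```python
import numpy as np

def weighted_node_degree(edge_importance, n_nodes):
    """Aggregate edge-level importance into node-level scores.

    Each node's score is the sum of the importance of all edges that include
    it. Edges are assumed ordered as the upper triangle of the symmetric
    connectivity matrix.
    """
    mat = np.zeros((n_nodes, n_nodes))
    mat[np.triu_indices(n_nodes, k=1)] = edge_importance
    mat += mat.T  # symmetrize so row sums count every incident edge once
    return mat.sum(axis=1)
```

For instance, with three nodes and edge importances [1, 2, 3] for edges (0,1), (0,2), (1,2), the node scores are [3, 4, 5].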

3.3.1. Shared Predictive Features Between Global and Domain‐Specific Cognitive Scores

The top ten edges that contributed the most to the prediction of the global psychometric score were widely distributed across the cortex. Yet, they mostly involved nodes of the dorsal attention, ventral attention, and default mode networks. Specifically, these edges were comprised of the intra‐network connectivity of the dorsal attention and the default mode networks, as well as the inter‐network connectivity involving the dorsal attention and ventral attention networks. The observation that the most predictive edges primarily involved these specific functional networks may imply their importance for general cognitive abilities that span across domains.

Notably, most of these edges were also ranked among the top ten most contributing edges of the different cognitive‐specific scores. Consistently, we found common regions among the ten most contributing nodes to the prediction of the global score and each cognitive‐specific score. For example, bilateral regions in the intraparietal sulcus and inferior parietal lobule were among the top ten most important regions to the prediction of both global and quantitative reasoning scores, with the left one also shared with the verbal reasoning score. Prefrontal regions in both hemispheres were shared between the global score and the language‐related scores. Additionally, regions in the right superior parietal lobule and the left anterior temporal lobe were found to highly contribute to the prediction of the global, quantitative reasoning, and foreign language scores. See Figure S13 for a detailed list of the top ten most contributing regions to the prediction of each psychometric score.

Comparing feature importance scores across the whole functional connectome (4950 edges) revealed a significant similarity between the global score and all cognitive‐specific scores, with higher similarity to the quantitative reasoning and verbal reasoning domains (r global‐quantitative = 0.27, r global‐verbal = 0.22, p < 0.0001) than to the foreign language domain (r global‐foreign language = 0.15, p < 0.0001). This aligns with the weighted calculation of the global psychometric scores, where greater weight is assigned to the quantitative reasoning and verbal reasoning domains (40% each) compared to the foreign language domain (20%). Similarity between node‐level importance scores (100 regions) was even more striking, with correlations of 0.48–0.5 (p < 0.0001) between the importance scores of the global score and each cognitive‐specific score.

Taken together, these findings demonstrate a substantial similarity between the predictive features of the global psychometric score and those of each cognitive‐specific score, both at the level of individual edges and, to an even greater extent, at the level of entire brain regions. Importantly, while behavioral correlations between the global score and each cognitive‐specific score are somewhat trivial, given that the global score is composed of these scores, there were no constraints that would enforce this similarity in predictive features as predictive models were trained independently. As such, the observed similarity in feature importance may support the validity of the feature importance scores as they capture real‐life associations between scores.

3.3.2. Cognitive‐Specific Scores Were Predicted by Distinct Functional Networks

Although strong similarity was observed between the predictive features of the global score and each domain‐specific score, each specific score was mainly predicted by a unique set of connectivity patterns, suggesting that different cognitive skills are mainly supported by different connectivity circuits.

The prediction of the quantitative reasoning score primarily relied on the connectivity of the dorsal attention network. The top ten most contributing edges included both intra‐network edges and inter‐network edges of the dorsal attention network with the default‐mode and ventral attention networks. Moreover, the nodes with the highest weighted node degree mainly included dorsal attention regions, encompassing the bilateral intra‐parietal sulcus and inferior parietal lobule, and the right superior parietal lobule (see Figure S13).

Surprisingly, the most contributing features to the prediction of verbal reasoning and foreign language were relatively distinct, despite both scores being language related. The ten most predictive edges of the verbal reasoning scores were concentrated in the default mode network, including both intra‐network edges and inter‐network edges with the visual and control networks. The ten most important edges for the prediction of the foreign language score included different inter‐network edges of the default mode network, as well as inter‐network edges of the dorsal attention and ventral attention networks. Comparison of edge importance scores at the level of the whole functional connectome revealed relatively low, but still significant, similarity (r = 0.1, p < 0.0001). This may be in line with the fact that the two scores engage different language‐related skills: the verbal reasoning domain requires high‐level analysis of complex written material in the native language, whereas the foreign language domain focuses on more straightforward comprehension of foreign‐language text (Allalouf et al. 2020).

The node‐level feature importance analysis further highlighted the dominant role of the default mode network, and specifically of prefrontal regions, in the prediction of the verbal reasoning scores. Other regions important for prediction included the left inferior parietal lobule and inferior temporal gyrus, both part of the dorsal attention network, as well as visual regions in both hemispheres. The most contributing nodes for the foreign language score primarily included ventral attention regions in the prefrontal and insular cortices, as well as default mode regions in the anterior temporal lobe and precuneus, all located in the left hemisphere, in line with the known left‐lateralization of language (Knecht et al. 2000; Szaflarski et al. 2006) (see Figure S4). Although there was no overlap (i.e., no common regions) in the ten most contributing nodes of these two scores, there was significant similarity in node importance scores at the whole‐brain level (r verbal‐foreign language = 0.32, p = 0.001). This indicates that the overall contribution pattern of brain regions was relatively similar for the verbal reasoning and foreign language scores, which may imply a common connectivity basis supporting performance in both domains.

Comparing the edge importance scores between the language‐related scores and the quantitative score revealed that the latter was predicted by entirely distinct connectivity patterns (r = 0.02, p > 0.05). Yet, node‐level comparison revealed significant correlations between the contribution patterns of the quantitative reasoning score and each language score (r quantitative‐verbal = 0.22, p = 0.027; r quantitative‐foreign language = 0.24, p = 0.016). This suggests that although predictions of the quantitative reasoning and the language‐related scores were driven by very distinct edges, the involved brain regions were somewhat similar.

To ensure the reliability of these results, we examined the agreement of feature importance scores across the ten cross‐validation folds. Specifically, we identified how many features were ranked among the top 25% most important features in at least 80% of the folds and calculated the proportion of these features relative to the number expected under perfect reliability. For the global score, 583 features (47.13%) met the reliability threshold. Similarly, 678 features (54.81%) were identified for quantitative reasoning, 671 (54.24%) for verbal reasoning, and 721 (58.29%) for foreign language score. Overall, these results indicate a substantial degree of consistency in the most important features across folds, supporting the reliability of the feature importance scores.
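The fold-consistency check described above can be reimplemented in a few lines (an illustrative sketch; ties at the top-25% threshold are handled naively here):

```python
import numpy as np

def fold_reliability(importance_per_fold, top_frac=0.25, min_fold_frac=0.8):
    """Count features ranked in the top `top_frac` in >= `min_fold_frac` of folds.

    Returns the count and its proportion relative to the number expected under
    perfect reliability (the size of the top set itself).
    """
    imp = np.asarray(importance_per_fold)      # shape: (n_folds, n_features)
    n_folds, n_features = imp.shape
    n_top = int(top_frac * n_features)
    # Per-fold threshold: the n_top-th largest importance value in that fold
    thresh = np.sort(imp, axis=1)[:, -n_top][:, None]
    in_top = imp >= thresh
    reliable = int(np.sum(in_top.sum(axis=0) >= min_fold_frac * n_folds))
    return reliable, reliable / n_top
```

Under perfect reliability (identical rankings in every fold) the proportion is 1.0; the 47%–58% values reported above thus indicate substantial, though not complete, agreement across folds.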

3.3.3. Similarity in Feature Importance Reflects Behavioral Similarity

To complement the analysis of the common and distinct patterns that contributed to the prediction of different scores, we sought to explore whether similarity in predictive features corresponds to behavioral similarity. In other words, we tested whether similarity in predictive features is higher for scores that are more strongly correlated. This analysis was conducted for edge‐ and node‐level feature importance, separately.

The edge‐ (Figure 4A, left) and node‐level (Figure 4A, right) analyses depicted similar relationships across psychometric scores. In both cases, the highest correlations were observed between the global score and each of the domain‐specific cognitive scores.

FIGURE 4.

FIGURE 4

Similarity in feature importance across psychometric scores. Similarity was computed as the Pearson's correlation between feature importance scores for each pair of psychometric scores (A). Analysis was performed both for the edge‐level (left) and the node‐level (right) scores. Psychometric scores that showed higher similarity in their predictive features were more behaviorally correlated (B). FL, foreign language; Qn., quantitative.

Furthermore, both edge‐level and node‐level analyses showed that the predictive features of the two language‐related scores (verbal reasoning and foreign language) were more similar to each other than to the quantitative reasoning score, reflecting behavioral similarity: language‐related scores are more strongly correlated with each other than with the quantitative score. This strengthens the idea of a distributed functional network underlying multiple cognitive abilities, together with distinct networks involved in different cognitive domains.

Overall, the pattern of feature importance similarity across scores mirrors the behavioral relationships between them. To quantify this effect, we calculated the Pearson's correlation between the feature importance similarity and behavioral similarity matrices (lower triangle, excluding the diagonal). The correlation coefficient between these matrices was 0.97 (p = 0.001) for edge‐level feature importance scores and 0.92 (p = 0.009) for node‐level feature importance scores, further emphasizing that our feature importance scores reflect real‐world relationships between scores, thereby strengthening their validity.
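Correlating the lower triangles of the two similarity matrices can be done as follows (an illustrative sketch of the computation described above):

```python
import numpy as np

def matrix_similarity(mat_a, mat_b):
    """Pearson correlation between the lower triangles of two symmetric
    similarity matrices, excluding the diagonal.

    Used here to compare feature-importance similarity with behavioral
    similarity across score pairs.
    """
    il = np.tril_indices_from(np.asarray(mat_a), k=-1)
    return np.corrcoef(np.asarray(mat_a)[il], np.asarray(mat_b)[il])[0, 1]
```

With four scores the lower triangle contains only six pairwise values, so the correlation is computed over six points, which explains why even a very high coefficient (0.97) carries a modest permutation p-value.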

3.4. Stacking the Predicted Domain‐Specific Scores Improves the Prediction of Global Psychometric Scores

Having demonstrated strong associations between the predictive features of the global test score and each cognitive‐specific score, we aimed to further explore their relationship. Inspired by previous works of Gal et al. (2022) and He et al. (2022), which highlighted the advantages of transfer learning in brain‐based predictions, we tested whether stacking the predicted domain‐specific scores would enhance the prediction of the global psychometric score compared to predictions derived directly from functional connectivity.

Following the prediction of each domain‐specific score from functional connectivity, we trained a linear regression model to predict the global psychometric scores from the predicted domain‐specific scores. We conducted this analysis for the LRR, KRR, and Meta‐Matching models, as they yielded significantly accurate predictions for all domain‐specific scores, whereas the BBS model did not significantly predict the verbal reasoning score. The LRR‐based and Meta‐Matching‐based stacking models yielded more accurate predictions than the original models by 13% and 19%, respectively (LRR: r original = 0.240, SE = 0.057, r stacking = 0.271, SE = 0.061; Meta‐Matching: r original = 0.226, SE = 0.037, r stacking = 0.269, SE = 0.062). However, these improvements did not reach statistical significance (P FDR > 0.05). The KRR‐based stacking model was slightly less accurate than the original model (2.4% lower; r original = 0.253, SE = 0.069, r stacking = 0.247, SE = 0.066), and this difference was not statistically significant. Yet, it still outperformed the original LRR and BBS models.
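An ordinary-least-squares version of this stacking step can be sketched as follows (numpy only; the function name and details are ours, as the study's implementation is not published):

```python
import numpy as np

def stack_predictions(domain_preds_train, y_train, domain_preds_test):
    """Linear-regression stacking: predict the global score from the
    predicted domain-specific scores.

    `domain_preds_*` are (n_participants, 3) arrays of predicted quantitative,
    verbal, and foreign-language scores; the model is fit on the training
    participants and applied to the held-out ones.
    """
    X_tr = np.column_stack([np.ones(len(y_train)), domain_preds_train])
    beta, *_ = np.linalg.lstsq(X_tr, y_train, rcond=None)
    X_te = np.column_stack([np.ones(len(domain_preds_test)), domain_preds_test])
    return X_te @ beta
```

If the intermediate predictions were perfect, the learned weights would simply recover the test's 40/40/20 weighting of the domains; in practice the regression can also reweight the domains to compensate for their differing prediction accuracies.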

To test whether the performance of the stacking model is associated with the accuracy of its inputs (i.e., the domain‐specific predictions), we calculated the correlations between prediction errors of the stacking model and those of each domain‐specific model. Strong positive correlations were found for all three domain‐specific scores, across all models (LRR: r quantitative = 0.552, r verbal = 0.570, r foreign language = 0.484, KRR: r quantitative = 0.559, r verbal = 0.641, r foreign language = 0.542, Meta‐Matching: r quantitative = 0.555, r verbal = 0.61, r foreign language = 0.451). These results indicate that the performance of the stacking model depends on the accuracy of the intermediate predictions, suggesting that it inherits not only signal but also bias from these predictions. Nevertheless, the improved prediction accuracy of the stacking model observed in two out of three models suggests that the approach ultimately captures more signal than noise.

Overall, these results support the potential advantage of using more complex representations of the connectome to capture brain–behavior associations (Gal et al. 2022; He et al. 2022).

4. Discussion

In this work, we demonstrated that functional connectivity patterns are predictive of cognitive measures as they manifest in real‐world contexts. We successfully predicted participants' performance on the Psychometric Entrance Test, a standardized exam used for admission to higher education institutions in Israel and a strong predictor of undergraduate academic success. Significant predictions were achieved for both global test performance and cognitive‐specific scores, including quantitative reasoning, verbal reasoning, and proficiency in English as a foreign language. Moreover, we showed that predictions are consistent across different parcellations and predictive approaches, demonstrating the robustness of our results to analytical variability. To better understand the neural basis of these predictions, we examined the most contributing connectivity features to the prediction of each score. The prediction of the global psychometric score relied on distributed connectivity patterns that also contributed to the prediction of the different cognitive‐specific scores. Yet, the prediction of each of the cognitive‐specific scores was primarily associated with distinct connectivity patterns. Interestingly, these patterns were more similar for scores that are more correlated with each other at the behavioral level, suggesting that while cognitive abilities are largely supported by unique connectivity patterns, certain connectivity patterns are shared across cognitive domains. Last, leveraging a transfer learning approach, we used the predicted cognitive‐specific scores to enhance the prediction of the global psychometric score, improving accuracy compared to direct predictions from functional connectivity. Together, our findings demonstrate that functional connectivity patterns capture meaningful variability in cognitive performance under naturalistic conditions, supporting the generalizability of connectome‐based predictions to real‐world settings.

While previous studies investigating the predictability of cognitive scores from functional connectivity have primarily focused on measures obtained in controlled laboratory conditions, our work extends this line of research into real‐world contexts. Real‐life settings are inherently more variable, introducing various confounding factors such as environmental distractions, multitasking demands, fluctuating motivation levels, emotional states, and varying social interactions—all of which can influence cognitive performance in ways that are difficult to control or quantify. These sources of variability pose challenges for predictive modeling and may therefore limit the prediction success of real‐life outcomes. Nonetheless, we achieved significantly accurate predictions across all four real‐life cognitive scores. Importantly, prediction accuracies were comparable to those reported in studies using laboratory‐based measures (Dubois et al. 2018; Gal et al. 2022; He et al. 2022; Tomoya Nakai et al. 2024). This suggests that connectivity‐based predictions hold beyond controlled experimental conditions, demonstrating their ecological validity.

Predicting real‐life cognitive performance from neural data is a crucial step toward the integration of neuroimaging methods into practical, real‐world applications. For example, functional connectivity patterns may be used to identify cognitive strengths and weaknesses in educational settings, potentially supporting personalized learning strategies based on an individual's neural profile. In clinical contexts, they could contribute to detecting risks of cognitive decline or tracking responses to cognitive interventions (Tomoya Nakai et al. 2024). While our findings provide preliminary evidence that functional connectivity can be used to assess cognitive performance in real‐world scenarios, prediction accuracies were relatively modest, suggesting that further progress is needed before neuroimaging‐based markers can be reliably implemented, especially considering the high cost and logistical challenges associated with acquiring such data. Nonetheless, we argue that future research should not only aim to refine feature extraction techniques and predictive models but also prioritize the use of ecologically valid prediction targets. Together, these efforts are essential for advancing the long‐term goal of real‐world applicability in brain‐based predictive modeling.

As mentioned, significant predictions were achieved for both overall test performance and the three specific cognitive domains, indicating that the functional connectome captures real‐world variability in a variety of cognitive abilities. Interestingly, quantitative reasoning scores were predicted with the highest accuracy, while verbal reasoning scores were the least accurately predicted by all models. We propose this variation may reflect differences in the cognitive demands required by each domain of the test. For instance, the verbal reasoning domain may engage a broader and more diverse set of cognitive processes, supported by more distributed neural circuits, compared to the quantitative reasoning domain, making it inherently more challenging to predict. This pattern of domain‐specific differences in prediction accuracy may relate to a broader question of how brain‐based predictions might inform the assessment of cognitive traits. In this context, such predictions have been proposed as a potential tool for developing brain‐validated behavioral assessments (Gabrieli et al. 2015). At a conceptual level, there ought to be a strong association between neural measures and behavioral performance; thus, higher brain‐based predictability of a trait could serve as a form of validation for the behavioral measure itself.

Predictions were robust across four widely used prediction models: LRR, KRR (Chen et al. 2022; He et al. 2020), BBS (Gal et al. 2022; Sripada et al. 2019, 2020), and the Meta‐Matching algorithm (He et al. 2022). These models encompass a range of computational complexities, from simple linear regression to the advanced neural network‐based Meta‐Matching approach, further strengthening the robustness of our results. Consistent with previous research (He et al. 2020), the KRR model achieved the highest prediction accuracy for global and quantitative reasoning scores. For the verbal reasoning and foreign language scores, which had overall smaller effect sizes in prediction accuracy, the Meta‐Matching algorithm outperformed the other models. This underlines the value of the Meta‐Matching approach (see Meta‐Matching algorithm in prediction pipeline and statistical analyses) in leveraging large‐scale datasets for discovering relatively small effects in small‐scale datasets (He et al. 2022).

Although we achieved significant predictions for all four psychometric scores, prediction accuracies were modest or low in some cases. First, this may be due to noise inherent in real‐life evaluation. As mentioned above, real‐life performance is influenced by numerous factors that cannot be controlled, introducing variability that can obscure the underlying relationships between brain connectivity and behavior. Second, because the imaging session and the exam did not occur at the same time, the models may have primarily captured trait‐like aspects of cognitive functioning rather than state‐dependent performance, further limiting the variance explained in certain cases.

The global psychometric scores were predicted by widely distributed connectivity across the cortex, primarily involving the dorsal attention, ventral attention, and default mode networks. Interestingly, we observed high overlap between the ten most contributing features of the global psychometric scores and those of the cognitive‐specific scores. This finding was further supported by a substantial similarity in the whole‐brain comparison of predictive features, both at the level of individual edges and at the level of entire brain regions. It is worth mentioning that although behavioral correlations between the global score and each cognitive‐specific score are somewhat trivial, as the global score is a combination of these scores, there were no restrictions that would enforce similarity in predictive features, as predictive models for each score were trained independently. This may strengthen the validity of the feature importance scores, as they reflect the ground truth of real‐world behavioral relationships between scores.

While high similarity was observed in the feature contributions of the global and domain‐specific cognitive scores, a comparison of feature contributions across cognitive domains revealed that different cognitive abilities were mostly predicted by distinct connectivity patterns. However, we observed higher similarity in predictive features between scores that are more correlated with each other. For instance, performance on the verbal reasoning domain was more highly correlated with performance on the foreign language domain than with performance on the quantitative reasoning domain; accordingly, the two language‐related scores were predicted by features more similar to each other than to those predicting the quantitative score. This finding might suggest that different cognitive abilities are supported by distinct connectivity patterns, alongside a distributed network of brain regions involved in multiple cognitive abilities.

Both our edge‐level and node‐level feature importance analyses showed that the prediction of quantitative reasoning scores relied heavily on the connectivity of the dorsal attention network. The regions contributing most to the prediction included the bilateral intraparietal sulcus and adjacent regions in the inferior and superior parietal lobules. This is in line with previous findings showing that the connectivity of these regions is associated with mathematical abilities (Moeller et al. 2015; Nakai et al. 2024; Wilkey et al. 2023). For example, the connectivity of these regions has been associated with individual differences in children's arithmetic abilities (Price et al. 2018; Rosenberg‐Lee et al. 2011) and adults' algebraic reasoning skills (Ventura‐Campos et al. 2022), and has also been used to classify individuals based on their mathematical skills, for instance, with or without dyscalculia (Jolles et al. 2016).
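Node‐level contributions of this kind are commonly obtained by aggregating edge‐level importance over all edges incident to each region. A minimal sketch, assuming a symmetric matrix of absolute edge contributions (the parcellation size and all values are synthetic, not from the study):

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes = 10  # hypothetical parcellation size

# Hypothetical symmetric matrix of absolute edge-level contributions.
edge_importance = np.abs(rng.normal(size=(n_nodes, n_nodes)))
edge_importance = (edge_importance + edge_importance.T) / 2
np.fill_diagonal(edge_importance, 0.0)

# One common node-level summary: sum the absolute contributions of
# all edges touching each node.
node_importance = edge_importance.sum(axis=0)

# Regions ranked by their aggregated contribution (top three shown).
top_nodes = np.argsort(node_importance)[::-1][:3]
```

Other aggregation choices (mean instead of sum, or signed weights) are possible; the sum of absolute contributions is one straightforward option.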

The prediction of verbal reasoning scores was mainly driven by the connectivity of the default mode network, which has previously been linked to language processing (Nakai et al. 2024; Tomasi and Volkow 2020). Key contributing regions included default mode areas such as the bilateral prefrontal cortex and the right precuneus, as well as dorsal attention regions in the left inferior parietal lobule and inferior temporal gyrus. This is consistent with previous studies showing that the connectivity of these regions predicts various language processing abilities (Kristanto et al. 2020; McNorgan 2021; Tomasi and Volkow 2020). Additionally, we found a high contribution of visual regions to the prediction, which may be due to the visual modality of the test (reading) (McNorgan 2021; Tomasi and Volkow 2012).

The most predictive features of the foreign language domain mainly involved the ventral attention, dorsal attention, and default mode networks. For instance, high contributions were observed for ventral attention regions in the left prefrontal and insular cortices, as well as default mode regions in the left anterior temporal lobe and precuneus. The left dominance of these regions aligns with the well‐established left lateralization of language (Knecht et al. 2000; Szaflarski et al. 2006). Furthermore, activation of these regions has been shown to be indicative of second‐language proficiency (Zhang et al. 2023) and predictive of learning outcomes after 7 days of artificial language training (Feng et al. 2021).

Surprisingly, canonical language regions, such as the left inferior frontal gyrus (Broca's area) and the posterior superior temporal gyrus (Wernicke's area), did not emerge as main contributors to the successful prediction of the language‐related scores. This does not imply that their activation was not critical for performance in the language‐related domains of the test, but rather that their connectivity did not differentiate between participants' performance levels. In other words, while these areas are crucial for language, their connections may not exhibit enough variability across individuals to be useful for prediction. Additionally, it is possible that higher‐level language abilities, especially those required for the verbal reasoning domain, rely more on the integration of information across large‐scale networks than on the connectivity of core language regions alone.

Last, following previous studies (Gal et al. 2022; He et al. 2022), we tested whether a transfer learning approach could improve the prediction of the global psychometric score. These studies showed that predicting an individual behavioral trait from other brain‐based predictions significantly improved accuracy compared to direct predictions from the neural data. Consistent with these findings, stacking the predicted cognitive‐specific scores improved the prediction of the global psychometric score for the LRR‐based and Meta‐Matching‐based models by 13% and 19%, respectively. In the case of the KRR‐based model, stacking resulted in a slight decrease in accuracy compared to the original KRR model, yet it still reached higher accuracy than the original LRR and BBS models. While these improvements in prediction accuracy did not reach statistical significance, they may nonetheless support the validity of our domain‐specific predictions, as these are informative for the prediction of the global score. Notably, while some domain‐specific scores were predicted less accurately than the global score, their combination enhanced the prediction accuracy of the global score. This suggests that the stacking‐based improvement cannot be explained simply by the addition of easier‐to‐predict variables. Instead, the improvement observed in two of the three models supports the idea that representing the connectome via intermediate predicted scores may enhance sensitivity to individual differences and thereby improve prediction accuracy in a transfer learning framework. Transfer learning approaches such as stacking intermediate predictions are therefore promising, both as a means to improve prediction accuracy and for the theoretical insight they offer into how distributed brain representations relate to complex behavioral traits.
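The stacking scheme is essentially a two‐stage pipeline: out‐of‐sample domain predictions are first obtained from connectivity, then used as a compact feature set for predicting the global score. The code below is a minimal illustration on synthetic data, with ridge regression standing in for the study's models; the sample size, number of edges, noise level, and the way the global score is composed are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(42)
n_subjects, n_edges = 200, 300

# Synthetic data: connectivity features, three latent domain scores,
# and a global score defined as their average.
fc = rng.normal(size=(n_subjects, n_edges))
w = rng.normal(size=(3, n_edges))
domain_scores = fc @ w.T + rng.normal(scale=5.0, size=(n_subjects, 3))
global_score = domain_scores.mean(axis=1)

# Stage 1: out-of-sample predictions of each domain score from connectivity.
domain_preds = np.column_stack([
    cross_val_predict(Ridge(alpha=1.0), fc, domain_scores[:, i], cv=5)
    for i in range(3)
])

# Stage 2: stack the predicted domain scores as features for the global
# score, and compare with a direct prediction from connectivity.
stacked_pred = cross_val_predict(Ridge(alpha=1.0), domain_preds, global_score, cv=5)
direct_pred = cross_val_predict(Ridge(alpha=1.0), fc, global_score, cv=5)

r_stacked = np.corrcoef(stacked_pred, global_score)[0, 1]
r_direct = np.corrcoef(direct_pred, global_score)[0, 1]
```

Note that both stages use cross‐validated predictions, so the stacked features never carry information from a subject's own training fold; whether stacking outperforms the direct model depends on the data, as the mixed results across KRR, LRR, and Meta‐Matching illustrate.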

Several study limitations should be considered when interpreting our results and planning future investigations. First, our sample consists of participants with substantially higher scores and lower variance compared to the general population (see Table 1). Future studies may benefit from a more diverse cohort. Still, the ability to achieve significant predictions despite this limited variance highlights the sensitivity of the functional connectome to cognitive abilities, as it successfully discriminates between individuals at the higher end of the cognitive ability spectrum. Second, as mentioned above, the study was conducted retrospectively, with all participants completing the exam prior to their imaging session and with varying time gaps between the two. Controlling for this time gap yielded prediction accuracies similar to those obtained in the original analysis. Yet, future research may explore whether functional connectivity patterns can predict cognitive performance assessed after the imaging session, or even forecast future learning outcomes from functional connectivity acquired prior to the learning process.
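Controlling for a covariate such as the scan‐to‐exam time gap is typically done by computing the partial correlation between predicted and observed scores, i.e., correlating the residuals after regressing the covariate out of both variables. A minimal sketch with synthetic data (the effect sizes and variables are illustrative only; only the sample size matches the study):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 194  # sample size in the study

# Hypothetical predicted and observed scores plus a time-gap covariate.
predicted = rng.normal(size=n)
observed = 0.5 * predicted + rng.normal(size=n)
time_gap = rng.normal(size=n)

def residualize(y, covariate):
    """Remove the linear effect of a covariate via least squares."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation: correlate after regressing the covariate
# out of both the predicted and the observed scores.
r_partial = np.corrcoef(
    residualize(predicted, time_gap),
    residualize(observed, time_gap),
)[0, 1]
```

If the covariate drives the brain‐behavior association, the partial correlation shrinks toward zero; here it stays close to the simple correlation because the synthetic time gap is unrelated to both scores.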

To summarize, our findings provide evidence for the generalizability of connectome‐based predictions to real‐life cognitive abilities, highlighting the potential of functional connectivity as a non‐invasive marker of cognition in ecologically valid settings.

Ethics Statement

The research protocol was approved by the Institutional Review Board of the Sheba Medical Center (Ramat‐Gan, Israel). All participants signed an informed consent form.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1: hbm70319‐sup‐0001‐supporting_information.docx.


Acknowledgments

We gratefully acknowledge the support of the Minducate Science of Learning Research and Innovation Center at the Sagol School of Neuroscience, Tel Aviv University. We also wish to thank Tel Aviv University's Strauss Center for Computational Neuroimaging for establishing the Strauss Neuroplasticity Brain Bank (SNBB), the largest brain bank in Israel, which advances research into brain‐behavior relationships.

Kadushin, M. , Madar A., Tik N., Bernstein‐Eliav M., and Tavor I.. 2025. “Predicting Real‐Life Cognitive Scores From Functional Connectivity.” Human Brain Mapping 46, no. 12: e70319. 10.1002/hbm.70319.

Funding: Funding for this research was provided by the Minducate Science of Learning Research and Innovation Center of the Sagol School of Neuroscience, Tel Aviv University. This work was also supported by a grant from the Tel Aviv University Center for AI and Data Science (TAD).

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

  1. Allalouf, A. , Cohen Y., and Gafni N.. 2020. “Higher Education Admissions Practices in Israel.” In Higher Education Admissions Practices: An International Perspective, 174–189. Cambridge University Press. 10.1017/9781108559607.010. [DOI] [Google Scholar]
  2. Beaty, R. E. , Kenett Y. N., Christensen A. P., et al. 2018. “Robust Prediction of Individual Creative Ability From Brain Functional Connectivity.” Proceedings of the National Academy of Sciences of the United States of America 115, no. 5: 1087–1092. 10.1073/pnas.1713532115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Breiman, L. 2001. “Random Forests.” Machine Learning 45, no. 1: 5–32. [Google Scholar]
  4. Cai, H. , Chen J., Liu S., Zhu J., and Yu Y.. 2020. “Brain Functional Connectome‐Based Prediction of Individual Decision Impulsivity.” Cortex 125: 288–298. 10.1016/j.cortex.2020.01.022. [DOI] [PubMed] [Google Scholar]
  5. Cantlon, J. F. , and Li R.. 2013. “Neural Activity During Natural Viewing of Sesame Street Statistically Predicts Test Scores in Early Childhood.” PLoS Biology 11, no. 1: e1001462. 10.1371/journal.pbio.1001462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cetron, J. S. , Connolly A. C., Diamond S. G., May V. V., Haxby J. V., and Kraemer D. J. M.. 2019. “Decoding Individual Differences in STEM Learning From Functional MRI Data.” Nature Communications 10, no. 1: 2027. 10.1038/s41467-019-10053-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Chen, J. , Tam A., Kebets V., et al. 2022. “Shared and Unique Brain Network Features Predict Cognitive, Personality, and Mental Health Scores in the ABCD Study.” Nature Communications 13, no. 1: 2217. 10.1038/s41467-022-29766-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Dubois, J. , Galdi P., Paul L. K., and Adolphs R.. 2018. “A Distributed Brain Network Predicts General Intelligence From Resting‐State Human Neuroimaging Data.” Philosophical Transactions of the Royal Society, B: Biological Sciences 373, no. 1756: 20170284. 10.1098/rstb.2017.0284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Feng, G. , Ou J., Gan Z., et al. 2021. “Neural Fingerprints Underlying Individual Language Learning Profiles.” Journal of Neuroscience 41, no. 35: 7372–7387. 10.1523/JNEUROSCI.0415-21.2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Finn, E. S. 2021. “Is It Time to Put Rest to Rest?” Trends in Cognitive Sciences 25, no. 12: 1021–1032. 10.1016/j.tics.2021.09.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Finn, E. S. , Shen X., Scheinost D., et al. 2015. “Functional Connectome Fingerprinting: Identifying Individuals Using Patterns of Brain Connectivity.” Nature Neuroscience 18, no. 11: 1664–1671. 10.1038/nn.4135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Fischl, B. 2012. “FreeSurfer.” NeuroImage 62, no. 2: 774–781. 10.1016/j.neuroimage.2012.01.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Fischl, B. , Salat D. H., Busa E., et al. 2002. “Whole Brain Segmentation: Automated Labeling of Neuroanatomical Structures in the Human Brain.” Neuron 33, no. 3: 341–355. 10.1016/S0896-6273(02)00569-X. [DOI] [PubMed] [Google Scholar]
  14. Gabrieli, J. D. E. , Ghosh S. S., and Whitfield‐Gabrieli S.. 2015. “Prediction as a Humanitarian and Pragmatic Contribution From Human Cognitive Neuroscience.” Neuron 85, no. 1: 11–26. 10.1016/j.neuron.2014.10.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Gal, S. , Tik N., Bernstein‐Eliav M., and Tavor I.. 2022. “Predicting Individual Traits From Unperformed Tasks.” NeuroImage 249: 118920. 10.1016/j.neuroimage.2022.118920. [DOI] [PubMed] [Google Scholar]
  16. Glasser, M. F. , Sotiropoulos S. N., Wilson J. A., et al. 2013. “The Minimal Preprocessing Pipelines for the Human Connectome Project.” NeuroImage 80: 105–124. 10.1016/j.neuroimage.2013.04.127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Greene, A. S. , Gao S., Scheinost D., and Constable R. T.. 2018. “Task‐Induced Brain State Manipulation Improves Prediction of Individual Traits.” Nature Communications 9, no. 1: 2807. 10.1038/s41467-018-04920-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Griffanti, L. , Salimi‐Khorshidi G., Beckmann C. F., et al. 2014. “ICA‐Based Artefact Removal and Accelerated fMRI Acquisition for Improved Resting State Network Imaging.” NeuroImage 95: 232–247. 10.1016/j.neuroimage.2014.03.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. He, T. , An L., Chen P., et al. 2022. “Meta‐Matching as a Simple Framework to Translate Phenotypic Predictive Models From Big to Small Data.” Nature Neuroscience 25, no. 6: 795–804. 10.1038/s41593-022-01059-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. He, T. , Kong R., Holmes A. J., et al. 2020. “Deep Neural Networks and Kernel Regression Achieve Comparable Accuracies for Functional Connectivity Prediction of Behavior and Demographics.” NeuroImage 206: 116276. 10.1016/j.neuroimage.2019.116276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Hedge, C. , Powell G., and Sumner P.. 2018. “The Reliability Paradox: Why Robust Cognitive Tasks Do Not Produce Reliable Individual Differences.” Behavior Research Methods 50, no. 3: 1166–1186. 10.3758/s13428-017-0935-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Hoerl, A. E. , and Kennard R. W.. 1970. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics 12, no. 1: 55–67. 10.1080/00401706.1970.10488634. [DOI] [Google Scholar]
  23. Jiang, R. , Zuo N., Ford J. M., et al. 2020. “Task‐Induced Brain Connectivity Promotes the Detection of Individual Differences in Brain‐Behavior Relationships.” NeuroImage 207: 116370. 10.1016/j.neuroimage.2019.116370. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Jolles, D. , Ashkenazi S., Kochalka J., et al. 2016. “Parietal Hyper‐Connectivity, Aberrant Brain Organization, and Circuit‐Based Biomarkers in Children With Mathematical Disabilities.” Developmental Science 19, no. 4: 613–631. 10.1111/desc.12399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Knecht, S. , Deppe M., Dräger B., et al. 2000. “Language Lateralization in Healthy Right‐Handers.” Brain 123, no. 1: 74–81. 10.1093/brain/123.1.74. [DOI] [PubMed] [Google Scholar]
  26. Kristanto, D. , Liu M., Liu X., Sommer W., and Zhou C.. 2020. “Predicting Reading Ability From Brain Anatomy and Function: From Areas to Connections.” NeuroImage 218: 116966. 10.1016/j.neuroimage.2020.116966. [DOI] [PubMed] [Google Scholar]
  27. Madar, A. , Kurtz‐David V., Hakim A., Levy D. J., and Tavor I.. 2024. “Pre‐Acquired Functional Connectivity Predicts Choice Inconsistency.” Journal of Neuroscience 44, no. 18: 1–11. 10.1523/JNEUROSCI.0453-23.2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. McNorgan, C. 2021. “The Connectivity Fingerprints of Highly‐Skilled and Disordered Reading Persist Across Cognitive Domains.” Frontiers in Computational Neuroscience 15: 590093. 10.3389/fncom.2021.590093. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Meshulam, M. , Hasenfratz L., Hillman H., et al. 2021. “Neural Alignment Predicts Learning Outcomes in Students Taking an Introduction to Computer Science Course.” Nature Communications 12, no. 1: 1922. 10.1038/s41467-021-22202-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Moeller, K. , Willmes K., and Klein E.. 2015. “A Review on Functional and Structural Brain Connectivity in Numerical Cognition.” Frontiers in Human Neuroscience 9: 227. 10.3389/fnhum.2015.00227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Nakai, T. , Tirou C., and Prado J.. 2024. “From Brain to Education Through Machine Learning: Predicting Literacy and Numeracy Skills From Neuroimaging Data.” Imaging Neuroscience 2: 1–24. 10.1162/imag. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Pedregosa, F. , Varoquaux G., Gramfort A., et al. 2011. “Scikit‐Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–2830. [Google Scholar]
  33. Price, G. R. , Yeo D. J., Wilkey E. D., and Cutting L. E.. 2018. “Prospective Relations Between Resting‐State Connectivity of Parietal Subdivisions and Arithmetic Competence.” Developmental Cognitive Neuroscience 30: 280–290. 10.1016/j.dcn.2017.02.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Robinson, E. C. , Garcia K., Glasser M. F., et al. 2018. “Multimodal Surface Matching With Higher‐Order Smoothness Constraints.” NeuroImage 167: 453–465. 10.1016/j.neuroimage.2017.10.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Robinson, E. C. , Jbabdi S., Glasser M. F., et al. 2014. “MSM: A New Flexible Framework for Multimodal Surface Matching.” NeuroImage 100: 414–426. 10.1016/j.neuroimage.2014.05.069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Rosenberg, M. D. , Casey B. J., and Holmes A. J.. 2018. “Prediction Complements Explanation in Understanding the Developing Brain.” Nature Communications 9, no. 1: 589. 10.1038/s41467-018-02887-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Rosenberg, M. D. , and Finn E. S.. 2022. “How to Establish Robust Brain–Behavior Relationships Without Thousands of Individuals.” Nature Neuroscience 25, no. 7: 835–837. 10.1038/s41593-022-01110-9. [DOI] [PubMed] [Google Scholar]
  38. Rosenberg, M. D. , Finn E. S., Scheinost D., et al. 2015. “A Neuromarker of Sustained Attention From Whole‐Brain Functional Connectivity.” Nature Neuroscience 19, no. 1: 165–171. 10.1038/nn.4179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Rosenberg, M. D. , Scheinost D., Greene A. S., et al. 2020. “Functional Connectivity Predicts Changes in Attention Observed Across Minutes, Days, and Months.” Proceedings of the National Academy of Sciences of the United States of America 117, no. 7: 3797–3807. 10.1073/pnas.1912226117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Rosenberg‐Lee, M. , Barth M., and Menon V.. 2011. “What Difference Does a Year of Schooling Make? Maturation of Brain Response and Connectivity Between 2nd and 3rd Grades During Arithmetic Problem Solving.” NeuroImage 57, no. 3: 796–808. 10.1016/j.neuroimage.2011.05.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Salimi‐Khorshidi, G. , Douaud G., Beckmann C. F., Glasser M. F., Griffanti L., and Smith S. M.. 2014. “Automatic Denoising of Functional MRI Data: Combining Independent Component Analysis and Hierarchical Fusion of Classifiers.” NeuroImage 90: 449–468. 10.1016/j.neuroimage.2013.11.046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Schaefer, A. , Kong R., Gordon E. M., et al. 2018. “Local‐Global Parcellation of the Human Cerebral Cortex From Intrinsic Functional Connectivity MRI.” Cerebral Cortex 28, no. 9: 3095–3114. 10.1093/cercor/bhx179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Smith, S. M. , Jenkinson M., Woolrich M. W., et al. 2004. “Advances in Functional and Structural MR Image Analysis and Implementation as FSL.” NeuroImage 23, no. SUPPL. 1: 208–219. 10.1016/j.neuroimage.2004.07.051. [DOI] [PubMed] [Google Scholar]
  44. Sripada, C. , Angstadt M., Rutherford S., et al. 2019. “Basic Units of Inter‐Individual Variation in Resting State Connectomes.” Scientific Reports 9, no. 1: 1900. 10.1038/s41598-018-38406-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Sripada, C. , Angstadt M., Rutherford S., Taxali A., and Shedden K.. 2020. “Toward a ‘Treadmill Test’ for Cognition: Improved Prediction of General Cognitive Ability From the Task Activated Brain.” Human Brain Mapping 41, no. 12: 3186–3197. 10.1002/hbm.25007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Szaflarski, J. P. , Holland S. K., Schmithorst V. J., and Byars A. W.. 2006. “fMRI Study of Language Lateralization in Children and Adults.” Human Brain Mapping 27, no. 3: 202–212. 10.1002/hbm.20177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Thomas Yeo, B. T. , Krienen F. M., Sepulcre J., et al. 2011. “The Organization of the Human Cerebral Cortex Estimated by Intrinsic Functional Connectivity.” Journal of Neurophysiology 106, no. 3: 1125–1165. 10.1152/jn.00338.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Tomasi, D. , and Volkow N. D.. 2012. “Resting Functional Connectivity of Language Networks: Characterization and Reproducibility.” Molecular Psychiatry 17, no. 8: 841–854. 10.1038/mp.2011.177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Tomasi, D. , and Volkow N. D.. 2020. “Network Connectivity Predicts Language Processing in Healthy Adults.” Human Brain Mapping 41, no. 13: 3696–3708. 10.1002/hbm.25042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Ventura‐Campos, N. , Ferrando‐Esteve L., and Epifanio I.. 2022. “The Underlying Neural Bases of the Reversal Error While Solving Algebraic Word Problems.” Scientific Reports 12, no. 1: 21654. 10.1038/s41598-022-25442-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Wei, L. , Jing B., and Li H.. 2020. “Bootstrapping Promotes the RSFC‐Behavior Associations: An Application of Individual Cognitive Traits Prediction.” Human Brain Mapping 41, no. 9: 2302–2316. 10.1002/hbm.24947. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Wilkey, E. D. , Gupta I., Peiris A., and Ansari D.. 2023. “The Mathematical Brain at Rest.” Current Opinion in Behavioral Sciences 49: 101246. 10.1016/j.cobeha.2022.101246. [DOI] [Google Scholar]
  53. Zhang, R. , Wang J., Lin H., Turk‐Browne N. B., and Cai Q.. 2023. “Neural Signatures of Second Language Proficiency in Narrative Processing.” Cerebral Cortex 33, no. 13: 8477–8484. 10.1093/cercor/bhad133. [DOI] [PubMed] [Google Scholar]
