Skip to main content
eLife logoLink to eLife
. 2023 Mar 13;12:e85082. doi: 10.7554/eLife.85082

Evidence for embracing normative modeling

Saige Rutherford 1,2,3,, Pieter Barkema 2, Ivy F Tso 3,4, Chandra Sripada 3,5, Christian F Beckmann 1,2,6,, Henricus G Ruhe 2,7,, Andre F Marquand 1,2,
Editors: Chris I Baker8, Chris I Baker9
PMCID: PMC10036120  PMID: 36912775

Abstract

In this work, we expand the normative model repository introduced in Rutherford et al., 2022a to include normative models charting lifespan trajectories of structural surface area and brain functional connectivity, measured using two unique resting-state network atlases (Yeo-17 and Smith-10), and an updated online platform for transferring these models to new data sources. We showcase the value of these models with a head-to-head comparison between the features output by normative modeling and raw data features in several benchmarking tasks: mass univariate group difference testing (schizophrenia versus control), classification (schizophrenia versus control), and regression (predicting general cognitive ability). Across all benchmarks, we show the advantage of using normative modeling features, with the strongest statistically significant results demonstrated in the group difference testing and classification tasks. We intend for these accessible resources to facilitate the wider adoption of normative modeling across the neuroimaging community.

Research organism: Human

Introduction

Normative modeling is a framework for mapping population-level trajectories of the relationships between health-related variables while simultaneously preserving individual-level information (Marquand et al., 2016a; Marquand et al., 2016b; Rutherford et al., 2022b). Health-related variables is an intentionally inclusive and broad definition that may involve demographics (i.e. age and gender), simple (i.e. height and weight), or complex (i.e. brain structure and function, genetics) biological measures, environmental factors (i.e. urbanicity, pollution), self-report measures (i.e. social satisfaction, emotional experiences), or behavioral tests (i.e. cognitive ability, spatial reasoning). Charting the relationships, as mappings between a covariate (e.g. age) and response variable (e.g. brain measure) in a reference population creates a coordinate system that defines the units in which humans vary. Placing individuals into this coordinate system creates the opportunity to characterize their profiles of deviation. While this is an important aspect of normative modeling, it is usually just the first step, i.e., you are often interested in using the outputs of normative models in downstream analyses to detect case-control differences, stratification, or individual statistics. This framework provides a platform for such analyses as it effectively translates diverse data to a consistent scale, defined with respect to population norms.

Normative modeling has seen widespread use spanning diverse disciplines. The most well-known example can be found in pediatric medicine, where conventional growth charts are used to map the height, weight, and head circumference trajectories of children (Borghi et al., 2006). Under the neuroscience umbrella, generalizations of this approach have been applied in the fields of psychiatry (Floris et al., 2021; Madre et al., 2020; Wolfers et al., 2015; Wolfers et al., 2017; Wolfers et al., 2018; Wolfers et al., 2020; Wolfers et al., 2021; Zabihi et al., 2019; Zabihi et al., 2020), neurology (Itälinna et al., 2022; Verdi et al., 2021), developmental psychology (Holz et al., 2022; Kjelkenes et al., 2022), and cognitive neuroscience (Marquand et al., 2017). Throughout these numerous applications, normative models have exposed the shortcomings of prior case-control frameworks, i.e., that they rely heavily on the assumption, there is within-group homogeneity. This case versus control assumption is often an oversimplification, particularly in psychiatric diagnostic categories, where the clinical labels used to place individuals into group categories are often unreliable, poorly measured, and may not map cleanly onto underlying biological mechanisms (Cai et al., 2020; Cuthbert and Insel, 2013; Flake and Fried, 2020; Insel et al., 2010; Linden, 2012; Loth et al., 2021; Michelini et al., 2021; Moriarity et al., 2022; Moriarity and Alloy, 2021; Nour et al., 2022; Sanislow, 2020; Zhang et al., 2021). Correspondingly, traditional analysis techniques for modeling case versus control effects have often led to null findings (Winter et al., 2022) or significant but very small clinically meaningless differences. These effects are furthermore frequently aspecific to an illness or disorder (Baker et al., 2019; Goodkind et al., 2015; McTeague et al., 2017; Sprooten et al., 2017) and inconsistent or contradictory (Filip et al., 2022; Lee et al., 2007; Pereira-Sanchez and Castellanos, 2021) yielding questionable clinical utility (Etkin, 2019; Mottron and Bzdok, 2022).

In addition to the applications of normative modeling, there is also active technical development (Dinga et al., 2021; Fraza et al., 2022; Fraza et al., 2021; Kia et al., 2020; Kia et al., 2021; Kia and Marquand, 2018; Kumar, 2021). Due to the growing popularity of normative modeling and in recognition of the interdisciplinary requirements using and developing this technology (clinical domain knowledge, statistical expertise, data management, and computational demands), research interests have been centered on open science, and inclusive, values (Gau et al., 2021; Levitis et al., 2021) that support this type of interdisciplinary scientific work. These values encompass open-source software, sharing pre-trained big data models (Rutherford et al., 2022a), online platforms for communication and collaboration, extensive documentation, code tutorials, and protocol-style publications (Rutherford et al., 2022b).

The central contribution of this paper is to, first, augment the models in Rutherford et al., 2022a, with additional normative models for surface area and functional connectivity, which are made open and accessible to the community. Second, we comprehensively evaluate the utility of normative models for a range of downstream analyses, including (1) mass univariate group difference testing (schizophrenia versus controls), (2) multivariate prediction – classification (using support vector machines to distinguish schizophrenia from controls), and (3) multivariate prediction – regression (using principal component regression (PCR) to predict general cognitive ability) (Figure 1). Within these benchmarking tasks, we show the benefit of using normative modeling features compared to using raw features. We aim for these benchmarking results, along with our publicly available resources (code, documentation, tutorials, protocols, community forum, and website for running models without using any code). Combined this provides practical utility as well as scientific evidence for embracing normative modeling.

Figure 1. Overview of workflow.

Figure 1.

(A) Datasets included the Human Connectome Project (young adult) study, the University of Michigan schizophrenia study, and the Center for Biomedical Research Excellence (COBRE) schizophrenia study. (B) Openly shared, pre-trained on big data, normative models were estimated for large-scale resting-state functional brain networks and cortical thickness. (C) Deviation (Z) scores and raw data, for both functional and structural data, were input into three benchmarking tasks: 1. group difference testing, 2. support vector machine (SVM) classification, and 3. regression (predicting cognition). (D) Evaluation metrics were calculated for each benchmarking task. These metrics were calculated for the raw data models and the deviation score models. The difference between each models’ performance was calculated for both functional and structural modalities.

Methods

Dataset selection and scanner parameters

Datasets used for training the functional normative models closely match the sample included in Rutherford et al., 2022a, apart from sites that did not collect or were unable to share functional data. Evaluation of the functional normative models was performed in a test set (20% of the training set) and in two transfer sets that are comprised of scanning sites not seen by the model during training (clinical and healthy controls). The full details of the data included in the functional normative model training can be found in Appendix 1 and Supplementary file 1. We leverage several datasets for the benchmarking tasks, the Human Connectome Project Young Adult study (HCP) (Van Essen et al., 2013), The Center for Biomedical Research Excellence (COBRE) (Aine et al., 2017; Sui et al., 2018), and the University of Michigan SchizGaze (UMich) (Tso et al., 2021; Table 1). The HCP data was chosen because it is widely used by the neuroscience community, especially for prediction studies. Also, prior studies using HCP data have shown promising results for predicting general cognitive ability (Sripada et al., 2020a). The HCP data was used in the prediction – regression benchmarking task. The COBRE and UMich datasets are used in the classification and group difference testing benchmarking tasks. Inclusion criteria across all the datasets were that the participant has necessary behavioral and demographic variables, as well as high-quality MRI data. High-quality was defined for structural images as in our prior work (Rutherford et al., 2022a), namely as the lack of any artifacts such as ghosting or ringing, that Freesurfer surface reconstruction was able to run successfully, and that the Euler number calculated from Freesurfer (Klapwijk et al., 2019), which is a proxy metric for scan quality, was below a chosen threshold (rescaled Euler <10) (Kia et al., 2022). High-quality functional data followed recommended practices (Siegel et al., 2017) and was defined as having a high-quality structural MRI (required for co-registration and normalization) and at least 5 min of low motion data (framewise displacement <0.5 mm). The HCP, COBRE, and UMich functional and structural data were manually inspected for quality at several tasks during preprocessing (after co-registration of functional and structural data and after normalization of functional data to MNI template space).

Table 1. Dataset inclusion and sample overview.

Cortical Thickness Functional Networks
Study Benchmark Task N Age
(m, s.d.)
F, M (%) N Age
(m, s.d.)
F, M (%)
HCP Regression – predicting cognition 529 28.8, 3.6 53.4, 46.6 499 28.9, 3.6 54.3, 45.6
COBRE Classification & Group Difference 124 37.0, 12.7 24.2, 75.8 121 35.4, 12.4 23.1, 76.9
UMich Classification & Group Difference 89 32.6, 9.6 50.6, 49.3 87 33.0, 10.1 50.6, 49.3

All subjects provided informed consent. Subject recruitment procedures and informed consent forms, including consent to share de-identified data, were approved by the corresponding university institutional review board where data were collected. The scanning acquisition parameters were similar but varied slightly across the studies, details in Appendix 1.

Demographic, cognition, and clinical diagnosis variables

Demographic variables included age, sex, and MRI scanner site. A latent variable of cognition, referred to as General Cognitive Ability (GCA), was created for the regression benchmarking task using HCP data. The HCP study administered the NIHToolbox Cognition battery (Gershon et al., 2010), and a bi-factor model was fit (for further modeling details and assessment of model fit see Sripada et al., 2020b). For COBRE and UMich studies, clinical diagnosis of schizophrenia was confirmed using the Structured Clinical Interview used for DSM-5 disorders (SCID) (First, 1956). All subjects were screened and excluded if they had: a history of neurological disorder, mental retardation, severe head trauma, or substance abuse/dependence within the last 6 (UMich) or 12 months (COBRE), were pregnant/nursing (UMich), or had any contraindications for MRI.

Image preprocessing

Structural MRI data were preprocessed using the Freesurfer (version 6.0) recon-all pipeline (Dale et al., 1999; Fischl et al., 2002; Fischl and Dale, 2000) to reconstruct surface representations of the volumetric data. Estimates of cortical thickness and subcortical volume were then extracted (aparc and aseg) for each subject from their Freesurfer output folder, then merged, and formatted into a csv file (rows = subjects, columns = brain ROIs). We also share models of surface area, extracted in the same manner as the cortical thickness data from a similar dataset (described in Supplementary file 2).

Resting-state data were preprocessed separately for each study using fMRIPrep Esteban et al., 2019; however, similar steps were done to all resting-state data following best practices including field-map correction of multi-band data, slice time correction (non-multi-band data), co-registration of functional to structural data, normalization to MNI template space, spatial smoothing (2 x voxel size, 4–6 mm), and regression of nuisance confounders (WM/CSF signals, non-aggressive AROMA components [Pruim et al., 2015a; Pruim et al., 2015b], linear and quadratic effects of motion).

Large-scale brain networks from the 17-network Yeo atlas (Yeo et al., 2011) were then extracted and between-network connectivity was calculated using full correlation. We also shared functional normative models using the Smith-10 ICA-based parcellation (Smith et al., 2009) which includes subcortical coverage, however, the benchmarking tasks only use the Yeo-17 functional data. Fisher r-to-z transformation was performed on the correlation matrices. If there were multiple functional runs, connectivity matrices were calculated separately for each run then all runs for a subject were averaged. For further details regarding the preparation of the functional MRI data, see Appendix 1.

Normative model formulation

After dataset selection and preprocessing, normative models were estimated using the Predictive Clinical Neuroscience toolkit (PCNtoolkit), an open-source python package for normative modeling (Marquand et al., 2021). For the structural data, we used a publicly shared repository of pre-trained normative models that were estimated on approximately 58,000 subjects using a warped Bayesian Linear Regression algorithm (Fraza et al., 2021). The covariates used to train the structural normative models included age, sex, data quality metric (Euler number), and site. Normative models of surface area were also added to the same repository Supplementary file 2. Model fit was established using explained variance, mean standardized log loss, skew, and kurtosis. The outputs of normative modeling also include a Z-score, or deviation score, for all brain regions and all subjects. The deviation score represents where the individual is in comparison to the population the model was estimated on, where a positive deviation score corresponds to the greater cortical thickness or subcortical volume than average, and a negative deviation score represents less cortical thickness or subcortical volume than average. The deviation (Z) scores that are output from the normative model are the features input for the normative modeling data in the benchmarking analyses.

In addition to normative models of brain structure, we also expanded our repository by estimating normative models of brain functional connectivity (resting-state brain networks, Yeo-17 and Smith-10) using the same algorithm (Bayesian Linear Regression) as the structural models. The covariates used to train the functional normative models were similar to the structural normative models which included age, sex, data quality metric (mean framewise displacement), and site. Functional normative models were trained on a large multi-site dataset (approx. N=22,000) and evaluated in several test sets using explained variance, mean standardized log loss, skew, and kurtosis. The training dataset excluded subjects with any known psychiatric diagnosis. We transferred the functional normative models to the datasets used in this work for benchmarking (Table 1) to generate deviation (Z) scores. HCP was included in the initial training (half of the sample was held out in the test set), while the UMich and COBRE datasets were not included in the training and can be considered as examples of transfer to new, unseen sites.

“Raw” input data

The data that we compare the output of normative modeling to, referred to throughout this work as ‘raw’ input data, is simply the outputs of traditional preprocessing methods for structural and functional MRI. For structural MRI, this corresponds to the cortical thickness files that are output after running the Freesurfer recon-all pipeline. We used the aparcstats2table and asegstats2table functions to extract the cortical thickness and subcortical volume from each region in the Destrieux atlas and Freesurfer subcortical atlas. For functional MRI, tradition data refers to the Yeo17 brain network connectomes which were extracted from the normalized, smoothed, de-noised functional time-series. The upper triangle of each subject’s symmetric connectivity matrix was vectorized, where each cell represents a unique between-network connection. For clarification, we also note that the raw input data is the starting point of the normative modeling analysis, or in other words, the raw input data is the response variable or independent (Y) variable that is predicted from the vector of covariates when estimating the normative model. Before entering into the benchmarking tasks, to create a fair comparison between raw data and deviation scores, nuisance variables including sex, site, linear and quadratic effects of age and head motion (only for functional models) were regressed out of the raw data (structural and functional) using least squares regression.

Benchmarking

The benchmarking was performed in three separate tasks, mass univariate group difference testing, multivariate prediction – classification, and multivariate prediction – regression, described in further detail below. In each benchmarking task, a model was estimated using the deviation scores as input features and then estimated again using the raw data as the input features. For task one, group difference testing, the models fit in a univariate approach meaning there was one test performed for each brain feature, and for tasks 2 and 3, classification and regression, the models fit in a multivariate approach. After each model was fit, the performance metrics were evaluated and the difference in performance between the deviation score and raw data models was calculated, again described in more detail below.

Task one: Mass univariate group difference testing

Mass univariate group difference (schizophrenia versus control) testing was performed across all brain regions. Two sample independent t-tests were estimated and run on the data using the SciPy python package (Virtanen et al., 2020). After addressing multiple comparison corrections, brain regions with FDR corrected p<.05 were considered significant and the total number of regions displaying statistically significant group differences was counted.

For the purpose of comparing group difference effects to individual differences, we also summarized the individual deviation maps and compare this map to the group difference map. Individual deviation maps were summarized by counting the number of individuals with ‘extreme’ deviations (Z>2 or Z<–2) at a given brain region or network connectivity pair. This was done separately for positive and negative deviations and for each group and visualized qualitatively (Figure 4B). To quantify the individual difference maps in comparison to group differences, we performed a Mann-Whitney U-test on the count of extreme deviations in each group. The U-test was used because the distribution of count data is skewed (non-Gaussian) which the U-test is designed to account for (Mann and Whitney, 1947).

Task two: Multivariate prediction – classification

Support vector machine is a commonly used algorithm in machine learning studies and performs well in classification settings. A support vector machine constructs a set of hyper-planes in a high dimensional space and optimizes to find the hyper-plane that has the largest distance, or margin, to the nearest training data points of any class. A larger margin represents better linear separation between classes and will correspond to a lower error of the classifier in new samples. Samples that lie on the margin boundaries are also called ‘support vectors.’ The decision function provides per-class scores than can be turned into probabilities estimates of class membership. We used Support vector classification (SVC) with a linear kernel as implemented in the scikit-learn package (version 1.0.9) (Pedregosa et al., 2011) to classify a schizophrenia group from a control group. These default hyperparameters were chosen based on following an example of SVC provided by scikit-learn, however, similar results were obtained using a radial basis function kernel (not shown). This classification setting of distinguishing schizophrenia from a control group was chosen due to past work showing the presence of both case-control group differences and individual differences (Wolfers et al., 2018). The evaluation metric for the classification task is an area under the receiving operator curve (AUC) averaged across all folds within a 10-fold cross-validation framework.

Task three: Multivariate prediction – regression

A linear regression model was implemented to predict a latent variable of cognition (general cognitive ability) in the HCP dataset. Brain Basis Set (BBS) is a predictive modeling approach developed and validated in previous studies (Sripada et al., 2019; Sripada et al., 2019); see also studies by Wager and colleagues for a broadly similar approach (Chang et al., 2015; Wager et al., 2013; Woo et al., 2017). BBS is similar to principal component regression (Jolliffe, 1982; Park, 1981), with an added predictive element. In the training set, PCA is performed on a n_subjects by p_brain_features matrix using the PCA function from scikit-learn in Python, yielding components ordered by descending eigenvalues. Expression scores are then calculated for each of the k components for each subject by projecting each subject’s feature matrix onto each component. A linear regression model is then fit with these expression scores as predictors and the phenotype of interest (general cognitive ability) as the outcome, saving B, the k x 1 vector of fitted coefficients, for later use. In a test partition, the expression scores for each of the k components for each subject are again calculated. The predicted phenotype for each test subject is the dot product of B learned from the training partition with the vector of component expression scores for that subject. We set k=15 in all models, following prior work (Rutherford et al., 2020). The evaluation metric for the regression task is the mean squared error of the prediction in the test set.

Benchmarking: Model comparison evaluation

Evaluation metrics of each task (count, AUC, and MSE) were calculated independently for both deviation score (Z) and raw data (R) models. Higher AUC, higher count, and lower MSE represent better model performance. We then have a statistic of interest that is observed, theta, which represents the difference between deviation and raw data model performance.

θtask1=Countz-CountR
θtask2=AUCz-AUCR
θtask3=MSER-MSEz

To assess whether θ is more likely than would be expected by chance, we generated the null distribution for theta using permutations. Within one iteration of the permutation framework, a random sample is generated by shuffling the labels (In tasks 1 & 2 we shuffle SZ/HC labels, and in task three we shuffle cognition labels). Then this sample is used to train both deviation and raw models, ensuring the same row shuffling scheme across both deviation score and raw data datasets (for each permutation iteration). The shuffled models are evaluated, and we calculate θperm for each random shuffle of labels. We set n_permutations =10,000 and use the distribution of θperm to calculate a p-value for θobserved at each benchmarking task. The permuted p-value is equal to (C+1)/(n_permutations+1). Where C is the number of permutations where θperm>=θobserved. The same evaluation procedure described here 293 (including permutations) was performed for both cortical thickness and functional network modalities.

Results

Sharing of functional big data normative models

The first result of this work is the evaluation of the functional big data normative models (Figure 2). These models build upon the work of Rutherford et al., 2022a in which we shared population-level structural normative models charting cortical thickness and subcortical volume across the human lifespan (ages 2–100). The datasets used for training the functional models, the age range of the sample, and the procedures for evaluation closely resemble the structural normative models. The sample size (approx. N=22,000) used for training and testing the functional models is smaller than the structural models (approx. N=58,000) due to data availability (i.e. some sites included in the structural models did not collect functional data or could not share the data) and the quality control procedures (see methods). However, despite the smaller sample size of the functional data reference cohort, the ranges of the evaluation metrics are quite similar to the structural models (Figure 3). Most importantly, we demonstrate the opportunity to transfer the functional models to new samples, or sites that were not included in the original training and testing sets, referred to as the transfer set, and show that transfer works well in a clinical sample (Figure 3 - transfer patients) or sample of healthy controls (Figure 3 - transfer controls).

Figure 2. Functional brain network normative modeling.

Figure 2.

(A) Age distribution per scanning site in the train, test, and transfer data partitions and across the full sample (train +test). (B) The Yeo-17 brain network atlas is used to generate connectomes. Between network connectivity was calculated for all 17 networks, resulting in 136 unique network pairs that were each individually input into a functional normative model. (C) The explained variance in the controls test set (N=7244) of each of the unique 136 network pairs of the Yeo-17 atlas. Networks were clustered for visualization to show similar variance patterns.

Figure 3. Functional normative model evaluation metrics.

Figure 3.

(A) Explained variance per network pair across the test set (top), and both transfer sets (patients – middle, controls – bottom). Networks were clustered for visualization to show similar variance patterns. (B) The distribution across all models of the evaluation metrics (columns) in the test set (top row) and both transfer sets (middle and bottom rows). Higher explained variance (closer to one), more negative MSLL, and normally distributed skew and kurtosis correspond to better model fit.

Normative modeling shows larger effect sizes in mass univariate group differences

The strongest evidence for embracing normative modeling can be seen in the benchmarking task one group difference (schizophrenia versus controls) testing results (Table 2, Figure 4). In this application, we observe numerous group differences in both functional and structural deviation score models after applying stringent multiple comparison corrections (FDR p-value <0.05). The strongest effects (HC>SZ) in the structural models were located in the right hemisphere lateral occipitotemporal sulcus (S_oc_temp_lat) thickness, right hemisphere superior segment of the circular sulcus of the insula (S_circular_ins_sup) thickness, right Accumbens volume, left hemisphere Supramarginal gyrus (G_pariet_inf_Supramar) thickness, and left hemisphere Inferior occipital gyrus (O3) and sulcus (G_and_S_occipital_inf) thickness. For the functional models, the strongest effects (HC>SZ t-statistic) were observed in the between-network connectivity of Visual A-Default B, Dorsal Attention A-Control B, and Visual B-Limbic A. In the raw data models, which were residualized of covariates including site, sex, and linear +quadratic effects of age and head motion (only included for functional models), we observe no group differences after multiple comparison corrections. The lack of any group differences in the raw data was initially a puzzling finding due to reported group differences in the literature (Arbabshirani et al., 2013; Cetin et al., 2015; Cetin et al., 2016; Cheon et al., 2022; Dansereau et al., 2017; Howes et al., 2023; Lei et al., 2020a; Lei et al., 2020b; Meng et al., 2017; Rahim et al., 2017; Rosa et al., 2015; Salvador et al., 2017; Shi et al., 2021; van Erp et al., 2018; Venkataraman et al., 2012; Wannan et al., 2019; Yu et al., 2012), however, upon the investigation of the uncorrected statistical maps, we observe that the raw data follows a similar pattern to the deviation group difference map (Figure 4), but these results do not withstand multiple comparison correction. For full statistics including the corrected and uncorrected p-values and test-statistic of every ROI, see Supplementary files 3 and 4. While there have been reported group differences between controls and schizophrenia in cortical thickness and resting state brain networks in the literature, these studies have used different datasets (of varying sample sizes), different preprocessing pipelines and software versions, and different statistical frameworks (Castro et al., 2016; Di Biase et al., 2019; Dwyer et al., 2018; Geisler et al., 2015; Marek et al., 2022; Sui et al., 2015; van Haren et al., 2011). When reviewing the literature of studies on SZ versus HC group difference testing, we did not find any study that performed univariate t-testing and multiple comparison correction at the ROI-level or network-level, rather most works used statistical tests and multiple comparison correction at the voxel-level or edge-level. Combined with the known patterns of heterogeneity present in schizophrenia disorder (Lv et al., 2021; Wolfers et al., 2018), it is unsurprising that our results differ from past studies.

Table 2. Benchmarking results.

Deviation (Z) score column shows the performance using deviation scores (AUC for classification, the total number of regions with significant group differences FDR-corrected p<0.05 for case versus control, mean squared error for regression), Raw column represents the performance when using the raw data, and Difference column shows the difference between the deviation scores and raw data (Deviation - Raw). Higher AUC, higher count, and lower MSE represent better performance. Positive values in the Difference column show that there is better performance when using deviation scores as input features for classification and group difference tasks, and negative performance difference values for the regression task show there is a better performance using the deviation scores. *=statistically significant difference between Z and Raw established using permutation testing (10 k perms).

Benchmark Modality Normative ModelingDeviation Score Data Raw Data PerformanceDifference
Group Difference Cortical thickness 117/187 0/187 117*
Group Difference Functional Networks 50/136 0/136 50*
Classification Cortical thickness 0.87 0.43 0.44*
Classification Functional Networks 0.69 0.68 0.01
Regression Cortical thickness 0.699 0.708 –0.008
Regression Functional Networks 0.877 0.890 –0.013

Figure 4. Group difference testing evaluation.

Figure 4.

(A) Significant group differences in the deviation score models, (top left) functional brain network deviation, and (top right) cortical thickness deviation scores. The raw data, either cortical thickness or functional brain networks (residualized of sex and linear/ quadratic effects of age and motion (mean framewise displacement)) resulted in no significant group differences after multiple comparison corrections. Functional networks were clustered for visualization to show similar variance patterns. (B) There are still individual differences observed that do not overlap with the group difference map, showing the benefit of normative modeling, which can detect both group and individual differences through proper modeling of variation. Functional networks were clustered for visualization to show similar variance patterns. (C) There are significant group differences in the summaries (count) of the individual difference maps (panel B).

The qualitative (Figure 4B) and quantitative (Figure 4C) comparison of the group difference maps with the individual difference maps showed the additional benefit of normative modeling - that it can reveal subtle individual differences which are lost when only looking at group means. The individual difference maps show that at every brain region or connection, there is at least one person, across both patient and clinical groups, that has an extreme deviation. We found significant differences in the count of negative deviations (SZ >HC) for both cortical thickness (p=0.0029) and functional networks (p=0.013), and significant differences (HC >SZ) in the count of positive cortical thickness (p=0.0067).

Normative modeling shows highest classification performance using cortical thickness

In benchmarking task two, we classified schizophrenia versus controls using SVC within a 10-fold cross-validation framework (Table 2, Figure 5). The best-performing model used cortical thickness deviation scores to achieve a classification accuracy of 87% (AUC = 0.87). The raw cortical thickness model accuracy was indistinguishable from chance accuracy (AUC = 0.43). The AUC performance difference between the cortical thickness deviation and raw data models was 0.44, and this performance difference was statistically significant. The functional models, both deviation scores (0.69) and raw data (0.68) were more accurate than chance accuracy, however, the performance difference (i.e. improvement in accuracy using the deviation scores) was small (0.01) and was not statistically significant.

Figure 5. Benchmark task two multivariate prediction Classification evaluation.

Figure 5.

(A) Support vector classification (SVC) using cortical thickness deviation scores as input features (most accurate model). (B) SVC using cortical thickness (residualized of sex and linear/quadratic effects of age) as input features. (C) SVC using functional brain network deviation scores as input features. (D) SVC using functional brain networks (residualized of sex and linear/ quadratic effects of age and motion (mean framewise displacement)) as input features.

Normative modeling shows modest performance improvement in predicting cognition

In benchmarking task three we fit multivariate predictive models in a held-out test set of healthy individuals in the Human Connectome Project young-adult study to predict general cognitive ability (Table 2). The evidence provided by this task weakly favors the deviation score models. The most accurate (lowest mean squared error) model was the deviation cortical thickness model (MSE = 0.699). However, there was only an improvement of 0.008 in the deviation score model compared to the raw data model (MSE = 0.708) and this difference was not statistically significant. For the functional models, both the deviation score (MSE = 0.877) and raw data (MSE = 0.890) models were less accurate than the structural models and the difference between them (0.013) was also not statistically significant.

Discussion

This work expands the available open-source tools for conducting normative modeling analyses and provides clear evidence for why normative modeling should be utilized by the neuroimaging community (and beyond). We updated our publicly available repository of pre-trained normative models to include a new MRI imaging modality (models of resting-state functional connectivity extracted from the Yeo-17 and Smith-10 brain network atlases) and demonstrate how to transfer these models to new data sources. The repository includes an example transfer dataset and in addition, we have developed a user-friendly interface (https://pcnportal.dccn.nl/) that allows transferring the pre-trained normative models to new samples without requiring any programming. Next, we compared the features that are output from normative modeling (deviation scores) against ‘raw’ data features across several benchmarking tasks including univariate group difference testing (schizophrenia vs. control), multivariate prediction – classification (schizophrenia vs. control), and multivariate prediction – regression (predicting general cognitive ability). We found across all benchmarking tasks there were minor (regression) to strong (group difference testing) benefits of using deviation scores compared to the raw data features.

The fact that the deviation score models perform better than the raw data models confirms the utility of placing individuals into reference models. Our results show that normative modeling can capture population trends, uncover clinical group differences, and preserve the ability to study individual differences. We have some intuition on why the deviation score models perform better on the benchmarking tasks than the raw data. With normative modeling, we are accounting for many sources of variance that are not necessarily clinically meaningful (i.e. site) and we are able to capture clinically meaningful information within the reference cohort perspective. The reference model helps beyond just removing confounding variables such as scanner noise because we show that even when removing the nuisance covariates (age, sex, site, head motion) from the raw data, the normative modeling features still perform better on the benchmarking tasks.

Prior works on the methodological innovation and application of normative modeling Kia et al., 2018; Kia et al., 2020; Kia et al., 2021; Kia and Marquand, 2018 have focused on the beginning foundational steps of the framework (i.e. data selection and preparation, algorithmic implementation, and carefully evaluating out of sample model performance). However, the framework does not end after the model has been fit to the data (estimation step) and performance metrics have been established (evaluation step). Transferring the models to new samples, interpretation of the results, and potential downstream analysis are equally important steps, but they have received less attention. When it comes time to interpret the model outputs, it is easy to fall back into the case-control thinking paradigm, even after fitting a normative model to one’s data (which is supposed to be an alternative to case vs. control approaches). This is due in part to the challenges arising from the results existing in a very high dimensional space (~100 s to 1000 s of brain regions from ~100 s to 1000 s of subjects). There is a reasonable need to distill and summarize these high-dimensional results. However, it is important to remember there is always a trade-off between having a complex enough of a model to explain the data and dimensionality reduction for the sake of interpretation simplicity. This distillation process often leads back to placing individuals into groups (i.e. case-control thinking) and interpreting group patterns or looking for group effects, rather than interpreting results at the level of the individual. We acknowledge the value and complementary nature of understanding individual variation relative to group means (case-control thinking) and clarify that we do not claim the superiority of normative modeling over case-control methods. Rather, our results from this work, especially in the comparisons of group difference map to individual difference maps (Figure 4), show that the outputs of normative modeling can be used to validate, refine, and further understand some of the inconsistencies in previous findings from case-control literature.

There are several limitations of the present work. First, the representation of functional normative models may be surprising and concerning. Typically, resting-state connectivity matrices are calculated using parcellations containing between 100–1000 nodes and 5000–500,000 connections. However, the Yeo-17 atlas (Yeo et al., 2011) was specifically chosen because of its widespread use and the fact that many other (higher resolution) functional brain parcellations have been mapped to the Yeo brain networks (Eickhoff et al., 2018; Glasser et al., 2016; Gordon et al., 2016; Gordon et al., 2017; Kong et al., 2019; Laumann et al., 2015; Power et al., 2011; Schaefer et al., 2018; Shen et al., 2013). There is an on-going debate about the best representation of functional brain activity. Using the Yeo-17 brain networks to model functional connectivity ignores important considerations regarding brain dynamics, flexible node configurations, overlapping functional modes, hard versus soft parcellations, and many other important issues. We have also shared functional normative models using the Smith-10 ICA-based parcellation, though did not repeat the benchmarking tasks using these data. Apart from our choice of parcellation, there are fundamental open questions regarding the nature of the brain’s functional architecture, including how it is defined and measured. While it is outside the scope of this work to engage in these debates, we acknowledge their importance and refer curious readers to a thorough review of functional connectivity challenges (Bijsterbosch et al., 2020).

We would also like to expand on our prior discussion (Rutherford et al., 2022a) on the limitations of the reference cohort demographics, and the use of the word ‘normative.’ The included sample for training the functional normative models in this work, and the structural normative modeling sample in Rutherford et al., 2022a are most likely overrepresentative of European-ancestry (WEIRD population Henrich et al., 2010) due to the data coming from academic research studies, which do not match global population demographics. Our models do not include race or ethnicity as covariates due to data availability (many sites did not provide race or ethnicity information). Prior research supports the use of age-specific templates and ethnicity-specific growth charts (Dong et al., 2020). This is a major limitation that requires additional future work and should be considered carefully when transferring the model to diverse data (Benkarim et al., 2022; Greene et al., 2022; Li et al., 2022). The term ‘normative model’ can be defined in other fields in a very different manner than ours (Baron, 2004; Colyvan, 2013; Titelbaum, 2021). We clarify that ours is strictly a statistical notion (normative = being within the central tendency for a population). Critically, we do not use normative in a moral or ethical sense, and we are not suggesting that individuals with high deviation scores require action or intervention to be pulled toward the population average. Although in some cases this may be true, we in no way assume that high deviations are problematic or unhealthy (they may in fact represent compensatory changes that are adaptive). In any case, we treat large deviations from statistical normality strictly as markers predictive of clinical states or conditions of interest.

There are many open research questions regarding normative modeling. Future research directions are likely to include: (1) further expansion of open-source pre-trained normative modeling repositories to include additional MRI imaging modalities such as task-based functional MRI and diffusion-weighted imaging, other neuroimaging modalities such as EEG or MEG, and models that include other non-biological measures, (2) increase in the resolution of existing models (i.e. voxel, vertex, models of brain structure and higher resolution functional parcellations), (3) replication and refinement of the proposed benchmarking tasks in other datasets including hyperparameter tuning and different algorithm implementation, and improving the regression benchmarking task, and (4) including additional benchmarking tasks beyond the ones considered here.

There has been recent interesting work on ‘failure analysis’ of brain-behavior models (Greene et al., 2022), and we would like to highlight that normative modeling is an ideal method for conducting this type of analysis. Through normative modeling, research questions such as ‘what are the common patterns in the subjects that are classified well versus those that are not classified well’ can be explored. Additional recent work (Marek et al., 2022) has highlighted important issues the brain-behavior modeling community must face, such as poor reliability of the imaging data, poor stability and accuracy of the predictive models, and the very large sample sizes (exceeding that of even the largest neuroimaging samples) required for accurate predictions. There has also been working showing that brain-behavior predictions are more reliable than the underlying functional data (Taxali et al., 2021), and other ideas for improving brain-behavior predictive models are discussed in-depth here (Finn and Rosenberg, 2021; Rosenberg and Finn, 2022). Nevertheless, we acknowledge these challenges and believe that sharing pre-trained machine learning models and further development of transfer learning of these models could help further address these issues.

In this work, we have focused on the downstream steps of the normative modeling framework involving evaluation and interpretation, and how insights can be made on multiple levels. Through the precise modeling of different sources of variation, there is much knowledge to be gained at the level of populations, clinical groups, and individuals.

Appendix 1

Functional MRI Acquisition Parameters

In the HCP study, four runs of resting state fMRI data (14.5 min each) were acquired on a Siemens Skyra 3 Tesla scanner using multi-band gradient-echo EPI (TR = 720ms, TE = 33ms, flip angle = 52°, multiband acceleration factor = 8, 2 mm isotropic voxels, FOV = 208 × 180 mm, 72 slices, alternating RL/LR phase encode direction). T1 weighted scans were acquired with 3D MPRAGE sequence (TR = 2400ms, TE = 2.14ms, TI = 1000ms, flip angle = 8, 0.7 mm isotropic voxels, FOV = 224 mm, 256 sagittal slices) and T2 weighted scans were acquired with a SPACE sequence (TR = 3200ms, TE = 565ms, 0.7 mm isotropic voxels, FOV = 224 mm, 256 sagittal slices). In the COBRE study, the T1 weighted acquisition is a multi-echo MPRAGE (MEMPR) sequence (1 mm isotropic). Resting state functional MRI data was collected with single-shot full k-space echo-planar imaging (EPI) (TR = 2000ms, TE = 29ms, FOV = 64 × 64, 32 slices in axial plane interleaved multi slice series ascending, voxel size = 3 × 3 x4 mm3). The University of Michigan SchizGaze study was collected in two phases with different parameters but using the same MRI machine (3.0T GE MR 750 Discovery scanner). In SchizGaze1 (N=47), functional images were acquired with a T2*-weighted, reverse spiral acquisition sequence (TR = 2000ms, 240 volumes (8 min), 3 mm isotropic voxels) and a T1-weighted image was acquired in the same prescription as the functional images to facilitate co-registration. In SchizGaze2 (N=68), functional images were acquired with a T2*-weighted multi-band EPI sequence (multi-band acceleration factor of 8, TR = 800ms, 453 volumes (6 min), 2.4 mm isotropic voxels) and T1w (MPRAGE) and T2w structural scans were acquired for co-registration with the functional data. In addition, field maps were acquired to correct for intensity and geometric distortions.

Functional MRI Preprocessing Methods

T1w images are corrected for intensity nonuniformity, reconstructed with recon-all (FreeSurfer), spatially normalized (ANTs), and segmented with FAST (FSL). For every BOLD run, data are co-registered to the corresponding T1w reference, and the BOLD signal is sampled onto the subject’s surfaces with mri_vol2surf (FreeSurfer). A set of noise regressors are generated during the preceding steps that are used to remove a number of artifactual signals from the data during subsequent processing, and these noise regressors include: head-motion parameters (via MCFLIFT; FSL) framewise displacement and DVARS, and physiological noise regressors for use in component-based noise correction (CompCor). ICA-based denoising is implemented via ICA-AROMA and we compute ‘non-aggressive’ noise regressors. Resting state connectomes are generated from the fMRIPrep processed resting state data using Nilearn, denoising using the noise regressors generated above, with orthogonalization of regressors to avoid reintroducing artifactual signals.

Functional Brain Networks Normative Modeling

Data from 40 sites were combined to create the initial full sample. These sites are described in detail in Supplementary file 1, including the sample size, age (mean and standard deviation), and sex distribution of each site. Many sites were pulled from publicly available datasets including ABCD, CAMCAN, CMI-HBN, HCP-Aging, HCP-Development, HCP-Early Psychosis, HCP-Young Adult, NKI-RS, OpenNeuro, PNC, and UKBiobank. For datasets that include repeated visits (i.e. ABCD, UKBiobank), only the first visit was included. Full details regarding sample characteristics, diagnostic procedures and acquisition protocols can be found in the publications associated with each of the studies. Training and testing datasets (80/20) were created using scikit-learn’s train_test_split function, stratifying on the site variable. To show generalizability of the models to new data not included in training, we leveraged three datasets (ds000243, ds002843, ds003798) from OpenNeuro.org to create a multi-site transfer dataset.

Normative modeling was run using python 3.8 and the PCNtoolkit package (version 0.26). Bayesian Linear Regression (BLR) with likelihood warping was used to predict each Yeo-17 network pair from a vector of covariates (age, sex, mean_FD, site). For a detailed mathematical description see Fraza et al., 2021. Briefly, for each region of interest, y is predicted as:

y=wTϕx+ϵ (1)

Where wT is the estimated weight vector, ϕx is a basis expansion of the of covariate vector x, consisting of a B-spline basis expansion (cubic spline with 5 evenly spaced knots) to model non-linear effects of age, and ϵ=η0,β a Gaussian noise distribution with mean zero and noise precision term β (the inverse variance). A likelihood warping approach was used to model non-Gaussian effects by applying a bijective nonlinear warping function to the non-Gaussian response variables to map them to a Gaussian latent space where inference can be performed in closed form. We used a ‘sinarcsinsh’ warping function, equivalent to the SHASH distribution that is commonly used in the generalized additive modeling literature (Jones and Pewsey, 2009). Site variation was modeled using fixed effects. A fast numerical optimization algorithm was used to optimize hyperparameters (‘Powell’). Deviation scores (Z-scores) are calculated for the n-th subject, and d-th brain area, in the test set as:

Znd=yndy^ndσd2+(σ2)d (2)

Where ynd is the true response, y^nd is the predicted mean, σd2 is the estimated noise variance (reflecting uncertainty in the data), and σ*2d is the variance attributed to modeling uncertainty. Model fit for each brain region was evaluated by calculating the explained variance (which measures central tendency), the mean squared log-loss (MSLL, central tendency and variance) plus skew and kurtosis of the deviation scores (2) which measures how well the shape of the regression function matches the data (Dinga et al., 2021).

All pretrained models and code are shared online including documentation for transferring to new sites and an example transfer dataset. Given a new set of data (e.g. sites not present in the training set), this is done by first applying the warp parameters estimating on the training data to the new dataset, adjusting the mean and variance in the latent Gaussian space, then (if necessary) warping the adjusted data back to the original space, which is similar to the approach outlined in Dinga et al., 2021.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Contributor Information

Saige Rutherford, Email: saige.rutherford@donders.ru.nl.

Chris I Baker, National Institute of Mental Health, United States.

Chris I Baker, National Institute of Mental Health, United States.

Funding Information

This paper was supported by the following grants:

  • European Research Council 10100118 to Andre F Marquand.

  • Wellcome Trust 215698/Z/19/Z to Andre F Marquand.

  • Wellcome Trust 098369/Z/12/Z to Christian F Beckmann.

  • National Institute of Mental Health R01MH122491 to Ivy F Tso.

  • National Institute of Mental Health R01MH123458 to Chandra Sripada.

  • National Institute of Mental Health R01MH130348 to Chandra Sripada.

Additional information

Competing interests

No competing interests declared.

No competing interests declared.

is director and shareholder of SBGNeuro Ltd.

received speaker's honorarium from Lundbeck and Janssen.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Methodology, Writing - original draft, Writing – review and editing.

Resources, Software, Writing – review and editing.

Supervision, Funding acquisition, Writing – review and editing.

Supervision, Funding acquisition, Writing – review and editing.

Supervision, Funding acquisition, Writing – review and editing.

Supervision, Funding acquisition, Writing – review and editing.

Supervision, Funding acquisition, Writing – review and editing.

Ethics

Human subjects: Secondary data analysis was conducted in this work. Data were pooled from multiple data sources described in the supplemental tables. All subjects provided informed consent. Subject recruitment procedures and informed consent forms, including consent to share de-identified data, were approved by the corresponding university institutional review board where data were collected. Human subjects: Ethical approval for the public data were provided by the relevant local research authorities for the studies contributing data. For full details, see the main study publications in the main text. For all clinical studies, approval was obtained via the local ethical review authorities, i.e., Delta: The local ethics committee of the Academic Medical Center of the University of Amsterdam (AMC-METC) Nr.:11/050, UMich_IMPS: University of Michigan Institution Review Board HUM00088188, UMich_SZG: University of Michigan Institution Review Board HUM00080457.

Additional files

Supplementary file 1. Functional Normative Model Demographics.

Description: For each included site, we show the sample size (N), age (mean, standard deviation), and sex distribution (Female/Male percent) in the training set (shown in blue) and testing set (shown in green) of the normative models of functional connectivity between large scale resting-state brain networks from the Yeo 17 network atlas.

elife-85082-supp1.xlsx (12.2KB, xlsx)
Supplementary file 2. Surface Area Normative Model Demographics.

Description: For each included site, we show the sample size (N), age (mean, standard deviation), and sex distribution (Female/Male percent) of the normative models of surface area extracted for all regions of interest in the Destrieux Freesurfer atlas.

elife-85082-supp2.xlsx (11.9KB, xlsx)
Supplementary file 3. Structural Group Difference Testing Statistics.

Description: We show for all cortical thickness and subcortical volume from the Destrieux and aseg Freesurfer atlases regions of interest (ROIs from a two-sample t-test between Schizophrenia versus Healthy Controls) the t-statistic (T-stat), False Discovery Rate corrected p-value (FDRcorr_pvalue), and uncorrected p-value (uncorr_pvalue) for both the raw data (shown in green) and the deviation scores (shown in blue).

elife-85082-supp3.xlsx (19.2KB, xlsx)
Supplementary file 4. Functional Connectivity Group Difference Testing Statistics.

Description: We show for all Yeo-17 between network connectivity regions of interest (ROIs) from a two-sample t-test between Schizophrenia versus Healthy Controls the t-statistic (T-stat), False Discovery Rate corrected p-value (FDRcorr_pvalue), and uncorrected p-value (uncorr_pvalue) for both the raw data (shown in green) and the deviation scores (shown in blue).

elife-85082-supp4.xlsx (15.9KB, xlsx)
MDAR checklist

Data availability

Pre-trained normative models are available on GitHub (https://github.com/predictive-clinical-neuroscience/braincharts, (copy archived at swh:1:rev:299e126ff053e2353091831a888c3ccd1ca6edeb)) and Google Colab (https://colab.research.google.com/github/predictive-clinical-neuroscience/braincharts/blob/master/scripts/apply_normative_models_yeo17.ipynb). Scripts for running the benchmarking analysis and visualizations are available on GitHub (https://github.com/saigerutherford/evidence_embracing_nm, (copy archived at swh:1:rev:1b4198389e2940dd3d10055164d68d46e0a20750)). An online portal for running models without code is available (https://pcnportal.dccn.nl).

References

  1. Aine CJ, Bockholt HJ, Bustillo JR, Cañive JM, Caprihan A, Gasparovic C, Hanlon FM, Houck JM, Jung RE, Lauriello J, Liu J, Mayer AR, Perrone-Bizzozero NI, Posse S, Stephen JM, Turner JA, Clark VP, Calhoun VD. Multimodal neuroimaging in schizophrenia: description and dissemination. Neuroinformatics. 2017;15:343–364. doi: 10.1007/s12021-017-9338-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Arbabshirani MR, Kiehl KA, Pearlson GD, Calhoun VD. Classification of schizophrenia patients based on resting-state functional network connectivity. Frontiers in Neuroscience. 2013;7:133. doi: 10.3389/fnins.2013.00133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Baker JT, Dillon DG, Patrick LM, Roffman JL, Brady RO, Pizzagalli DA, Öngür D, Holmes AJ. Functional connectomics of affective and psychotic pathology. PNAS. 2019;116:9050–9059. doi: 10.1073/pnas.1820780116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Baron J. In: Blackwell Handbook of Judgment and Decision Making. Koehler DJ, Harvey N, editors. Blackwell Publishing Ltd; 2004. Normative models of judgment and decision making; pp. 19–36. [DOI] [Google Scholar]
  5. Benkarim O, Paquola C, Park BY, Kebets V, Hong SJ, Vos de Wael R, Zhang S, Yeo BTT, Eickenberg M, Ge T, Poline JB, Bernhardt BC, Bzdok D. Population heterogeneity in clinical cohorts affects the predictive accuracy of brain imaging. PLOS Biology. 2022;20:e3001627. doi: 10.1371/journal.pbio.3001627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bijsterbosch J, Harrison SJ, Jbabdi S, Woolrich M, Beckmann C, Smith S, Duff EP. Challenges and future directions for representations of functional brain organization. Nature Neuroscience. 2020;23:1484–1495. doi: 10.1038/s41593-020-00726-z. [DOI] [PubMed] [Google Scholar]
  7. Borghi E, de Onis M, Garza C, Van den Broeck J, Frongillo EA, Grummer-Strawn L, Van Buuren S, Pan H, Molinari L, Martorell R, Onyango AW, Martines JC, WHO Multicentre Growth Reference Study Group Construction of the world health organization child growth standards: selection of methods for attained growth curves. Statistics in Medicine. 2006;25:247–265. doi: 10.1002/sim.2227. [DOI] [PubMed] [Google Scholar]
  8. Cai N, Choi KW, Fried EI. Reviewing the genetics of heterogeneity in depression: operationalizations, manifestations and etiologies. Human Molecular Genetics. 2020;29:R10–R18. doi: 10.1093/hmg/ddaa115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Castro E, Hjelm RD, Plis SM, Dinh L, Turner JA, Calhoun VD. Deep independence network analysis of structural brain imaging: application to schizophrenia. IEEE Transactions on Medical Imaging. 2016;35:1729–1740. doi: 10.1109/TMI.2016.2527717. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Cetin MS, Houck JM, Vergara VM, Miller RL, Calhoun V. Multimodal based classification of schizophrenia patients. Conf Proc IEEE Eng Med Biol Soc. 2015;2015:2629–2632. doi: 10.1109/EMBC.2015.7318931. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cetin MS, Houck JM, Rashid B, Agacoglu O, Stephen JM, Sui J, Canive J, Mayer A, Aine C, Bustillo JR, Calhoun VD. Multimodal classification of schizophrenia patients with MEG and fmri data using static and dynamic connectivity measures. Frontiers in Neuroscience. 2016;10:466. doi: 10.3389/fnins.2016.00466. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Chang LJ, Gianaros PJ, Manuck SB, Krishnan A, Wager TD. A sensitive and specific neural signature for picture-induced negative affect. PLOS Biology. 2015;13:e1002180. doi: 10.1371/journal.pbio.1002180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Cheon EJ, Bearden CE, Sun D, Ching CRK, Andreassen OA, Schmaal L, Veltman DJ, Thomopoulos SI, Kochunov P, Jahanshad N, Thompson PM, Turner JA, van Erp TGM. Cross disorder comparisons of brain structure in schizophrenia, bipolar disorder, major depressive disorder, and 22q11.2 deletion syndrome: a review of enigma findings. Psychiatry and Clinical Neurosciences. 2022;76:140–161. doi: 10.1111/pcn.13337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Colyvan M. Idealisations in normative models. Synthese. 2013;190:1337–1350. doi: 10.1007/s11229-012-0166-z. [DOI] [Google Scholar]
  15. Cuthbert BN, Insel TR. Toward the future of psychiatric diagnosis: the seven pillars of rdoc. BMC Medicine. 2013;11:126. doi: 10.1186/1741-7015-11-126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Dale AM, Fischl B, Sereno MI. Cortical surface-based analysis. I. segmentation and surface reconstruction. NeuroImage. 1999;9:179–194. doi: 10.1006/nimg.1998.0395. [DOI] [PubMed] [Google Scholar]
  17. Dansereau C, Benhajali Y, Risterucci C, Pich EM, Orban P, Arnold D, Bellec P. Statistical power and prediction accuracy in multisite resting-state fmri connectivity. NeuroImage. 2017;149:220–232. doi: 10.1016/j.neuroimage.2017.01.072. [DOI] [PubMed] [Google Scholar]
  18. Di Biase MA, Cropley VL, Cocchi L, Fornito A, Calamante F, Ganella EP, Pantelis C, Zalesky A. Linking cortical and connectional pathology in schizophrenia. Schizophrenia Bulletin. 2019;45:911–923. doi: 10.1093/schbul/sby121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Dinga R, Fraza CJ, Bayer JMM, Kia SM, Beckmann CF, Marquand AF. Normative Modeling of Neuroimaging Data Using Generalized Additive Models of Location Scale and Shape. bioRxiv. 2021 doi: 10.1101/2021.06.14.448106. [DOI]
  20. Dong H-M, Castellanos FX, Yang N, Zhang Z, Zhou Q, He Y, Zhang L, Xu T, Holmes AJ, Thomas Yeo BT, Chen F, Wang B, Beckmann C, White T, Sporns O, Qiu J, Feng T, Chen A, Liu X, Chen X, Weng X, Milham MP, Zuo X-N. Charting brain growth in tandem with brain templates at school age. Science Bulletin. 2020;65:1924–1934. doi: 10.1016/j.scib.2020.07.027. [DOI] [PubMed] [Google Scholar]
  21. Dwyer DB, Cabral C, Kambeitz-Ilankovic L, Sanfelici R, Kambeitz J, Calhoun V, Falkai P, Pantelis C, Meisenzahl E, Koutsouleris N. Brain subtyping enhances the neuroanatomical discrimination of schizophrenia. Schizophrenia Bulletin. 2018;44:1060–1069. doi: 10.1093/schbul/sby008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Eickhoff SB, Yeo BTT, Genon S. Imaging-based parcellations of the human brain. Nature Reviews. Neuroscience. 2018;19:672–686. doi: 10.1038/s41583-018-0071-7. [DOI] [PubMed] [Google Scholar]
  23. Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, Kent JD, Goncalves M, DuPre E, Snyder M, Oya H, Ghosh SS, Wright J, Durnez J, Poldrack RA, Gorgolewski KJ. FMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods. 2019;16:111–116. doi: 10.1038/s41592-018-0235-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Etkin A. A reckoning and research agenda for neuroimaging in psychiatry. The American Journal of Psychiatry. 2019;176:507–511. doi: 10.1176/appi.ajp.2019.19050521. [DOI] [PubMed] [Google Scholar]
  25. Filip P, Bednarik P, Eberly LE, Moheet A, Svatkova A, Grohn H, Kumar AF, Seaquist ER, Mangia S. Different freesurfer versions might generate different statistical outcomes in case-control comparison studies. Neuroradiology. 2022;64:765–773. doi: 10.1007/s00234-021-02862-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Finn ES, Rosenberg MD. Beyond fingerprinting: choosing predictive connectomes over reliable connectomes. NeuroImage. 2021;239:118254. doi: 10.1016/j.neuroimage.2021.118254. [DOI] [PubMed] [Google Scholar]
  27. First MB. In SCID-5-CV: Structured Clinical Interview for DSM-5 Disorders: Clinician Version. American Psychiatric Association Publishing; U-M Catalog Search; 1956. [Google Scholar]
  28. Fischl B, Dale AM. Measuring the thickness of the human cerebral cortex from magnetic resonance images. PNAS. 2000;97:11050–11055. doi: 10.1073/pnas.200033797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale AM. Whole brain segmentation. Neuron. 2002;33:341–355. doi: 10.1016/S0896-6273(02)00569-X. [DOI] [PubMed] [Google Scholar]
  30. Flake JK, Fried EI. Measurement schmeasurement: questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science. 2020;3:456–465. doi: 10.1177/2515245920952393. [DOI] [Google Scholar]
  31. Floris DL, Wolfers T, Zabihi M, Holz NE, Zwiers MP, Charman T, Tillmann J, Ecker C, Dell’Acqua F, Banaschewski T, Moessnang C, Baron-Cohen S, Holt R, Durston S, Loth E, Murphy DGM, Marquand A, Buitelaar JK, Beckmann CF, EU-AIMS Longitudinal European Autism Project Group Atypical brain asymmetry in autism-A candidate for clinically meaningful stratification. Biological Psychiatry. Cognitive Neuroscience and Neuroimaging. 2021;6:802–812. doi: 10.1016/j.bpsc.2020.08.008. [DOI] [PubMed] [Google Scholar]
  32. Fraza CJ, Dinga R, Beckmann CF, Marquand AF. Warped bayesian linear regression for normative modelling of big data. NeuroImage. 2021;245:118715. doi: 10.1016/j.neuroimage.2021.118715. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Fraza C, Zabihi M, Beckmann CF, Marquand AF. The Extremes of Normative Modelling. bioRxiv. 2022 doi: 10.1101/2022.08.23.505049. [DOI]
  34. Gau R, Noble S, Heuer K, Bottenhorn KL, Bilgin IP, Yang Y-F, Huntenburg JM, Bayer JMM, Bethlehem RAI, Rhoads SA, Vogelbacher C, Borghesani V, Levitis E, Wang H-T, Van Den Bossche S, Kobeleva X, Legarreta JH, Guay S, Atay SM, Varoquaux GP, Huijser DC, Sandström MS, Herholz P, Nastase SA, Badhwar A, Dumas G, Schwab S, Moia S, Dayan M, Bassil Y, Brooks PP, Mancini M, Shine JM, O’Connor D, Xie X, Poggiali D, Friedrich P, Heinsfeld AS, Riedl L, Toro R, Caballero-Gaudes C, Eklund A, Garner KG, Nolan CR, Demeter DV, Barrios FA, Merchant JS, McDevitt EA, Oostenveld R, Craddock RC, Rokem A, Doyle A, Ghosh SS, Nikolaidis A, Stanley OW, Uruñuela E, Brainhack Community Brainhack: developing a culture of open, inclusive, community-driven neuroscience. Neuron. 2021;109:1769–1775. doi: 10.1016/j.neuron.2021.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Geisler D, Walton E, Naylor M, Roessner V, Lim KO, Charles Schulz S, Gollub RL, Calhoun VD, Sponheim SR, Ehrlich S. Brain structure and function correlates of cognitive subtypes in schizophrenia. Psychiatry Research. 2015;234:74–83. doi: 10.1016/j.pscychresns.2015.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Gershon RC, Cella D, Fox NA, Havlik RJ, Hendrie HC, Wagster MV. Assessment of neurological and behavioural function: the NIH toolbox. The Lancet. Neurology. 2010;9:138–139. doi: 10.1016/S1474-4422(09)70335-7. [DOI] [PubMed] [Google Scholar]
  37. Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, Ugurbil K, Andersson J, Beckmann CF, Jenkinson M, Smith SM, Van Essen DC. A multi-modal parcellation of human cerebral cortex. Nature. 2016;536:171–178. doi: 10.1038/nature18933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Goodkind M, Eickhoff SB, Oathes DJ, Jiang Y, Chang A, Jones-Hagata LB, Ortega BN, Zaiko YV, Roach EL, Korgaonkar MS, Grieve SM, Galatzer-Levy I, Fox PT, Etkin A. Identification of a common neurobiological substrate for mental illness. JAMA Psychiatry. 2015;72:305–315. doi: 10.1001/jamapsychiatry.2014.2206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Gordon EM, Laumann TO, Adeyemo B, Huckins JF, Kelley WM, Petersen SE. Generation and evaluation of a cortical area parcellation from resting-state correlations. Cerebral Cortex. 2016;26:288–303. doi: 10.1093/cercor/bhu239. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Gordon EM, Laumann TO, Gilmore AW, Newbold DJ, Greene DJ, Berg JJ, Ortega M, Hoyt-Drazen C, Gratton C, Sun H, Hampton JM, Coalson RS, Nguyen AL, McDermott KB, Shimony JS, Snyder AZ, Schlaggar BL, Petersen SE, Nelson SM, Dosenbach NUF. Precision functional mapping of individual human brains. Neuron. 2017;95:791–807. doi: 10.1016/j.neuron.2017.07.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Greene AS, Shen X, Noble S, Horien C, Hahn CA, Arora J, Tokoglu F, Spann MN, Carrión CI, Barron DS, Sanacora G, Srihari VH, Woods SW, Scheinost D, Constable RT. Brain-phenotype models fail for individuals who defy sample stereotypes. Nature. 2022;609:109–118. doi: 10.1038/s41586-022-05118-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Henrich J, Heine SJ, Norenzayan A. The weirdest people in the world? The Behavioral and Brain Sciences. 2010;33:61–83. doi: 10.1017/S0140525X0999152X. [DOI] [PubMed] [Google Scholar]
  43. Holz NE, Floris DL, Llera A, Aggensteiner PM, Kia SM, Wolfers T, Baumeister S, Böttinger B, Glennon JC, Hoekstra PJ, Dietrich A, Saam MC, Schulze UME, Lythgoe DJ, Williams SCR, Santosh P, Rosa-Justicia M, Bargallo N, Castro-Fornieles J, Arango C, Penzol MJ, Walitza S, Meyer-Lindenberg A, Zwiers M, Franke B, Buitelaar J, Naaijen J, Brandeis D, Beckmann C, Banaschewski T, Marquand AF. Age-related brain deviations and aggression. Psychological Medicine. 2022;1:1–10. doi: 10.1017/S003329172200068X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Howes OD, Cummings C, Chapman GE, Shatalina E. Neuroimaging in schizophrenia: an overview of findings and their implications for synaptic changes. Neuropsychopharmacology. 2023;48:151–167. doi: 10.1038/s41386-022-01426-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Sanislow C, Wang P. Research domain criteria (rdoc): toward a new classification framework for research on mental disorders. The American Journal of Psychiatry. 2010;167:748–751. doi: 10.1176/appi.ajp.2010.09091379. [DOI] [PubMed] [Google Scholar]
  46. Itälinna V, Kaltiainen H, Forss N, Liljeström M, Parkkonen L. Detecting Mild Traumatic Brain Injury with MEG, Normative Modelling and Machine Learning. medRxiv. 2022 doi: 10.1101/2022.09.29.22280521. [DOI] [PMC free article] [PubMed]
  47. Jolliffe IT. A note on the use of principal components in regression. Applied Statistics. 1982;31:300. doi: 10.2307/2348005. [DOI] [Google Scholar]
  48. Jones MC, Pewsey A. Sinh-arcsinh distributions. Biometrika. 2009;96:761–780. doi: 10.1093/biomet/asp053. [DOI] [Google Scholar]
  49. Kia SM, Beckmann CF, Marquand AF. Scalable Multi-Task Gaussian Process Tensor Regression for Normative Modeling of Structured Variation in Neuroimaging Data. arXiv. 2018 http://arxiv.org/abs/1808.00036
  50. Kia SM, Marquand A. Normative Modeling of Neuroimaging Data Using Scalable Multi-Task Gaussian Processes. arXiv. 2018 http://arxiv.org/abs/1806.01047
  51. Kia SM, Huijsdens H, Dinga R, Wolfers T, Mennes M, Andreassen OA, Westlye LT, Beckmann CF, Marquand AF. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. Martel AL, Abolmaesumi P, Stoyanov D, Mateus D, Zuluaga MA, Zhou SK, Racoceanu D, Joskowicz L, editors. Springer International Publishing; 2020. Hierarchical bayesian regression for multi-site normative modeling of neuroimaging data; pp. 699–709. [DOI] [Google Scholar]
  52. Kia SM, Huijsdens H, Rutherford S, Dinga R, Wolfers T, Mennes M, Andreassen OA, Westlye LT, Beckmann CF, Marquand AF. Federated Multi-Site Normative Modeling Using Hierarchical Bayesian Regression. bioRxiv. 2021 doi: 10.1101/2021.05.28.446120. [DOI] [PMC free article] [PubMed]
  53. Kia SM, Huijsdens H, Rutherford S, de Boer A, Dinga R, Wolfers T, Berthet P, Mennes M, Andreassen OA, Westlye LT, Beckmann CF, Marquand AF. Closing the life-cycle of normative modeling using federated hierarchical bayesian regression. PLOS ONE. 2022;17:e0278776. doi: 10.1371/journal.pone.0278776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Kjelkenes R, Wolfers T, Alnæs D, van der Meer D, Pedersen ML, Dahl A, Voldsbekk I, Moberget T, Tamnes CK, Andreassen OA, Marquand AF, Westlye LT. Mapping normative trajectories of cognitive function and its relation to psychopathology symptoms and genetic risk in youth. Biological Psychiatry Global Open Science. 2022;1:007. doi: 10.1016/j.bpsgos.2022.01.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Klapwijk ET, van de Kamp F, van der Meulen M, Peters S, Wierenga LM. Qoala-T: a supervised-learning tool for quality control of freesurfer segmented MRI data. NeuroImage. 2019;189:116–129. doi: 10.1016/j.neuroimage.2019.01.014. [DOI] [PubMed] [Google Scholar]
  56. Kong R, Li J, Orban C, Sabuncu MR, Liu H, Schaefer A, Sun N, Zuo XN, Holmes AJ, Eickhoff SB, Yeo BTT. Spatial topography of individual-specific cortical networks predicts human cognition, personality, and emotion. Cerebral Cortex. 2019;29:2533–2551. doi: 10.1093/cercor/bhy123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Kumar S. NormVAE: Normative Modeling on Neuroimaging Data Using Variational Autoencoders. arXiv. 2021 https://arxiv.org/abs/2110.04903v2
  58. Laumann TO, Gordon EM, Adeyemo B, Snyder AZ, Joo SJ, Chen MY, Gilmore AW, McDermott KB, Nelson SM, Dosenbach NUF, Schlaggar BL, Mumford JA, Poldrack RA, Petersen SE. Functional system and areal organization of a highly sampled individual human brain. Neuron. 2015;87:657–670. doi: 10.1016/j.neuron.2015.06.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Lee W, Bindman J, Ford T, Glozier N, Moran P, Stewart R, Hotopf M. Bias in psychiatric case-control studies: literature survey. The British Journal of Psychiatry. 2007;190:204–209. doi: 10.1192/bjp.bp.106.027250. [DOI] [PubMed] [Google Scholar]
  60. Lei D, Pinaya WHL, van Amelsvoort T, Marcelis M, Donohoe G, Mothersill DO, Corvin A, Gill M, Vieira S, Huang X, Lui S, Scarpazza C, Young J, Arango C, Bullmore E, Qiyong G, McGuire P, Mechelli A. Detecting schizophrenia at the level of the individual: relative diagnostic value of whole-brain images, connectome-wide functional connectivity and graph-based metrics. Psychological Medicine. 2020a;50:1852–1861. doi: 10.1017/S0033291719001934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Lei D, Pinaya WHL, Young J, van Amelsvoort T, Marcelis M, Donohoe G, Mothersill DO, Corvin A, Vieira S, Huang X, Lui S, Scarpazza C, Arango C, Bullmore E, Gong Q, McGuire P, Mechelli A. Integrating machining learning and multimodal neuroimaging to detect schizophrenia at the level of the individual. Human Brain Mapping. 2020b;41:1119–1135. doi: 10.1002/hbm.24863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Levitis E, van Praag CDG, Gau R, Heunis S, DuPre E, Kiar G, Bottenhorn KL, Glatard T, Nikolaidis A, Whitaker KJ, Mancini M, Niso G, Afyouni S, Alonso-Ortiz E, Appelhoff S, Arnatkeviciute A, Atay SM, Auer T, Baracchini G, Bayer JMM, Beauvais MJS, Bijsterbosch JD, Bilgin IP, Bollmann S, Bollmann S, Botvinik-Nezer R, Bright MG, Calhoun VD, Chen X, Chopra S, Chuan-Peng H, Close TG, Cookson SL, Craddock RC, De La Vega A, De Leener B, Demeter DV, Di Maio P, Dickie EW, Eickhoff SB, Esteban O, Finc K, Frigo M, Ganesan S, Ganz M, Garner KG, Garza-Villarreal EA, Gonzalez-Escamilla G, Goswami R, Griffiths JD, Grootswagers T, Guay S, Guest O, Handwerker DA, Herholz P, Heuer K, Huijser DC, Iacovella V, Joseph MJE, Karakuzu A, Keator DB, Kobeleva X, Kumar M, Laird AR, Larson-Prior LJ, Lautarescu A, Lazari A, Legarreta JH, Li X-Y, Lv J, Mansour L S, Meunier D, Moraczewski D, Nandi T, Nastase SA, Nau M, Noble S, Norgaard M, Obungoloch J, Oostenveld R, Orchard ER, Pinho AL, Poldrack RA, Qiu A, Raamana PR, Rokem A, Rutherford S, Sharan M, Shaw TB, Syeda WT, Testerman MM, Toro R, Valk SL, Van Den Bossche S, Varoquaux G, Váša F, Veldsman M, Vohryzek J, Wagner AS, Walsh RJ, White T, Wong F-T, Xie X, Yan C-G, Yang Y-F, Yee Y, Zanitti GE, Van Gulick AE, Duff E, Maumet C. Centering inclusivity in the design of online conferences-an OHBM-open science perspective. GigaScience. 2021;10:giab051. doi: 10.1093/gigascience/giab051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Li J, Bzdok D, Chen J, Tam A, Ooi LQR, Holmes AJ, Ge T, Patil KR, Jabbi M, Eickhoff SB, Yeo BTT, Genon S. Cross-ethnicity/race generalization failure of behavioral prediction from resting-state functional connectivity. Science Advances. 2022;8:eabj1812. doi: 10.1126/sciadv.abj1812. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Linden DEJ. The challenges and promise of neuroimaging in psychiatry. Neuron. 2012;73:8–22. doi: 10.1016/j.neuron.2011.12.014. [DOI] [PubMed] [Google Scholar]
  65. Loth E, Ahmad J, Chatham C, López B, Carter B, Crawley D, Oakley B, Hayward H, Cooke J, San José Cáceres A, Bzdok D, Jones E, Charman T, Beckmann C, Bourgeron T, Toro R, Buitelaar J, Murphy D, Dumas G. The meaning of significant mean group differences for biomarker discovery. PLOS Computational Biology. 2021;17:e1009477. doi: 10.1371/journal.pcbi.1009477. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Lv J, Di Biase M, Cash RFH, Cocchi L, Cropley VL, Klauser P, Tian Y, Bayer J, Schmaal L, Cetin-Karayumak S, Rathi Y, Pasternak O, Bousman C, Pantelis C, Calamante F, Zalesky A. Individual deviations from normative models of brain structure in a large cross-sectional schizophrenia cohort. Molecular Psychiatry. 2021;26:3512–3523. doi: 10.1038/s41380-020-00882-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Madre M, Canales-Rodríguez EJ, Fuentes-Claramonte P, Alonso-Lana S, Salgado-Pineda P, Guerrero-Pedraza A, Moro N, Bosque C, Gomar JJ, Ortíz-Gil J, Goikolea JM, Bonnin CM, Vieta E, Sarró S, Maristany T, McKenna PJ, Salvador R, Pomarol-Clotet E. Structural abnormality in schizophrenia versus bipolar disorder: a whole brain cortical thickness, surface area, volume and gyrification analyses. NeuroImage. Clinical. 2020;25:102131. doi: 10.1016/j.nicl.2019.102131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics. 1947;18:50–60. doi: 10.1214/aoms/1177730491. [DOI] [Google Scholar]
  69. Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, Donohue MR, Foran W, Miller RL, Hendrickson TJ, Malone SM, Kandala S, Feczko E, Miranda-Dominguez O, Graham AM, Earl EA, Perrone AJ, Cordova M, Doyle O, Moore LA, Conan GM, Uriarte J, Snider K, Lynch BJ, Wilgenbusch JC, Pengo T, Tam A, Chen J, Newbold DJ, Zheng A, Seider NA, Van AN, Metoki A, Chauvin RJ, Laumann TO, Greene DJ, Petersen SE, Garavan H, Thompson WK, Nichols TE, Yeo BTT, Barch DM, Luna B, Fair DA, Dosenbach NUF. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660. doi: 10.1038/s41586-022-04492-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Marquand AF, Rezek I, Buitelaar J, Beckmann CF. Understanding heterogeneity in clinical cohorts using normative models: beyond case-control studies. Biological Psychiatry. 2016a;80:552–561. doi: 10.1016/j.biopsych.2015.12.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Marquand AF, Wolfers T, Mennes M, Buitelaar J, Beckmann CF. Beyond lumping and splitting: A review of computational approaches for stratifying psychiatric disorders. Biological Psychiatry. Cognitive Neuroscience and Neuroimaging. 2016b;1:433–447. doi: 10.1016/j.bpsc.2016.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Marquand AF, Haak KV, Beckmann CF. Functional corticostriatal connection topographies predict goal directed behaviour in humans. Nature Human Behaviour. 2017;1:0146. doi: 10.1038/s41562-017-0146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Marquand A, Rutherford S, Kia SM, Wolfers T, Fraza C, Dinga R, Zabihi M. PCNToolkit. Zenodo. 2021 doi: 10.5281/zenodo.5207839. [DOI]
  74. McTeague LM, Huemer J, Carreon DM, Jiang Y, Eickhoff SB, Etkin A. Identification of common neural circuit disruptions in cognitive control across psychiatric disorders. The American Journal of Psychiatry. 2017;174:676–685. doi: 10.1176/appi.ajp.2017.16040400. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Meng X, Jiang R, Lin D, Bustillo J, Jones T, Chen J, Yu Q, Du Y, Zhang Y, Jiang T, Sui J, Calhoun VD. Predicting individualized clinical measures by a generalized prediction framework and multimodal fusion of MRI data. NeuroImage. 2017;145:218–229. doi: 10.1016/j.neuroimage.2016.05.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Michelini G, Palumbo IM, DeYoung CG, Latzman RD, Kotov R. Linking rdoc and hitop: a new interface for advancing psychiatric nosology and neuroscience. Clinical Psychology Review. 2021;86:102025. doi: 10.1016/j.cpr.2021.102025. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Moriarity DP, Alloy LB. Back to basics: the importance of measurement properties in biological psychiatry. Neuroscience and Biobehavioral Reviews. 2021;123:72–82. doi: 10.1016/j.neubiorev.2021.01.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Moriarity DP, Joyner KJ, Slavich GM, Alloy LB. Unconsidered issues of measurement noninvariance in biological psychiatry: a focus on biological phenotypes of psychopathology. Molecular Psychiatry. 2022;27:1281–1285. doi: 10.1038/s41380-021-01414-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Mottron L, Bzdok D. Diagnosing as autistic people increasingly distant from prototypes lead neither to clinical benefit nor to the advancement of knowledge. Molecular Psychiatry. 2022;27:773–775. doi: 10.1038/s41380-021-01343-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Nour MM, Liu Y, Dolan RJ. Functional neuroimaging in psychiatry and the case for failing better. Neuron. 2022;110:2524–2544. doi: 10.1016/j.neuron.2022.07.005. [DOI] [PubMed] [Google Scholar]
  81. Park SH. Collinearity and optimal restrictions on regression parameters for estimating responses. Technometrics. 1981;23:289. doi: 10.2307/1267793. [DOI] [Google Scholar]
  82. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É. Scikit-learn: machine learning in python. Journal of Machine Learning Research. 2011;12:2825–2830. [Google Scholar]
  83. Pereira-Sanchez V, Castellanos FX. Neuroimaging in attention-deficit/hyperactivity disorder. Current Opinion in Psychiatry. 2021;34:105–111. doi: 10.1097/YCO.0000000000000669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Power JD, Cohen AL, Nelson SM, Wig GS, Barnes KA, Church JA, Vogel AC, Laumann TO, Miezin FM, Schlaggar BL, Petersen SE. Functional network organization of the human brain. Neuron. 2011;72:665–678. doi: 10.1016/j.neuron.2011.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Pruim RHR, Mennes M, Buitelaar JK, Beckmann CF. Evaluation of ICA-AROMA and alternative strategies for motion artifact removal in resting state fmri. NeuroImage. 2015a;112:278–287. doi: 10.1016/j.neuroimage.2015.02.063. [DOI] [PubMed] [Google Scholar]
  86. Pruim RHR, Mennes M, van Rooij D, Llera A, Buitelaar JK, Beckmann CF. ICA-AROMA: a robust ICA-based strategy for removing motion artifacts from fmri data. NeuroImage. 2015b;112:267–277. doi: 10.1016/j.neuroimage.2015.02.064. [DOI] [PubMed] [Google Scholar]
  87. Rahim M, Thirion B, Bzdok D, Buvat I, Varoquaux G. Joint prediction of multiple scores captures better individual traits from brain images. NeuroImage. 2017;158:145–154. doi: 10.1016/j.neuroimage.2017.06.072. [DOI] [PubMed] [Google Scholar]
  88. Rosa MJ, Portugal L, Hahn T, Fallgatter AJ, Garrido MI, Shawe-Taylor J, Mourao-Miranda J. Sparse network-based models for patient classification using fmri. NeuroImage. 2015;105:493–506. doi: 10.1016/j.neuroimage.2014.11.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Rosenberg MD, Finn ES. How to establish robust brain-behavior relationships without thousands of individuals. Nature Neuroscience. 2022;25:835–837. doi: 10.1038/s41593-022-01110-9. [DOI] [PubMed] [Google Scholar]
  90. Rutherford S, Angstadt M, Sripada C, Chang SE. Leveraging Big Data for Classification of Children Who Stutter from Fluent Peers. bioRxiv. 2020 doi: 10.1101/2020.10.28.359711. [DOI]
  91. Rutherford S, Fraza C, Dinga R, Kia SM, Wolfers T, Zabihi M, Berthet P, Worker A, Verdi S, Andrews D, Han LK, Bayer JM, Dazzan P, McGuire P, Mocking RT, Schene A, Sripada C, Tso IF, Duval ER, Chang SE, Penninx BW, Heitzeg MM, Burt SA, Hyde LW, Amaral D, Wu Nordahl C, Andreasssen OA, Westlye LT, Zahn R, Ruhe HG, Beckmann C, Marquand AF. Charting brain growth and aging at high spatial precision. eLife. 2022a;11:e72904. doi: 10.7554/eLife.72904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Rutherford S, Kia SM, Wolfers T, Fraza C, Zabihi M, Dinga R, Berthet P, Worker A, Verdi S, Ruhe HG, Beckmann CF, Marquand AF. The normative modeling framework for computational psychiatry. Nature Protocols. 2022b;17:1711–1734. doi: 10.1038/s41596-022-00696-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Salvador R, Radua J, Canales-Rodríguez EJ, Solanes A, Sarró S, Goikolea JM, Valiente A, Monté GC, Natividad MDC, Guerrero-Pedraza A, Moro N, Fernández-Corcuera P, Amann BL, Maristany T, Vieta E, McKenna PJ, Pomarol-Clotet E. Evaluation of machine learning algorithms and structural features for optimal MRI-based diagnostic prediction in psychosis. PLOS ONE. 2017;12:e0175683. doi: 10.1371/journal.pone.0175683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Sanislow CA. RDoC at 10: changing the discourse for psychopathology. World Psychiatry. 2020;19:311–312. doi: 10.1002/wps.20800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo XN, Holmes AJ, Eickhoff SB, Yeo BTT. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex. 2018;28:3095–3114. doi: 10.1093/cercor/bhx179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Shen X, Tokoglu F, Papademetris X, Constable RT. Groupwise whole-brain parcellation from resting-state fmri data for network node identification. NeuroImage. 2013;82:403–415. doi: 10.1016/j.neuroimage.2013.05.081. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Shi D, Li Y, Zhang H, Yao X, Wang S, Wang G, Ren K. Machine learning of schizophrenia detection with structural and functional neuroimaging. Disease Markers. 2021;2021:9963824. doi: 10.1155/2021/9963824. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Siegel JS, Mitra A, Laumann TO, Seitzman BA, Raichle M, Corbetta M, Snyder AZ. Data quality influences observed links between functional connectivity and behavior. Cerebral Cortex. 2017;27:4492–4502. doi: 10.1093/cercor/bhw253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, Beckmann CF. Correspondence of the brain’s functional architecture during activation and rest. PNAS. 2009;106:13040–13045. doi: 10.1073/pnas.0905267106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  100. Sprooten E, Rasgon A, Goodman M, Carlin A, Leibu E, Lee WH, Frangou S. Addressing reverse inference in psychiatric neuroimaging: meta-analyses of task-related brain activation in common mental disorders. Human Brain Mapping. 2017;38:1846–1864. doi: 10.1002/hbm.23486. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Sripada C, Angstadt M, Rutherford S, Kessler D, Kim Y, Yee M, Levina E. Basic units of inter-individual variation in resting state connectomes. Scientific Reports. 2019;9:1900. doi: 10.1038/s41598-018-38406-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Sripada C, Angstadt M, Rutherford S, Taxali A, Shedden K. Toward a “treadmill test” for cognition: improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping. 2020a;41:3186–3197. doi: 10.1002/hbm.25007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Sripada C, Rutherford S, Angstadt M, Thompson WK, Luciana M, Weigard A, Hyde LH, Heitzeg M. Prediction of neurocognition in youth from resting state fmri. Molecular Psychiatry. 2020b;25:3413–3421. doi: 10.1038/s41380-019-0481-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Sui J, Pearlson GD, Du Y, Yu Q, Jones TR, Chen J, Jiang T, Bustillo J, Calhoun VD. In search of multimodal neuroimaging biomarkers of cognitive deficits in schizophrenia. Biological Psychiatry. 2015;78:794–804. doi: 10.1016/j.biopsych.2015.02.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Sui J, Qi S, van Erp TGM, Bustillo J, Jiang R, Lin D, Turner JA, Damaraju E, Mayer AR, Cui Y, Fu Z, Du Y, Chen J, Potkin SG, Preda A, Mathalon DH, Ford JM, Voyvodic J, Mueller BA, Belger A, McEwen SC, O’Leary DS, McMahon A, Jiang T, Calhoun VD. Multimodal neuromarkers in schizophrenia via cognition-guided MRI fusion. Nature Communications. 2018;9:3028. doi: 10.1038/s41467-018-05432-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Taxali A, Angstadt M, Rutherford S, Sripada C. Boost in test-retest reliability in resting state fmri with predictive modeling. Cerebral Cortex. 2021;31:2822–2833. doi: 10.1093/cercor/bhaa390. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Titelbaum MG. Normative Modeling [Preprint] 2021. [February 1, 2021]. http://philsci-archive.pitt.edu/18670/
  108. Tso IF, Angstadt M, Rutherford S, Peltier S, Diwadkar VA, Taylor SF. Dynamic causal modeling of eye gaze processing in schizophrenia. Schizophrenia Research. 2021;229:112–121. doi: 10.1016/j.schres.2020.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. van Erp TGM, Walton E, Hibar DP, Schmaal L, Jiang W, Glahn DC, Pearlson GD, Yao N, Fukunaga M, Hashimoto R, Okada N, Yamamori H, Bustillo JR, Clark VP, Agartz I, Mueller BA, Cahn W, de Zwarte SMC, Hulshoff Pol HE, Kahn RS, Ophoff RA, van Haren NEM, Andreassen OA, Dale AM, Doan NT, Gurholt TP, Hartberg CB, Haukvik UK, Jørgensen KN, Lagerberg TV, Melle I, Westlye LT, Gruber O, Kraemer B, Richter A, Zilles D, Calhoun VD, Crespo-Facorro B, Roiz-Santiañez R, Tordesillas-Gutiérrez D, Loughland C, Carr VJ, Catts S, Cropley VL, Fullerton JM, Green MJ, Henskens FA, Jablensky A, Lenroot RK, Mowry BJ, Michie PT, Pantelis C, Quidé Y, Schall U, Scott RJ, Cairns MJ, Seal M, Tooney PA, Rasser PE, Cooper G, Shannon Weickert C, Weickert TW, Morris DW, Hong E, Kochunov P, Beard LM, Gur RE, Gur RC, Satterthwaite TD, Wolf DH, Belger A, Brown GG, Ford JM, Macciardi F, Mathalon DH, O’Leary DS, Potkin SG, Preda A, Voyvodic J, Lim KO, McEwen S, Yang F, Tan Y, Tan S, Wang Z, Fan F, Chen J, Xiang H, Tang S, Guo H, Wan P, Wei D, Bockholt HJ, Ehrlich S, Wolthusen RPF, King MD, Shoemaker JM, Sponheim SR, De Haan L, Koenders L, Machielsen MW, van Amelsvoort T, Veltman DJ, Assogna F, Banaj N, de Rossi P, Iorio M, Piras F, Spalletta G, McKenna PJ, Pomarol-Clotet E, Salvador R, Corvin A, Donohoe G, Kelly S, Whelan CD, Dickie EW, Rotenberg D, Voineskos AN, Ciufolini S, Radua J, Dazzan P, Murray R, Reis Marques T, Simmons A, Borgwardt S, Egloff L, Harrisberger F, Riecher-Rössler A, Smieskova R, Alpert KI, Wang L, Jönsson EG, Koops S, Sommer IEC, Bertolino A, Bonvino A, Di Giorgio A, Neilson E, Mayer AR, Stephen JM, Kwon JS, Yun J-Y, Cannon DM, McDonald C, Lebedeva I, Tomyshev AS, Akhadov T, Kaleda V, Fatouros-Bergman H, Flyckt L, Busatto GF, Rosa PGP, Serpa MH, Zanetti MV, Hoschl C, Skoch A, Spaniel F, Tomecek D, Hagenaars SP, McIntosh AM, Whalley HC, Lawrie SM, Knöchel C, Oertel-Knöchel V, Stäblein M, Howells FM, Stein DJ, Temmingh HS, Uhlmann A, Lopez-Jaramillo C, Dima D, McMahon A, Faskowitz JI, Gutman BA, Jahanshad N, Thompson PM, Turner JA, Karolinska Schizophrenia Project Cortical brain abnormalities in 4474 individuals with schizophrenia and 5098 control subjects via the enhancing neuro imaging genetics through meta analysis (enigma) consortium. Biological Psychiatry. 2018;84:644–654. doi: 10.1016/j.biopsych.2018.04.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K, Consortium WMH. The WU-minn human connectome project: an overview. NeuroImage. 2013;80:62–79. doi: 10.1016/j.neuroimage.2013.05.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. van Haren NEM, Schnack HG, Cahn W, van den Heuvel MP, Lepage C, Collins L, Evans AC, Hulshoff Pol HE, Kahn RS. Changes in cortical thickness during the course of illness in schizophrenia. Archives of General Psychiatry. 2011;68:871–880. doi: 10.1001/archgenpsychiatry.2011.88. [DOI] [PubMed] [Google Scholar]
  112. Venkataraman A, Whitford TJ, Westin CF, Golland P, Kubicki M. Whole brain resting state functional connectivity abnormalities in schizophrenia. Schizophrenia Research. 2012;139:7–12. doi: 10.1016/j.schres.2012.04.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Verdi S, Marquand AF, Schott JM, Cole JH. Beyond the average patient: how neuroimaging models can address heterogeneity in dementia. Brain. 2021;144:2946–2953. doi: 10.1093/brain/awab165. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, SciPy 1.0 Contributors SciPy 1.0: fundamental algorithms for scientific computing in python. Nature Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Wager TD, Atlas LY, Lindquist MA, Roy M, Woo CW, Kross E. An fmri-based neurologic signature of physical pain. The New England Journal of Medicine. 2013;368:1388–1397. doi: 10.1056/NEJMoa1204471. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Wannan CMJ, Cropley VL, Chakravarty MM, Bousman C, Ganella EP, Bruggemann JM, Weickert TW, Weickert CS, Everall I, McGorry P, Velakoulis D, Wood SJ, Bartholomeusz CF, Pantelis C, Zalesky A. Evidence for network-based cortical thickness reductions in schizophrenia. The American Journal of Psychiatry. 2019;176:552–563. doi: 10.1176/appi.ajp.2019.18040380. [DOI] [PubMed] [Google Scholar]
  117. Winter NR, Leenings R, Ernsting J, Sarink K, Fisch L, Emden D, Blanke J, Goltermann J, Opel N, Barkhau C, Meinert S, Dohm K, Repple J, Mauritz M, Gruber M, Leehr EJ, Grotegerd D, Redlich R, Jansen A, Nenadic I, Nöthen MM, Forstner A, Rietschel M, Groß J, Bauer J, Heindel W, Andlauer T, Eickhoff SB, Kircher T, Dannlowski U, Hahn T. Quantifying deviations of brain structure and function in major depressive disorder across neuroimaging modalities. JAMA Psychiatry. 2022;79:879–888. doi: 10.1001/jamapsychiatry.2022.1780. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Wolfers T, Buitelaar JK, Beckmann CF, Franke B, Marquand AF. From estimating activation locality to predicting disorder: a review of pattern recognition for neuroimaging-based psychiatric diagnostics. Neuroscience and Biobehavioral Reviews. 2015;57:328–349. doi: 10.1016/j.neubiorev.2015.08.001. [DOI] [PubMed] [Google Scholar]
  119. Wolfers T, Arenas AL, Onnink AMH, Dammers J, Hoogman M, Zwiers MP, Buitelaar JK, Franke B, Marquand AF, Beckmann CF. Refinement by integration: aggregated effects of multimodal imaging markers on adult ADHD. Journal of Psychiatry & Neuroscience. 2017;42:386–394. doi: 10.1503/jpn.160240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Wolfers T, Doan NT, Kaufmann T, Alnæs D, Moberget T, Agartz I, Buitelaar JK, Ueland T, Melle I, Franke B, Andreassen OA, Beckmann CF, Westlye LT, Marquand AF. Mapping the heterogeneous phenotype of schizophrenia and bipolar disorder using normative models. JAMA Psychiatry. 2018;75:1146–1155. doi: 10.1001/jamapsychiatry.2018.2467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Wolfers T, Beckmann CF, Hoogman M, Buitelaar JK, Franke B, Marquand AF. Individual differences v. the average patient: mapping the heterogeneity in ADHD using normative models. Psychological Medicine. 2020;50:314–323. doi: 10.1017/S0033291719000084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. Wolfers T, Rokicki J, Alnaes D, Berthet P, Agartz I, Kia SM, Kaufmann T, Zabihi M, Moberget T, Melle I, Beckmann CF, Andreassen OA, Marquand AF, Westlye LT. Replicating extensive brain structural heterogeneity in individuals with schizophrenia and bipolar disorder. Human Brain Mapping. 2021;42:2546–2555. doi: 10.1002/hbm.25386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Woo CW, Schmidt L, Krishnan A, Jepma M, Roy M, Lindquist MA, Atlas LY, Wager TD. Quantifying cerebral contributions to pain beyond nociception. Nature Communications. 2017;8:14211. doi: 10.1038/ncomms14211. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Yeo BTT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, Smoller JW, Zöllei L, Polimeni JR, Fischl B, Liu H, Buckner RL. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. Journal of Neurophysiology. 2011;106:1125–1165. doi: 10.1152/jn.00338.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Yu Q, Allen EA, Sui J, Arbabshirani MR, Pearlson G, Calhoun VD. Brain connectivity networks in schizophrenia underlying resting state functional magnetic resonance imaging. Current Topics in Medicinal Chemistry. 2012;12:2415–2425. doi: 10.2174/156802612805289890. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Zabihi M, Oldehinkel M, Wolfers T, Frouin V, Goyard D, Loth E, Charman T, Tillmann J, Banaschewski T, Dumas G, Holt R, Baron-Cohen S, Durston S, Bölte S, Murphy D, Ecker C, Buitelaar JK, Beckmann CF, Marquand AF. Dissecting the heterogeneous cortical anatomy of autism spectrum disorder using normative models. Biological Psychiatry. Cognitive Neuroscience and Neuroimaging. 2019;4:567–578. doi: 10.1016/j.bpsc.2018.11.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Zabihi M, Floris DL, Kia SM, Wolfers T, Tillmann J, Arenas AL, Moessnang C, Banaschewski T, Holt R, Baron-Cohen S, Loth E, Charman T, Bourgeron T, Murphy D, Ecker C, Buitelaar JK, Beckmann CF, Marquand A, EU-AIMS LEAP Group Fractionating autism based on neuroanatomical normative modeling. Translational Psychiatry. 2020;10:384. doi: 10.1038/s41398-020-01057-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Zhang J, Kucyi A, Raya J, Nielsen AN, Nomi JS, Damoiseaux JS, Greene DJ, Horovitz SG, Uddin LQ, Whitfield-Gabrieli S. What have we really learned from functional connectivity in clinical populations? NeuroImage. 2021;242:118466. doi: 10.1016/j.neuroimage.2021.118466. [DOI] [PubMed] [Google Scholar]

Editor's evaluation

Chris I Baker 1

This is a rigorous and compelling extension of previous normative modeling work. The current study demonstrates that normative models incorporating lifespan trajectories of structural and functional connectivity provide a strong basis for brain imaging studies across a range of tasks including, univariate group difference assessment, classification, and building regression models. The work is important, rigorous and a valuable contribution to the field.

Decision letter

Editor: Chris I Baker1
Reviewed by: Todd Constable2, Oscar Esteban3

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Evidence for Embracing Normative Modeling" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Chris Baker as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: Todd Constable (Reviewer #1); Oscar Esteban (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

While the reviewers are convinced of the potential importance of this extension of your earlier work, they raise some comments about the clarity of aspects of the. In your revision, it is important for you to consider each of these issues and revise the manuscript as appropriate.

Reviewer #1 (Recommendations for the authors):

This is a clearly written and scientifically rigorous presentation of the utility of normative modeling with structural and functional MR data. I have no suggestions for improving the work and find it acceptable for publication in its current form.

Reviewer #2 (Recommendations for the authors):

Please clarify: Are all normative models (i.e. across all modalities) based on the same set of covariates?

What exactly is the input to models in the benchmarking tasks? I.e. is it a single value (deviation score) per individual for the normative model but the full functional network for the raw data model?

As mentioned in the manuscript, the null findings for raw data models in the group difference task are surprising. Could you elaborate on the difference between cited work that has found significant effects? (E.g. differences in sample sizes, differences in preprocessing steps, did previous work not include multiple comparisons corrections?)

The choice of a linear kernel for the SVM classification task is not clear to me. Since the input vectors are low dimensional, would a non-linear (e.g. rbf) kernel not perform better?

Is there any expected reason why classification in the case of the raw data models would work on the functional but not on the structural measures? (Generally, I would expect a higher noise level in the functional data and thus a more challenging classification scenario.)

In line 415ff, you mention that normative models allow for the interrogation of patterns in classified vs misclassified samples. Have you seen any such patterns in the schizophrenia vs control benchmark test? (If so, I think it would be instructive for readers to include a brief discussion of observed patterns for the schizophrenia vs controls application.)

Reviewer #3 (Recommendations for the authors):

I believe this manuscript is in an advanced maturity status; therefore, my recommendations are not particularly major. In no particular order:

– Colab examples and tutorials: for some reason, the pnctoolkit package cannot be installed due to dependencies issues. Unfortunately, that dissuaded me from executing the examples (which I was really looking forward to doing and, I believe, is a significant aspect of the overall work).

– Abstract. The sentence "Across all benchmarks, we confirm the advantage (i.e., stronger effect sizes, more accurate classification, and prediction) of using normative modeling features" is bordering the overstatement. I would suggest bringing up more specific aspects of the results (more quantitative statements, if you will).

– Introduction (ll. 63--72): this paragraph feels a bit out of place. I think the intent is to connect the last sentence (69--72), which I agree is relevant. However, I am not convinced by the flow of the paragraph (e.g., not sure the references to Gau et al. (2021) and Levitis et al. (2021) are a good fit here, as opposed to a discussion). I would invite the authors to rethink how this paragraph is carried out.

– Table 1. HCP's sex ratios. I believe it should say 46.6% of males.

– Figure 1. I suggest annotating the different tasks within (C) and (D), and updating the caption accordingly.

– Methods (l. 203). For ignorant readers like this reviewer, a reduced mention of why this particular test (Mann-Whitney U-test) was chosen would be appreciated. Perhaps some further details could also be given in the appendix.

– Methods (l. 212). Some elaboration on how the hyperparameters were selected (i.e., kernel and C constant). I have looked into the corresponding jupyter notebook (which was of course, available, thanks!), and it seems the defaults were used. Could this have an impact on the advantages of normative modeling?

– Methods (l. 232) Evaluation. I would suggest (i) moving the metrics within the body of each task previously (i.e.,) the definition of $\theta$ in equations. (1, 2, 3); and (ii) keeping a subsection with ll. 244--254 with perhaps a more detailed section title (e.g., statistical inference --or evaluation-- of model comparisons).

– Results section: I would suggest the authors provide subsection titles that advance the contents of the subsection. E.g., in line 271, something along the lines of "Normative modeling showcased larger effect sizes in mass univariate group differences."

– ll. 290--292. I suggest reflecting this in Figure 4A, showcasing the corresponding results without multiple comparisons correction (which should be clearly indicated) to give further support to the idea that they were along the same lines.

– ll. 285--286. "The lack of any group differences in the raw data was initially a puzzling finding…" Can this be interpreted as normative models being more powered in the discussion? This is sideways mentioned when indicating that normative models have a harmonization effect on the individual metrics, more effective than regressing out obvious confounders. A more frontal/intentional incursion into this discussion would be appreciated.

– l. 301. The subsection hierarchy seems broken. It is confusing that this particular result is not at the same level as the other (e.g., l. 311).

– l. 365. That "from this work" seemed out of place.

eLife. 2023 Mar 13;12:e85082. doi: 10.7554/eLife.85082.sa2

Author response


Reviewer #2 (Recommendations for the authors):

Please clarify: Are all normative models (i.e. across all modalities) based on the same set of covariates?

The normative models for structure and function were fit using the same demographic variables (age and sex). Both structure and functional models also used MRI scanner site as a covariate, and both included a metric representing data quality (Euler number for structural models and mean framewise displacement for functional models). We have clarified this in the manuscript.

“The covariates used to train the structural normative models included age, sex, data quality metric (Euler number), and site… The covariates used to train the functional normative models were similar as the structural normative models which included age, sex, data quality metric (mean framewise displacement), and site.”

What exactly is the input to models in the benchmarking tasks? I.e. is it a single value (deviation score) per individual for the normative model but the full functional network for the raw data model?

The input into the benchmarking tasks is the same dimensionality for the deviation scores and raw data. For the structural models, each subject has 187 features, which represent all of the ROIs in the Destrieux atlas combined with all of the ROIs in the subcortical Freesurfer atlas. For the functional models, each subject has 136 features, which represent all of the 17 Yeo between brain network connectivity pairs. We have clarified this further in the text.

“For task 1, group difference testing, the models were fit in a univariate approach meaning there was one test performed for each brain feature and for tasks 2 and 3, classification and regression, the models were fit in a multivariate approach.”

As mentioned in the manuscript, the null findings for raw data models in the group difference task are surprising. Could you elaborate on the difference between cited work that has found significant effects? (E.g. differences in sample sizes, differences in preprocessing steps, did previous work not include multiple comparisons corrections?)

We found several papers that used the same COBRE dataset (1-3 described below) and looked at structural differences and classification (4 described below) between SZ and HC groups. While some of these papers report group differences, these papers do not directly perform t-tests on the cortical thickness ROI-level or voxel-level data. They all use much fancier approaches such as clustering, CCA, and non-linear ICA, as well as difference multiple comparison techniques and methods for preprocessing the data. There are no published works showing group differences in cortical thickness in the second dataset we used (UMich). We also further explored publications (using different datasets) that have directly performed ttests on cortical thickness data between SZ and HC groups and found significant group differences (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6948361/, https://academic.oup.com/schizophreniabulletin/article/45/4/911/5095722, and https://jamanetwork.com/journals/jamapsychiatry/fullarticle/1107282). We noticed these works performed voxel-level tests and used cluster-based multiple comparison correction methods. We could not find any publications that performed the t-tests and multiple comparison correction at the ROI-level, as we did in this work. We also point out work establishing the heterogeneity of cortical thickness in schizophrenia (https://www.nature.com/articles/s41380-020-00882-5), and we consider this to likely be a cause of these varying results in the literature (along with the varying methods used across studies). We have added these references and summarized these ideas in the Discussion section of our paper.

1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4705852/#SD1

They use a much older FS version (4.0.1), and a different method for multiple comparison correction (Monte Carlo simulations 10.1016/j.neuroimage.2006.07.036). They took a cluster approach and found significant group differences in 3 out of 4 clusters (reduced cortical thickness in SZ compared to HC). They do not directly do t-tests on the cortical thickness images alone (only on the clusters which are combined with behavioral data).

2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547923/

They use Gray Matter Volume (GMV) (and other imaging modalities) within a CCA framework combined with cognition behavioral data. Data were preprocessed in SPM8, and GMV images were smoothed with an 8mm kernel. All images were also normalized to have the same average sum of squares. They do not directly do t-tests on the GM images alone.

3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965265/

They use non-linear ICA, Variational Autoencoders, and regular ICA to look at structural abnormalities between SZ and HC. They find group differences in 4 out of 30 non-linear ICA components, and 2/20 VAE/ICA components. Structural data were preprocessed using SPM (unspecified version) and were smoothed with an 8mm Gaussian kernel.

4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101481/

They also merged the COBRE dataset with another SZ dataset collected in Germany. They do not perform group difference testing but use a linear SVM to classify SZ vs. HC (68.3% accuracy) and performed further sub-group analysis based on symptom scores and other behavioral variables. Gray Matter Volume images were used and were preprocessed using SPM and DARTEL.

“While there have been reported group differences between controls and schizophrenia in cortical thickness and resting state brain networks in the literature, these studies have used different data sets (of varying sample sizes), different preprocessing pipelines and software versions, and different statistical frameworks (Castro et al., 2016; Di Biase et al., 2019; Dwyer et al., 2018; Geisler et al., 2015; Madre et al., 2019; Sui et al., 2015; van Haren et al., 2011). When reviewing the literature of studies on SZ vs. HC group difference testing, we did not find any study that performed univariate t-testing and multiple comparison correction at the ROI-level or network-level, rather most work used statistical tests and multiple comparison correction at the voxel-level or edge-level. Combined with the known patterns of heterogeneity present in schizophrenia disorder (Lv et al., 2020; Wolfers et al., 2018), it is unsurprising that our results differ from past studies.”

The choice of a linear kernel for the SVM classification task is not clear to me. Since the input vectors are low dimensional, would a non-linear (e.g. rbf) kernel not perform better?

The choice of a linear kernel was rather arbitrary, as we were following along with an example provided on the scikit-learn website of running SVC and evaluating/plotting the ROC within a cross-validation framework, as shown here: https://scikitlearn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#sphx-glr-autoexamples-model-selection-plot-roc-crossval-py. We chose to use the same hyperparameter settings that were used in this example. We have re-run the classification task using the rbf kernel instead of the linear kernel in the SVC. The results were approximately similar for the deviation score and raw models (both imaging modalities). Therefore, we chose to leave the results as is within the manuscript and added to the manuscript an explanation of the hyperparameter choices as following the example provided by scikit-learn. We have also added to the discussion paragraph on future ideas, that hyperparameter tuning and implementing other algorithms may improve the performance.

“These hyperparameters were chosen based on following an example of SVC provided by scikit-learn.”

“Future research directions are likely to include: … (3) replication and refinement of the proposed benchmarking tasks in other datasets including hyperparameter tuning and different algorithm implementations”

Is there any expected reason why classification in the case of the raw data models would work on the functional but not on the structural measures? (Generally, I would expect a higher noise level in the functional data and thus a more challenging classification scenario.)

Our speculation about the cause of this result is that there may be an effect of confounding variable(s) that the normative models help “clean up” and that the structural data is more effected by this than the functional data. This would also explain why there is such an improvement in the classification when using the structural deviation scores compared to the raw structural features. In the functional models, there is not much of a difference between the deviation score and raw models. While we residualized the raw data of age, sex, and motion (for the functional data) before inputting them into the benchmarking tasks, there could still be some confounding effects present in the raw data. Beyond this, we are unsure and cannot say much regarding the causal nature of this result. We leave this up to future work, and mention this in the Discussion section.

In line 415ff, you mention that normative models allow for the interrogation of patterns in classified vs misclassified samples. Have you seen any such patterns in the schizophrenia vs control benchmark test? (If so, I think it would be instructive for readers to include a brief discussion of observed patterns for the schizophrenia vs controls application.)

This idea was mentioned in the Discussion section in the paragraph on future work. We agree with the reviewer that this is an important and exciting next step to pursue but doing so properly requires significant additional analyses. We believe including these additional analyses in the methods, results, and discussion of the current work would make the paper too crowded.

Reviewer #3 (Recommendations for the authors):

I believe this manuscript is in an advanced maturity status; therefore, my recommendations are not particularly major. In no particular order:

– Colab examples and tutorials: for some reason, the pnctoolkit package cannot be installed due to dependencies issues. Unfortunately, that dissuaded me from executing the examples (which I was really looking forward to doing and, I believe, is a significant aspect of the overall work).

We have fixed this issue and the Colab notebooks now run as intended. We have also released a version of our code and data (and all required python packages) in a Docker container. However, we note that only the HCP and COBRE datasets have open sharing allowed (not the UMich data). Therefore, the code will not exactly recreate the task 1 and 2 results because we can only partially share the data needed to re-run the analysis. The notebooks that are hosted on GitHub have been shared with the cell outputs saved so that the results and figures can be seen from using all of the data (both COBRE and UMich samples). We have also built a website (no code required) for transferring the pre-trained normative models to new data that is available here (https://pcnportal.dccn.nl/).

– Abstract. The sentence "Across all benchmarks, we confirm the advantage (i.e., stronger effect sizes, more accurate classification, and prediction) of using normative modeling features" is bordering the overstatement. I would suggest bringing up more specific aspects of the results (more quantitative statements, if you will).

We have toned down the message in the abstract and clarified the main results.

“Across all benchmarks, we show the advantage of using normative modeling features, with the strongest statistically significant results demonstrated in the group difference testing and classification tasks.”

– Introduction (ll. 63--72): this paragraph feels a bit out of place. I think the intent is to connect the last sentence (69--72), which I agree is relevant. However, I am not convinced by the flow of the paragraph (e.g., not sure the references to Gau et al. (2021) and Levitis et al. (2021) are a good fit here, as opposed to a discussion). I would invite the authors to rethink how this paragraph is carried out.

We considered moving this paragraph to the discussion but have decided that the current placement is meant to show the audience that beyond the applications of normative modeling (clinical and psychology as mentioned in the paragraph immediately before), there is also active technical development of the normative modeling framework and there are many available open science resources to help users get started with using normative modeling. We feel that this message is best conveyed in the introduction.

– Table 1. HCP's sex ratios. I believe it should say 46.6% of males.

We have corrected this mistake.

– Figure 1. I suggest annotating the different tasks within (C) and (D), and updating the caption accordingly.

We have updated Figure 1 with these suggestions.

– Methods (l. 203). For ignorant readers like this reviewer, a reduced mention of why this particular test (Mann-Whitney U-test) was chosen would be appreciated. Perhaps some further details could also be given in the appendix.

We have added the following to the manuscript to explain why the Utest was used:

“The U-test was used to test for group differences in the count data (of extreme deviations) because the distribution of count data is skewed (non-Gaussian) which the Utest is designed to account for.” https://www.statstest.com/mann-whitney-u-test/

– Methods (l. 212). Some elaboration on how the hyperparameters were selected (i.e., kernel and C constant). I have looked into the corresponding jupyter notebook (which was of course, available, thanks!), and it seems the defaults were used. Could this have an impact on the advantages of normative modeling?

Please see our response to reviewer 2 comment 7 above for an explanation of the hyperparameter choice.

– Methods (l. 232) Evaluation. I would suggest (i) moving the metrics within the body of each task previously (i.e.,) the definition of $\theta$ in equations. (1, 2, 3); and (ii) keeping a subsection with ll. 244--254 with perhaps a more detailed section title (e.g., statistical inference --or evaluation-- of model comparisons).

We have re-formatted the methods to match this suggestion.

– Results section: I would suggest the authors provide subsection titles that advance the contents of the subsection. E.g., in line 271, something along the lines of "Normative modeling showcased larger effect sizes in mass univariate group differences."

We have updated the result subsections as follows:

Normative Modeling Shows Larger Effect Sizes in Mass Univariate Group Differences

Normative Modeling Shows Highest Classification Performance Using Cortical Thickness Normative Modeling Shows Modest Performance Improvement in Predicting Cognition

– ll. 290--292. I suggest reflecting this in Figure 4A, showcasing the corresponding results without multiple comparisons correction (which should be clearly indicated) to give further support to the idea that they were along the same lines.

We considered including the null results on the raw data into Figure 4, however, because we are displaying the t-statistic we ultimately decided it would be misleading to show uncorrected t-statistic maps. The images shown in Figure 4A are showing only the models that survived multiple comparison correction. We have made this more apparent in the cortical thickness map by making the regions that did not survive correction be colorless. We also updated the within 4A figure caption to highlight that there are no significant group differences in the raw maps. We have added to the supplementary files (3 and 4) for all models (deviation and raw, structural, and functional) that contain the group difference test statistics including the t-stat, p-value, and FDR corrected p-value for each ROI.

“For full statistics including the corrected and uncorrected p-values and test-statistic of every ROI, see supplementary files 3 and 4.”

– ll. 285--286. "The lack of any group differences in the raw data was initially a puzzling finding…" Can this be interpreted as normative models being more powered in the discussion? This is sideways mentioned when indicating that normative models have a harmonization effect on the individual metrics, more effective than regressing out obvious confounders. A more frontal/intentional incursion into this discussion would be appreciated.

Please see our detailed response to Reviewer 2 Comment 3. We have added this paragraph to the group difference Results section.

“While there have been reported group differences between controls and schizophrenia in cortical thickness and resting state brain networks in the literature, these studies have used different data sets (of varying sample sizes), different preprocessing pipelines and software versions, and different statistical frameworks (Castro et al., 2016; Di Biase et al., 2019; Dwyer et al., 2018; Geisler et al., 2015; Madre et al., 2019; Sui et al., 2015; van Haren et al., 2011). When reviewing the literature of studies on SZ vs. HC group difference testing, we did not find any study that performed univariate t-testing and multiple comparison correction at the ROI-level or network-level, rather most work used statistical tests and multiple comparison correction at the voxel-level or edge-level. Combined with the known patterns of heterogeneity present in schizophrenia disorder (Lv et al., 2020; Wolfers et al., 2018), it is unsurprising that our results differ from past studies.”

– l. 301. The subsection hierarchy seems broken. It is confusing that this particular result is not at the same level as the other (e.g., l. 311).

We have fixed this formatting.

– l. 365. That "from this work" seemed out of place.

We have reordered the phrasing to say:

“Rather, our results from this work (especially in the comparisons of group difference maps to individual difference maps (Figure 4)) show that the outputs of normative modeling can be used to validate, refine, and further understand some of the inconsistencies in previous findings from case-control literature.”

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Supplementary file 1. Functional Normative Model Demographics.

    Description: For each included site, we show the sample size (N), age (mean, standard deviation), and sex distribution (Female/Male percent) in the training set (shown in blue) and testing set (shown in green) of the normative models of functional connectivity between large scale resting-state brain networks from the Yeo 17 network atlas.

    elife-85082-supp1.xlsx (12.2KB, xlsx)
    Supplementary file 2. Surface Area Normative Model Demographics.

    Description: For each included site, we show the sample size (N), age (mean, standard deviation), and sex distribution (Female/Male percent) of the normative models of surface area extracted for all regions of interest in the Destrieux Freesurfer atlas.

    elife-85082-supp2.xlsx (11.9KB, xlsx)
    Supplementary file 3. Structural Group Difference Testing Statistics.

    Description: We show for all cortical thickness and subcortical volume from the Destrieux and aseg Freesurfer atlases regions of interest (ROIs from a two-sample t-test between Schizophrenia versus Healthy Controls) the t-statistic (T-stat), False Discovery Rate corrected p-value (FDRcorr_pvalue), and uncorrected p-value (uncorr_pvalue) for both the raw data (shown in green) and the deviation scores (shown in blue).

    elife-85082-supp3.xlsx (19.2KB, xlsx)
    Supplementary file 4. Functional Connectivity Group Difference Testing Statistics.

    Description: We show for all Yeo-17 between network connectivity regions of interest (ROIs) from a two-sample t-test between Schizophrenia versus Healthy Controls the t-statistic (T-stat), False Discovery Rate corrected p-value (FDRcorr_pvalue), and uncorrected p-value (uncorr_pvalue) for both the raw data (shown in green) and the deviation scores (shown in blue).

    elife-85082-supp4.xlsx (15.9KB, xlsx)
    MDAR checklist

    Data Availability Statement

    Pre-trained normative models are available on GitHub (https://github.com/predictive-clinical-neuroscience/braincharts, (copy archived at swh:1:rev:299e126ff053e2353091831a888c3ccd1ca6edeb)) and Google Colab (https://colab.research.google.com/github/predictive-clinical-neuroscience/braincharts/blob/master/scripts/apply_normative_models_yeo17.ipynb). Scripts for running the benchmarking analysis and visualizations are available on GitHub (https://github.com/saigerutherford/evidence_embracing_nm, (copy archived at swh:1:rev:1b4198389e2940dd3d10055164d68d46e0a20750)). An online portal for running models without code is available (https://pcnportal.dccn.nl).


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES