Figure - PMC

letter

2023 Mar 8;615(7951):E8–E12. doi: 10.1038/s41586-023-05746-w

© The Author(s) 2023

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2.

a, Example bootstrapped BWAS of total cognitive ability (green) and null distribution (black) (y axis), as a function of sample size (x axis), from the suggested method of Spisak et al.⁸ (RSFC by partial correlation; prediction by ridge regression) in the HCP dataset (n = 1,200, 1 site, 1 scanner, 60 min RSFC/participant, 76% white). Sample sizes were log₁₀-transformed for visualization.

b, Out-of-sample correlation (between true scores and predicted scores) from ridge regression (y axis; code from Spisak et al.⁸) as a function of training sample size (x axis, log₁₀ scaling) for 33 cognitive and mental health phenotypes (Supplementary Information) in the HCP dataset. Each line displays a smoothed fit estimate (through penalized splines in generalized additive models) for one brain–phenotype pair (66 total; the brain measures are RSFC by partial correlation, as proposed by Spisak et al.⁸, and cortical thickness), based on 100 bootstrapped iterations at each sample size from 25 to 500 (inclusive) in increments of 25 (20 total bins). Sample sizes were log₁₀-transformed (for visualization) before generalized additive model fitting.

c, The same as in b, but in the ABCD dataset (n = 11,874, 21 sites, 3 scanner manufacturers, 20 min RSFC/participant, 56% white) using 32 cognitive and mental health phenotypes at sample sizes of 25, 50, 75 and from 100 to 1,900 (inclusive) in increments of 100 (22 total bins).

d, The percentage of brain–phenotype pairs (BWAS) from b and c with significant replication on the basis of the method of Spisak et al.⁸ (Supplementary Information).

e, Comparison of the original method in our previous study¹ and the method proposed by Spisak et al.⁸ at the full split-half sample size of HCP (left) and ABCD (right). Out-of-sample correlations (RSFC with total cognitive ability, y axis) are shown for the method used in our previous study¹ (dark green; RSFC by correlation, PCA, SVR) and for the method of Spisak et al.⁸ (light green; RSFC by partial correlation, ridge regression). Repeating the method proposed by Spisak et al.⁸ in ABCD (right) and comparing it with the method used in our previous study¹ results in a very similar out-of-sample r.

f, Simulated individual studies (light green circles; n = 1,000 per sample size) and meta-analytic estimates (black dot, ±1 s.d.) using the method of Spisak et al.⁸ (partial correlations in the HCP dataset) for the largest univariate association (left; y axis, bivariate correlation) and the multivariate association (right; y axis, out-of-sample correlation) for total cognitive ability versus RSFC, as a function of total sample size (x axis). Total sample sizes were 50, 200 and 1,000: for the bivariate correlation these are the study sample sizes, and for the multivariate analysis they are the sum of training and test samples of 25, 100 and 500 each. For univariate approaches, studies of any sample size, when appropriately aggregated to a large total sample size, can correctly estimate the true effect size. However, for multivariate approaches, even when aggregating across 1,000 independent studies, studies with a small sample size produce prediction accuracies that are downwardly biased relative to large sample studies, highlighting the need for large samples in multivariate analyses.
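
The sample-size grids in b and c and the aggregation pattern described in f can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic (not HCP or ABCD), the effect sizes, the ridge penalty `alpha` and the helper names `ridge_oos_r` and `univariate_r` are hypothetical, each simulated study is drawn by subsampling without replacement, and the ridge fit is a plain closed-form solve rather than the code of Spisak et al.⁸ or of the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample-size grids as stated in panels b and c.
hcp_sizes = np.arange(25, 501, 25)                                     # 25, 50, ..., 500
abcd_sizes = np.concatenate(([25, 50, 75], np.arange(100, 1901, 100)))
print(len(hcp_sizes), len(abcd_sizes))                                 # number of bins per dataset

# Hypothetical synthetic "population": many weak, distributed effects.
N, P = 2000, 200
X = rng.standard_normal((N, P))
beta = rng.standard_normal(P) * np.sqrt(0.2 / P)
y = X @ beta + rng.standard_normal(N) * np.sqrt(0.8)


def ridge_oos_r(n_train, n_test, alpha=100.0):
    """One simulated study: fit ridge on n_train, return out-of-sample r on n_test."""
    idx = rng.choice(N, size=n_train + n_test, replace=False)
    tr, te = idx[:n_train], idx[n_train:]
    mu_x, mu_y = X[tr].mean(axis=0), y[tr].mean()
    Xc, yc = X[tr] - mu_x, y[tr] - mu_y
    # Closed-form ridge solution: (X'X + alpha * I)^-1 X'y
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(P), Xc.T @ yc)
    pred = (X[te] - mu_x) @ w + mu_y
    return np.corrcoef(pred, y[te])[0, 1]


def univariate_r(n, j):
    """One simulated study: bivariate correlation of feature j with y in n participants."""
    idx = rng.choice(N, size=n, replace=False)
    return np.corrcoef(X[idx, j], y[idx])[0, 1]


# Largest univariate association in the full synthetic population.
full_r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(P)])
j_max = int(np.argmax(np.abs(full_r)))

# Aggregate many small studies by averaging their estimates (equal n, so equal weights).
n_studies = 200
uni_small = np.mean([univariate_r(50, j_max) for _ in range(n_studies)])
multi_small = np.mean([ridge_oos_r(25, 25) for _ in range(n_studies)])
multi_large = np.mean([ridge_oos_r(500, 500) for _ in range(n_studies)])

print(f"univariate: population r = {full_r[j_max]:.3f}, mean of n=50 studies = {uni_small:.3f}")
print(f"multivariate: mean out-of-sample r, 25+25 = {multi_small:.3f}, 500+500 = {multi_large:.3f}")
```

In this toy setting the averaged small-study bivariate correlation should land close to the population value, whereas the averaged out-of-sample r from small training samples stays below that from large training samples, because averaging across studies does not recover what each small training set failed to learn. The exact numbers depend on the assumed effect sizes and penalty and carry no empirical meaning.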