2024 Apr 10;4(4):100539. doi: 10.1016/j.xgen.2024.100539

© 2024 The Authors

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Prediction R² of PRS trained on GWAS summary data from AoU and evaluated on non-EUR validation individuals from UKBB

Discovery GWASs from AoU include GWASs on EUR (N_GWAS = 48,229–48,332), AFR (N_GWAS = 21,514–21,550), and Hispanic/Latino (N_GWAS = 15,364–15,413) individuals. The validation dataset consists of individuals of AFR origin in UKBB (N = 9,026–9,042). The LD reference data are from either (A) the 1000 Genomes Project (498 EUR, 659 AFR, 347 AMR, 503 EAS, and 487 SAS) or (B) UKBB data (PRS-CSx: default UKBB LD reference data, which overlap with our testing samples, including 375,120 EUR, 7,507 AFR, 687 AMR, 2,181 EAS, and 8,412 SAS; all other methods: UKBB tuning samples, including 10,000 EUR, 4,585 AFR, 1,010 EAS, and 5,427 SAS). The ancestry of UKBB individuals was determined by a genetic ancestry prediction approach (supplemental information). Because of the low prediction accuracy of genetic component analysis and the extremely small validation sample size of UKBB AMR, prediction R² on UKBB AMR is unreliable and thus is not reported here. All methods were evaluated on the ∼2.0 million SNPs that are available in HapMap3 + MEGA, except for PRS-CSx, which was evaluated on the HapMap3 SNPs only, as implemented in its software. Ancestry- and trait-specific GWAS sample sizes, numbers of SNPs included, and validation sample sizes are summarized in Table S11. A random half of the validation individuals was used as the tuning set, both to tune model parameters and to train the SL in CT-SLEB and MUSSEL or the linear combination model in weighted LDpred2, PRS-CSx, and weighted MUSS. The other half of the validation set was used as the testing set to report R² values for each ancestry, after adjusting for age, sex, and the top ten genetic principal components. Detailed 95% bootstrap CIs are reported in Table S17.
In (B), the comparison between PRS-CSx and the other methods is not fair: the UKBB LD reference data provided by the PRS-CSx software (UKBB_PRS-CSx) are much larger than those used for the other methods, and the R² of PRS-CSx may therefore be inflated by the large overlap between UKBB_PRS-CSx and the UKBB testing sample.
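The evaluation protocol described in the caption (random half split into tuning and testing sets, covariate-adjusted R², bootstrap CIs) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that "adjusting for" covariates means regressing the phenotype on them and correlating the residuals with the PRS, and that the CI is a percentile bootstrap; the caption does not specify either choice. All function names are hypothetical, and the data are synthetic stand-ins rather than AoU or UKBB data.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def adjusted_r2(y, prs, covariates):
    """Squared Pearson correlation between the PRS and the covariate-adjusted phenotype.

    Assumption: only the phenotype is residualized; the PRS is used as-is.
    """
    resid = residualize(y, covariates)
    return float(np.corrcoef(resid, prs)[0, 1] ** 2)

def split_half(n, rng):
    """Randomly split indices 0..n-1 into a tuning half and a testing half."""
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:]

def bootstrap_ci(y, prs, covariates, rng, n_boot=200, level=0.95):
    """Percentile bootstrap CI for the adjusted R^2 (resampling individuals)."""
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample individuals with replacement
        stats[b] = adjusted_r2(y[idx], prs[idx], covariates[idx])
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

# Synthetic demonstration (hypothetical data, not from the study).
rng = np.random.default_rng(0)
n = 2000
covariates = rng.normal(size=(n, 12))  # stand-ins for age, sex, and ten PCs
prs = rng.normal(size=n)
y = 0.3 * prs + covariates @ rng.normal(scale=0.1, size=12) + rng.normal(size=n)

# The tuning half would be used for parameter tuning and ensemble training;
# only the testing half is used to report R^2.
tune_idx, test_idx = split_half(n, rng)
r2 = adjusted_r2(y[test_idx], prs[test_idx], covariates[test_idx])
ci = bootstrap_ci(y[test_idx], prs[test_idx], covariates[test_idx], rng)
print(f"adjusted R^2 = {r2:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Keeping the tuning and testing halves disjoint is what prevents the reported R² from being inflated by parameter selection, which is the same concern the caption raises about LD reference samples overlapping the testing sample in (B).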