Figure - PMC

2024 Apr 10;4(4):100539. doi: 10.1016/j.xgen.2024.100539

© 2024 The Authors

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Prediction R² of the PRS trained based on GWAS summary data from GLGC on non-EUR validation individuals from UKBB

The discovery GWASs from GLGC cover EUR (N_GWAS = 842,660–930,671), AFR or admixed AFR (N_GWAS = 87,760–92,555), Hispanic/Latino (N_GWAS = 46,040–49,582), EAS (N_GWAS = 82,587–146,492), and SAS (N_GWAS = 33,658–34,135). The validation dataset consists of individuals of EUR (N = 17,457–19,030), AFR (N = 7,954–8,598), EAS (N = 1,752–1,921), or SAS (N = 9,385–10,288) origin in UKBB. The LD reference data are from either (A) the 1000 Genomes Project (498 EUR, 659 AFR, 347 AMR, 503 EAS, and 487 SAS) or (B) UKBB data. For PRS-CSx, the UKBB data in (B) are the default UKBB LD reference data, which overlap with our testing samples and include 375,120 EUR, 7,507 AFR, 687 AMR, 2,181 EAS, and 8,412 SAS; for all other methods, they are the UKBB tuning samples, comprising 10,000 EUR, 4,585 AFR, 1,010 EAS, and 5,427 SAS. The ancestry of UKBB individuals was determined by a genetic ancestry prediction approach (supplemental information). Due to the low prediction accuracy of genetic component analysis and the extremely small validation sample size of UKBB AMR, prediction R² on UKBB AMR is unreliable and thus is not reported here. All methods were evaluated on the ∼2.0 million SNPs that are available in HapMap 3 + MEGA, except for PRS-CSx, which was evaluated on the HapMap 3 SNPs only, as implemented in its software. Ancestry- and trait-specific GWAS sample sizes, numbers of SNPs included, and validation sample sizes are summarized in Table S10. A random half of the validation individuals is used as the tuning set to tune model parameters and to train the SL in CT-SLEB and MUSSEL or the linear combination model in weighted LDpred2, PRS-CSx, and weighted MUSS. The other half of the validation set is used as the testing set to report R² values and the corresponding 95% bootstrap CIs for each ancestry, after adjusting for age, sex, and the top ten genetic principal components.
In (B), the comparison between PRS-CSx and the other methods is not fair: the UKBB LD reference data provided by the PRS-CSx software (UKBB_PRS-CSx) are much larger than those used for the other methods, and the R² of the PRS-CSx PRS may therefore be inflated by the large overlap between UKBB_PRS-CSx and the UKBB testing sample. Detailed results are reported in Table S17.
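The tuning/testing protocol described in the caption can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions: the data are simulated, all function and variable names are hypothetical, "adjusting for" covariates is taken to mean residualizing the phenotype on age, sex, and ten principal components, the linear combination model is fit by ordinary least squares, and the 95% CI is a simple percentile bootstrap (the paper may use a different bootstrap variant).

```python
import numpy as np

def residualize(y, covariates):
    """Return y minus its least-squares fit on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def fit_linear_combination(prs_matrix, y_resid):
    """OLS weights for combining several PRS columns (intercept is dropped)."""
    X = np.column_stack([np.ones(len(y_resid)), prs_matrix])
    beta, *_ = np.linalg.lstsq(X, y_resid, rcond=None)
    return beta[1:]

def r_squared(prs, y_resid):
    """Squared Pearson correlation between a PRS and the adjusted phenotype."""
    return np.corrcoef(prs, y_resid)[0, 1] ** 2

def bootstrap_ci(prs, y_resid, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for R^2 (assumed variant, see lead-in)."""
    rng = np.random.default_rng(seed)
    n = len(prs)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        stats[b] = r_squared(prs[idx], y_resid[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# --- Simulated stand-in for one ancestry's validation set (hypothetical) ---
rng = np.random.default_rng(42)
n = 2000
covariates = rng.normal(size=(n, 12))   # stand-ins for age, sex, 10 PCs
prs = rng.normal(size=(n, 2))           # e.g. two ancestry-specific PRS
y = (0.3 * prs[:, 0] + 0.2 * prs[:, 1]
     + covariates @ rng.normal(scale=0.1, size=12)
     + rng.normal(size=n))

# Random half for tuning, the other half for testing.
perm = rng.permutation(n)
tune, test = perm[: n // 2], perm[n // 2:]

# Tuning half: learn the combination weights on the adjusted phenotype.
weights = fit_linear_combination(prs[tune], residualize(y[tune], covariates[tune]))

# Testing half: report R^2 and its 95% bootstrap CI.
y_test_resid = residualize(y[test], covariates[test])
combined = prs[test] @ weights
r2 = r_squared(combined, y_test_resid)
lo, hi = bootstrap_ci(combined, y_test_resid)
print(f"R^2 = {r2:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Keeping the tuning and testing halves disjoint is what makes the reported R² an out-of-sample estimate. The caveat about (B) is the same principle applied to the LD reference panel: if reference individuals overlap with the testing half, the estimate can be optimistic.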