Prediction R² of PRSs trained on GWAS summary data from AoU, evaluated on non-EUR validation individuals from UKBB
Discovery GWASs from AoU include GWASs on EUR (N_GWAS = 48,229–48,332), AFR (N_GWAS = 21,514–21,550), and Hispanic/Latino (N_GWAS = 15,364–15,413) individuals. The validation dataset consists of individuals of AFR origin in UKBB (N = 9,026–9,042). The LD reference data are from either (A) the 1000 Genomes Project (498 EUR, 659 AFR, 347 AMR, 503 EAS, and 487 SAS) or (B) UKBB data (PRS-CSx: the default UKBB LD reference data, comprising 375,120 EUR, 7,507 AFR, 687 AMR, 2,181 EAS, and 8,412 SAS, which overlap with our testing samples; all other methods: UKBB tuning samples, comprising 10,000 EUR, 4,585 AFR, 1,010 EAS, and 5,427 SAS). The ancestry of UKBB individuals was determined by a genetic ancestry prediction approach (supplemental information). Due to the low prediction accuracy of genetic component analysis and the extremely small validation sample size of UKBB AMR, prediction R² on UKBB AMR is unreliable and thus is not reported here. All methods were evaluated on the ∼2.0 million SNPs that are available in HapMap 3 + MEGA, except for PRS-CSx, which was evaluated on the HapMap 3 SNPs only, as implemented in its software. Ancestry- and trait-specific GWAS sample sizes, numbers of SNPs included, and validation sample sizes are summarized in Table S11. A random half of the validation individuals was used as the tuning set to tune model parameters and to train the SL in CT-SLEB and MUSSEL or the linear combination model in weighted LDpred2, PRS-CSx, and weighted MUSS. The other half of the validation set was used as the testing set to report R² values for each ancestry, after adjusting for age, sex, and the top ten genetic principal components. Detailed 95% bootstrap CIs are reported in Table S17.
In (B), the comparison between PRS-CSx and the other methods is not fair because the UKBB LD reference data provided by the PRS-CSx software (UKBB_PRS-CSx) are much larger than those used for the other methods, and the R² of PRS-CSx may therefore be inflated by the large overlap between UKBB_PRS-CSx and the UKBB testing sample.
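The tuning/testing split and the covariate-adjusted R² described in this legend can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline. It assumes that R² is the squared correlation between the PRS and the phenotype residuals after regressing out covariates within each half, that the 95% CI is a percentile bootstrap, and that tuning amounts to picking the better of two candidate scores. The three simulated covariates are stand-ins for age, sex, and the genetic principal components, and all function names and simulation settings are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residualize(y, covariates):
    """Residuals of y after least squares on an intercept plus covariates."""
    design = [[1.0] + list(row) for row in covariates]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def split_half(n, rng):
    """Randomly split indices 0..n-1 into tuning and testing halves."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[: n // 2], idx[n // 2:]


def bootstrap_ci(prs, resid, rng, n_boot=200, level=0.95):
    """Percentile bootstrap interval for the adjusted R2."""
    n = len(prs)
    stats = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r2([prs[i] for i in draw], [resid[i] for i in draw]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Simulated validation data (assumption: stand-ins for age, sex, one PC).
rng = random.Random(0)
n = 400
covs = [[rng.gauss(55, 8), float(rng.randrange(2)), rng.gauss(0, 1)] for _ in range(n)]
prs_good = [rng.gauss(0, 1) for _ in range(n)]
prs_noisy = [g + rng.gauss(0, 2) for g in prs_good]
pheno = [0.4 * g + 0.02 * c[0] + 0.3 * c[1] + 0.2 * c[2] + rng.gauss(0, 1)
         for g, c in zip(prs_good, covs)]

tune_idx, test_idx = split_half(n, rng)


def prs_and_residuals(prs, idx):
    """Subset to idx and adjust the phenotype for covariates within that subset."""
    resid = residualize([pheno[i] for i in idx], [covs[i] for i in idx])
    return [prs[i] for i in idx], resid


# Tuning half: choose the candidate score; testing half: report R2 and its CI.
candidates = {"good": prs_good, "noisy": prs_noisy}
best = max(candidates, key=lambda k: pearson_r2(*prs_and_residuals(candidates[k], tune_idx)))
prs_test, resid_test = prs_and_residuals(candidates[best], test_idx)
r2_test = pearson_r2(prs_test, resid_test)
ci_lo, ci_hi = bootstrap_ci(prs_test, resid_test, rng)
print(f"selected={best} R2={r2_test:.3f} 95% CI=({ci_lo:.3f}, {ci_hi:.3f})")
```

Because the score is selected on the tuning half only, the R² reported on the testing half is not biased by that selection. The same separation is what breaks down when an LD reference panel overlaps the testing sample, as noted for PRS-CSx in (B).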