[Preprint]. 2023 Apr 13:2023.04.12.536510. [Version 1] doi: 10.1101/2023.04.12.536510

Figure 5: Prediction R² on UKBB validation individuals of AFR (N = 9,026–9,042) origin based on discovery GWAS from AoU on EUR (N_GWAS = 48,229–48,332), AFR (N_GWAS = 21,514–21,550), and Hispanic/Latino (N_GWAS = 15,364–15,413).

The LD reference data are either (a) the 1000 Genomes Project (498 EUR, 659 AFR, 347 AMR, 503 EAS, 487 SAS) or (b) UKBB data (PRS-CSx: the default UKBB LD reference data, which overlap with our testing samples and include 375,120 EUR, 7,507 AFR, 687 AMR, 2,181 EAS, and 8,412 SAS; all other methods: UKBB tuning samples including 10,000 EUR, 4,585 AFR, 1,010 EAS, and 5,427 SAS). The ancestry of UKBB individuals was determined by a genetic ancestry prediction approach (Supplementary Notes). Because of the low prediction accuracy of genetic component analysis and the extremely small validation sample size for UKBB AMR, prediction R² on UKBB AMR is unreliable and is not reported here. All methods were evaluated on the ~2.0 million SNPs available in HapMap3 + MEGA, except PRS-CSx, which was evaluated on the HapMap3 SNPs only, as implemented in its software. Ancestry- and trait-specific GWAS sample sizes, numbers of SNPs included, and validation sample sizes are summarized in Supplementary Table 5.1. A random half of the validation individuals was used as the tuning set to tune model parameters and to train the SL in CT-SLEB and ME-Bayes SL or the linear combination model in weighted LDpred2, PRS-CSx, and weighted ME-Bayes. The other half of the validation set was used as the testing set to report R² values for each ancestry, after adjusting for age, sex, and the top 10 genetic principal components. In (b), the comparison between PRS-CSx and the other methods is not fair, because the UKBB LD reference data provided by the PRS-CSx software (UKBB_PRS-CSx) are much larger than those used for the other methods, and thus the R² of PRS-CSx may be inflated owing to the large overlap between UKBB_PRS-CSx and the UKBB testing sample.
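
The tuning/testing split and the covariate-adjusted R² described in the caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes one common reading of "adjusting for" covariates: the phenotype is first residualized on age, sex, and 10 principal components, and R² is then taken from regressing those residuals on the PRS in the testing half. All function names (`split_tuning_testing`, `residualize`, `adjusted_r2`) and the simulated data are hypothetical.

```python
import numpy as np


def split_tuning_testing(n, seed=0):
    """Randomly split n validation individuals into two halves of indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:]


def residualize(y, covariates):
    """Residuals of y after least-squares regression on intercept + covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def adjusted_r2(y, prs, covariates):
    """R^2 of the covariate-adjusted phenotype regressed on the PRS."""
    resid = residualize(y, covariates)
    X = np.column_stack([np.ones(len(prs)), prs])
    beta, *_ = np.linalg.lstsq(X, resid, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((resid - fitted) ** 2)
    ss_tot = np.sum((resid - resid.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Simulated data only (assumption): nothing here comes from UKBB or AoU.
rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(40, 70, n)
sex = rng.integers(0, 2, n).astype(float)
pcs = rng.normal(size=(n, 10))                 # stand-in for the top 10 genetic PCs
covariates = np.column_stack([age, sex, pcs])
prs = rng.normal(size=n)                       # a PRS that carries real signal
noise_prs = rng.normal(size=n)                 # a PRS that carries none
y = 0.02 * age + 0.3 * sex + 0.1 * pcs[:, 0] + 0.3 * prs + rng.normal(size=n)

# One half would be used for tuning; R^2 is reported on the other half only.
tune_idx, test_idx = split_tuning_testing(n, seed=0)
r2_true = adjusted_r2(y[test_idx], prs[test_idx], covariates[test_idx])
r2_noise = adjusted_r2(y[test_idx], noise_prs[test_idx], covariates[test_idx])
print(f"adjusted R2, informative PRS: {r2_true:.3f}")
print(f"adjusted R2, noise PRS:       {r2_noise:.3f}")
```

In this sketch the tuning half is left unused. In practice it would be where model parameters are chosen and the combination weights are fitted, so that the R² reported on the testing half is not influenced by that selection.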