Prediction results on 23andMe validation individuals based on discovery GWAS from 23andMe on EUR, AFR, AMR, EAS, and SAS
The performance of the various methods is evaluated by (A) residual R² for two continuous traits (heart metabolic disease burden and height) and (B) residual AUC for five binary traits (any CVD, depression, migraine diagnosis, morning person, and SBMN), with LD reference data from the 1000 Genomes Project. The dataset is randomly split into 70%, 20%, and 10% for training GWAS, model tuning (tuning model parameters and training the SL in CT-SLEB and MUSSEL or the linear combination model in weighted LDpred2 and PRS-CSx), and testing (to report residual R² or AUC values after adjusting for the top five genetic principal components, sex, and age), respectively. All methods were evaluated on the ∼2.0 million SNPs that are available in HapMap 3 + MEGA, except for PRS-CSx, which was evaluated on HapMap 3 SNPs only, as implemented in its software. Ancestry- and trait-specific GWAS sample sizes, numbers of SNPs included, and validation sample sizes are summarized in Table S14.
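
As an illustration of the evaluation described above, the following is a minimal sketch, not the authors' pipeline, of a random 70%/20%/10% split and of a residual R² computed after adjusting the phenotype for covariates. It assumes that "residual R²" means the squared correlation between the PRS and the phenotype residuals from a linear regression on the covariates (here standing in for the top five principal components, sex, and age). The function names `split_indices`, `residualize`, and `residual_r2` are hypothetical and do not come from any of the software packages mentioned.

```python
import numpy as np

def split_indices(n, fractions=(0.7, 0.2, 0.1), seed=0):
    """Randomly partition n individuals into training, tuning, and testing sets.

    The 70/20/10 fractions follow the caption; the seed is an arbitrary choice.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_tune = int(round(fractions[1] * n))
    train = perm[:n_train]
    tune = perm[n_train:n_train + n_tune]
    test = perm[n_train + n_tune:]  # remainder goes to the testing set
    return train, tune, test

def residualize(y, covariates):
    """Return residuals of y after ordinary least squares on covariates plus intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residual_r2(y, prs, covariates):
    """Squared Pearson correlation between the PRS and the covariate-adjusted phenotype.

    Assumption: this is one common definition of residual R²; the exact
    definition used in the study may differ.
    """
    resid = residualize(y, covariates)
    r = np.corrcoef(resid, prs)[0, 1]
    return r ** 2
```

In use, the PRS weights would be fitted on the training and tuning sets only, and `residual_r2` would be applied once to the testing set. For the binary traits an analogous covariate-adjusted AUC would be needed, which this sketch does not cover.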