2020 May 28;107(1):46–59. doi: 10.1016/j.ajhg.2020.05.004

Table 2.

Accuracy of Polygenic Prediction in Real Data

Discovery GWAS	Training (UK Biobank)	Validation (UK Biobank)	Method	R²_Nag	5% Tail OR
Breast cancer 2015 (n = ∼120,000)	n = 3,956/3,956	n = 3,957/73,652	P+T	0.021	2.28
			LDPred	0.026	2.42
			PRS-CS	0.030	2.60
			NPS	0.030	2.53
Breast cancer 2017 (n = ∼230,000)			P+T	0.027	2.37
			LDPred	0.026	2.33
			PRS-CS	0.043	2.96
			NPS	0.045	3.01
Inflammatory bowel disease (n = ∼35,000)	n = 2,483/2,483	n = 2,482/157,272	P+T	0.028	3.00
			LDPred	0.027	2.77
			PRS-CS	0.040	3.67
			NPS	0.035	3.60
Type 2 diabetes (n = ∼160,000)	n = 7,298/7,298	n = 7,298/144,020	P+T	0.046	3.04
			LDPred	0.059	3.51
			PRS-CS	0.066	3.99
			NPS	0.065	3.81
Coronary artery disease (n = ∼330,000)	n = 2,000/2,000	n = 773/62,512	P+T	0.063	5.17
			LDPred	0.078	5.65
			PRS-CS	0.075	4.92
			NPS	0.073	5.21

Non-parametric shrinkage (NPS) and PRS-CS outperform both pruning and thresholding (P+T) and LDPred in real data. Both training and validation cohorts were sampled from UK Biobank. R²_Nag is Nagelkerke's R². The 5% tail odds ratio (OR) is the odds of being a case subject rather than a control subject among individuals in the 5% tail of the polygenic score distribution, divided by the same odds among the remaining individuals. For coronary artery disease (CAD) and type 2 diabetes (T2D), all prediction models were trained and validated with sex as a covariate to account for differences in disease prevalence between the sexes.
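As an illustration of how a 5% tail OR of the kind reported in the table could be computed, the following is a minimal sketch rather than the authors' code. It assumes the tail is the upper 5% of the polygenic score distribution, that ties are broken by input order, and that no covariate adjustment is applied; the function name `tail_odds_ratio` and the toy data are hypothetical.

```python
def tail_odds_ratio(scores, is_case, tail_fraction=0.05):
    """Odds of case vs. control in the top score tail, divided by the odds in the rest."""
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must have the same length")
    n = len(scores)
    n_tail = max(1, round(n * tail_fraction))

    # Assumption: the "tail" is the set of individuals with the highest scores.
    # sorted() is stable, so tied scores keep their input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    tail_idx = order[:n_tail]

    # Build the 2x2 table: (tail vs. rest) by (case vs. control).
    cases_tail = sum(1 for i in tail_idx if is_case[i])
    controls_tail = n_tail - cases_tail
    cases_rest = sum(1 for flag in is_case if flag) - cases_tail
    controls_rest = (n - n_tail) - cases_rest

    if 0 in (controls_tail, cases_rest, controls_rest):
        raise ValueError("odds ratio undefined: a required cell of the 2x2 table is empty")
    return (cases_tail / controls_tail) / (cases_rest / controls_rest)


# Toy data invented for illustration only (not from the study):
# 100 individuals with scores 0..99; the top 5 hold 4 cases and 1 control,
# the remaining 95 hold 19 cases and 76 controls.
scores = list(range(100))
is_case = [i < 19 or i >= 96 for i in range(100)]
toy_or = tail_odds_ratio(scores, is_case)
print(toy_or)  # (4/1) / (19/76) = 16.0
```

In this toy example the odds in the tail are 4/1 and the odds in the rest are 19/76 = 0.25, giving an OR of 16. An analysis that includes covariates such as sex would need a regression-based estimate instead of this raw 2x2 calculation.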