Author manuscript; available in PMC: 2021 Sep 7.

Published in final edited form as: Lancet Neurol. 2019 Dec;18(12):1091–1102. doi: 10.1016/S1474-4422(19)30320-5

Table 2.

Summary of genetic predictive model performance.

Study	Max P threshold	pseudo R2 from PRS^*	Beta	SE	P	OR, highest quartile PRS	95% CI, highest quartile PRS	N SNPs	N samples	AUC	95% CI (DeLong)	Sensitivity	Specificity	Positive predictive value (PPV)	Negative predictive value (NPV)	Balanced accuracy
Training dataset: IPDGC - NeuroX	1.35E–03	0.029	0.553	0.022	8.99E–135	3.74	3.35 – 4.18	1809	11,243	0.640	0.630 – 0.650	0.569	0.632	0.591	0.611	0.601
Test dataset: HBS	4.00E–02	0.054	0.709	0.072	8.28E–23	6.25	4.26 – 9.28	1805	999	0.692	0.660 – 0.725	0.628	0.686	0.691	0.623	0.657

These are estimates of performance for predictive models, including single-study estimates, estimates from meta-analyses across studies, and a two-stage design. Here the best P value threshold column ("Max P threshold") denotes the filtering value for SNP inclusion that achieves the maximal pseudo (Nagelkerke's) R2. The odds ratio (OR) column is the exponent of the regression coefficient (beta) from logistic regression of the polygenic risk score (PRS) on case status, with the standard error (SE) representing the precision of these estimates. These same metrics are derived across array types and datasets using random-effects meta-analyses. The area under the curve (AUC) is included as the most common metric for predictive model performance. In the table, * denotes the R2 approximation adjusted for an estimated prevalence of 0.5%, equivalent to roughly half of the unadjusted R2 estimates for the PRS. All calculations and reported statistics include only the PRS and no other parameters, after adjusting for principal components 1-5, age, and sex at variant selection in the NeuroX-dbGaP dataset.
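As an illustration of how two of the quantities described in this note relate to one another, the following minimal Python sketch exponentiates a logistic regression coefficient to obtain an odds ratio with a Wald 95% confidence interval, and computes balanced accuracy as the mean of sensitivity and specificity. The function names and the use of a Wald interval (z = 1.96) are assumptions made for illustration and are not taken from the study's analysis code. Note that exp(beta) gives an odds ratio per unit of the PRS, which is a different quantity from the highest-quartile OR reported in the table. Input values are copied from Table 2.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a logistic regression coefficient.

    Returns (OR, lower, upper), where the bounds form a Wald interval
    exp(beta +/- z * se). The Wald interval is an illustrative assumption.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Beta, SE, sensitivity and specificity as reported in Table 2.
rows = {
    "IPDGC - NeuroX (training)": {"beta": 0.553, "se": 0.022, "sens": 0.569, "spec": 0.632},
    "HBS (test)": {"beta": 0.709, "se": 0.072, "sens": 0.628, "spec": 0.686},
}

for name, r in rows.items():
    or_per_unit, lower, upper = odds_ratio_with_ci(r["beta"], r["se"])
    ba = balanced_accuracy(r["sens"], r["spec"])
    print(f"{name}: OR per unit PRS = {or_per_unit:.2f} "
          f"({lower:.2f} to {upper:.2f}), balanced accuracy = {ba:.3f}")
```

Because the table values are rounded to three decimals, quantities recomputed from them may differ from the reported ones in the last digit.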