Author manuscript; available in PMC: 2021 Sep 7.
Published in final edited form as: Lancet Neurol. 2019 Dec;18(12):1091–1102. doi: 10.1016/S1474-4422(19)30320-5

Table 2.

Summary of genetic predictive model performance.

| Study | Max P threshold | Pseudo R2 from PRS* | Beta | SE | P | OR, highest quartile PRS | 95% CI, highest quartile PRS | N SNPs | N samples | AUC | 95% CI (DeLong) | Sensitivity | Specificity | Positive predictive value (PPV) | Negative predictive value (NPV) | Balanced accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training dataset: IPDGC - NeuroX | 1.35E–03 | 0.029 | 0.553 | 0.022 | 8.99E–135 | 3.74 | 3.35 – 4.18 | 1809 | 11,243 | 0.640 | 0.630 – 0.650 | 0.569 | 0.632 | 0.591 | 0.611 | 0.601 |
| Test dataset: HBS | 4.00E–02 | 0.054 | 0.709 | 0.072 | 8.28E–23 | 6.25 | 4.26 – 9.28 | 1805 | 999 | 0.692 | 0.660 – 0.725 | 0.628 | 0.686 | 0.691 | 0.623 | 0.657 |

These are estimates of performance for predictive models, including single-study estimates, estimates from meta-analyses across studies, and a two-stage design. The Max P threshold column denotes the P value filtering threshold for SNP inclusion that achieves the maximal pseudo (Nagelkerke's) R2. The odds ratio (OR) column is the exponentiated regression coefficient (beta) from logistic regression of the polygenic risk score (PRS) on case status, with the standard error (SE) representing the precision of these estimates. These same metrics are derived across array types and datasets using random-effects meta-analyses. The area under the curve (AUC) is included as the most common metric for predictive model performance. In the table, * denotes an R2 approximation adjusted for an estimated prevalence of 0.5%, equivalent to roughly half of the unadjusted R2 estimate for the PRS. All calculations and reported statistics include only the PRS and no other parameters, after adjusting for principal components 1–5, age, and sex at variant selection in the NeuroX-dbGaP dataset.
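
As a minimal sketch of how two of the quantities described above relate to each other, the Python snippet below exponentiates a logistic regression beta to obtain an odds ratio with a Wald 95% confidence interval, and computes balanced accuracy as the mean of sensitivity and specificity. The function names and the use of a Wald interval are assumptions made for illustration and are not taken from the source. Note also that exp(beta) here is the odds ratio per one-unit increase in the PRS, whereas the table's OR column is labelled for the highest PRS quartile, so the two are not expected to match.

```python
import math

# Hypothetical helper (not from the source): odds ratio and Wald 95% CI
# from a logistic regression coefficient and its standard error.
def odds_ratio_with_wald_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper), where OR = exp(beta)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

# Hypothetical helper (not from the source): balanced accuracy is the
# arithmetic mean of sensitivity and specificity.
def balanced_accuracy(sensitivity, specificity):
    return (sensitivity + specificity) / 2

# Beta, SE, sensitivity and specificity as reported in Table 2.
rows = {
    "IPDGC - NeuroX (training)": dict(beta=0.553, se=0.022, sens=0.569, spec=0.632),
    "HBS (test)": dict(beta=0.709, se=0.072, sens=0.628, spec=0.686),
}

for name, r in rows.items():
    or_, lower, upper = odds_ratio_with_wald_ci(r["beta"], r["se"])
    bal_acc = balanced_accuracy(r["sens"], r["spec"])
    print(f"{name}: per-unit OR = {or_:.2f} ({lower:.2f} to {upper:.2f}), "
          f"balanced accuracy = {bal_acc:.4f}")
```

Computed from the rounded table inputs, the balanced accuracy values come out at about 0.60 and 0.66, in line with the 0.601 and 0.657 reported in the table; small differences in the last digit can arise because the inputs are themselves rounded.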