Table 2.
Summary of genetic predictive model performance.
Study | Max P threshold | pseudo R2 from PRS* | Beta | SE | P | OR, highest quartile PRS | 95% CI, highest quartile PRS | N SNPs | N samples | AUC | 95% CI (DeLong) | Sensitivity | Specificity | Positive predictive value (PPV) | Negative predictive value (NPV) | Balanced accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training dataset: IPDGC - NeuroX | 1.35E–03 | 0.029 | 0.553 | 0.022 | 8.99E–135 | 3.74 | 3.35 – 4.18 | 1809 | 11,243 | 0.640 | 0.630 – 0.650 | 0.569 | 0.632 | 0.591 | 0.611 | 0.601 |
Test dataset: HBS | 4.00E–02 | 0.054 | 0.709 | 0.072 | 8.28E–23 | 6.25 | 4.26 – 9.28 | 1805 | 999 | 0.692 | 0.660 – 0.725 | 0.628 | 0.686 | 0.691 | 0.623 | 0.657 |
These are estimates of performance for predictive models, including single-study estimates, estimates from meta-analyses across studies, and a two-stage design. The Max P threshold column gives the P value cutoff for SNP inclusion that achieves the maximal pseudo (Nagelkerke's) R2. The odds ratio (OR) column is the exponent of the regression coefficient (beta) from logistic regression of the polygenic risk score (PRS) on case status, with the standard error (SE) representing the precision of these estimates. These same metrics are derived across array types and datasets using random-effects meta-analyses. The area under the curve (AUC) is included as the most common metric for predictive model performance. In the table, * denotes the R2 approximation adjusted for an estimated prevalence of 0.5%, equivalent to roughly half of the unadjusted R2 estimates for the PRS. All calculations and reported statistics include only the PRS and no other parameters after adjusting for principal components 1-5, age, and sex at variant selection in the NeuroX-dbGaP dataset.
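As a rough illustration of how two of the derived columns relate to the others, the sketch below exponentiates a logistic regression beta to obtain an odds ratio with a Wald-type 95% interval, and computes balanced accuracy as the mean of sensitivity and specificity. The function names are hypothetical, and the Wald interval (beta ± 1.96 × SE) is an assumption made for illustration rather than the method stated in the source. Note that exp(beta) is the odds ratio per one-unit increase in the PRS as it was scaled in the regression, which is a different quantity from the highest-quartile OR reported in the table.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a logistic regression coefficient and its Wald interval.

    Assumption: a normal-approximation (Wald) interval, beta +/- z * se.
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))

def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2

# Values copied from the two rows of Table 2.
rows = {
    "IPDGC - NeuroX (training)": {"beta": 0.553, "se": 0.022, "sens": 0.569, "spec": 0.632},
    "HBS (test)": {"beta": 0.709, "se": 0.072, "sens": 0.628, "spec": 0.686},
}

for name, r in rows.items():
    or_, lower, upper = odds_ratio_with_ci(r["beta"], r["se"])
    ba = balanced_accuracy(r["sens"], r["spec"])
    print(f"{name}: per-unit OR = {or_:.2f} ({lower:.2f} to {upper:.2f}), "
          f"balanced accuracy = {ba:.3f}")
```

Up to rounding, the balanced accuracy computed this way agrees with the tabulated values (0.601 and 0.657), while the per-unit odds ratios (about 1.74 and 2.03) are smaller than the highest-quartile ORs, as expected for a per-unit contrast.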