2020 May 28;107(1):46–59. doi: 10.1016/j.ajhg.2020.05.004

Figure 1.

Overview of Non-Parametric Shrinkage (NPS)

(A) For unlinked markers, NPS partitions SNPs into K subgroups by splitting the GWAS effect sizes (β̂j) at cut-offs b0, b1, …, bK. Partitioned risk scores Gik are calculated for each partition k and individual i using an independent genotype-level training cohort. The per-partition shrinkage weights ωk are determined by the separation of Gik between training case subjects and control subjects. Estimating the per-partition shrinkage weights is a far easier problem than estimating per-SNP effects: the training sample size is small but still larger than the number of partitions, whereas for per-SNP effects the GWAS sample size is considerably smaller than the number of markers in the genome. This procedure “shrinks” the estimated effect sizes without relying on any specific assumption about the distribution of true effect sizes.
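The partition-and-reweight procedure in (A) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, partitioning on the absolute value of β̂j is an assumption, and the "separation" of Gik between case and control subjects is assumed here to be the mean difference scaled by the pooled within-group variance, which the caption does not specify.

```python
import numpy as np

def partitioned_risk_scores(genotypes, beta_hat, cutoffs):
    """Compute G[i, k]: risk score of individual i restricted to partition k.

    genotypes: (n_individuals, n_snps) array of allele counts
    beta_hat:  (n_snps,) GWAS effect size estimates
    cutoffs:   ascending cut-offs b0..bK (assumed to apply to |beta_hat|)
    """
    K = len(cutoffs) - 1
    # Interior cut-offs b1..b(K-1) assign each SNP a partition index 0..K-1.
    part = np.digitize(np.abs(beta_hat), cutoffs[1:-1])
    G = np.zeros((genotypes.shape[0], K))
    for k in range(K):
        mask = part == k
        G[:, k] = genotypes[:, mask] @ beta_hat[mask]
    return G

def partition_weights(G, is_case):
    """Assumed estimator: case/control mean difference over pooled variance."""
    cases, controls = G[is_case], G[~is_case]
    diff = cases.mean(axis=0) - controls.mean(axis=0)
    n1, n0 = len(cases), len(controls)
    pooled = ((n1 - 1) * cases.var(axis=0, ddof=1)
              + (n0 - 1) * controls.var(axis=0, ddof=1)) / (n1 + n0 - 2)
    w = np.zeros_like(diff)
    ok = pooled > 0  # leave the weight at 0 for degenerate partitions
    w[ok] = diff[ok] / pooled[ok]
    return w

def risk_score(G, w):
    """Final score per individual: sum over k of w[k] * G[i, k]."""
    return G @ w
```

Only K weights are estimated from the training cohort, so each SNP's effect is rescaled by the weight of the partition it falls into rather than being re-estimated individually.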

(B) For markers in LD, genotypes and estimated effects are first decorrelated by a linear projection P in non-overlapping windows of ∼2.5 Mb in length, and then NPS is applied to the projected data. The size of the black dots indicates genotype frequencies in the population. Before projection, genotypes at SNPs 1 and 2 are correlated due to LD (D), and thus the sampling errors of the estimated effects (β̂j | βj) are also correlated between adjacent SNPs. The projection P neutralizes both correlation structures. The axes of projection are marked by red dashed lines. βj denotes the true genetic effect at SNP j. Ng is the sample size of the GWAS cohort.
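The decorrelation step in (B) can be illustrated with a small sketch. It assumes that P is built from the eigen-decomposition of the within-window LD matrix D, which is one standard way to obtain a decorrelating projection; the caption does not state how P is constructed. The function name and the `min_eig` threshold are hypothetical.

```python
import numpy as np

def decorrelate_window(X, beta_hat, min_eig=1e-8):
    """Decorrelate genotypes and effect estimates within one window.

    X:        (n_individuals, n_snps) standardized genotypes for the window
    beta_hat: (n_snps,) estimated effects for the same SNPs
    min_eig:  hypothetical threshold for dropping near-zero eigenvalues
    """
    n = X.shape[0]
    D = X.T @ X / n                      # LD (correlation) matrix of the window
    eigvals, Q = np.linalg.eigh(D)       # D = Q diag(eigvals) Q^T
    keep = eigvals > min_eig
    # Rows of P are eigenvectors scaled by 1/sqrt(eigenvalue).
    P = (Q[:, keep] / np.sqrt(eigvals[keep])).T
    X_proj = X @ P.T                     # projected genotypes, identity covariance
    eta_hat = P @ beta_hat               # projected effect estimates
    return P, X_proj, eta_hat
```

Under this assumption, P D Pᵀ is the identity matrix, so the projected genotypes are uncorrelated. If the sampling errors of the estimated effects have covariance proportional to D, the same projection also makes those errors uncorrelated, which is the sense in which both correlation structures are neutralized.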