Ensembled best subset selection using summary statistics for polygenic risk prediction

Tony Chen; Haoyu Zhang; Rahul Mazumder; Xihong Lin

doi:10.1101/2023.09.25.559307

This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2023 Sep 27:2023.09.25.559307. [Version 2] doi: 10.1101/2023.09.25.559307

PMCID: PMC10602024 PMID: 37886515

Abstract

Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary-statistic-based L₀L₂ penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association study (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published sets of GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, and evaluate the resulting scores using individual-level UK Biobank (UKBB) data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
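To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the ALL-Sum implementation. It assumes the common summary-statistic form of the objective, (1/2) βᵀRβ − βᵀr + λ₀‖β‖₀ + λ₂‖β‖₂², where R is a reference linkage disequilibrium (LD) correlation matrix and r holds marginal SNP-trait correlations, and it solves this by plain cyclic coordinate descent. The function names, the dense-matrix representation, and the uniform averaging across the tuning grid are all illustrative assumptions; the abstract does not state how ALL-Sum weights the estimates it aggregates.

```python
import numpy as np


def l0l2_coordinate_descent(R, r, lam0, lam2, n_iter=100, tol=1e-8):
    """Hypothetical helper: cyclic coordinate descent for
    0.5 * b'Rb - b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2.

    R : (p, p) LD correlation matrix (assumed dense here for simplicity)
    r : (p,) marginal SNP-trait correlations from GWAS summary statistics
    """
    p = len(r)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual correlation, excluding SNP j's own contribution.
            rho = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + 2.0 * lam2
            # Keep SNP j only if the drop in loss outweighs the L0 penalty.
            threshold = np.sqrt(2.0 * lam0 * denom)
            new = rho / denom if abs(rho) > threshold else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def ensemble_average(R, r, grid):
    """Hypothetical helper: uniform average of the estimates over a grid of
    (lam0, lam2) pairs. Uniform weights are a placeholder assumption."""
    betas = np.array([l0l2_coordinate_descent(R, r, l0, l2) for l0, l2 in grid])
    return betas.mean(axis=0)


if __name__ == "__main__":
    # Toy example with three uncorrelated SNPs.
    R = np.eye(3)
    r = np.array([0.5, 0.01, -0.3])
    grid = [(0.01, 0.0), (0.01, 0.5)]
    print(ensemble_average(R, r, grid))
```

In this toy setting the weak middle SNP falls below the selection threshold for both tuning pairs, so it is excluded from every fit, while the L₂ term shrinks the two retained effects. A practical ensembling step would more plausibly learn the weights on held-out tuning data than average uniformly.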

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.
