Human Genetics and Genomics Advances. 2025 Mar 25;6(3):100427. doi: 10.1016/j.xhgg.2025.100427

Enhancing polygenic scores for cardiometabolic traits through tissue- and cell-type-specific functional annotations

Kristjan Norland, Daniel J Schaid, Iftikhar J Kullo
PMCID: PMC12059674  PMID: 40143549

Summary

Functional genomic annotations can improve polygenic scores (PGS) within and between genetic ancestry groups. While general annotations are commonly used in PGS development, tissue- and cell-type-specific annotations derived from open chromatin and gene expression experiments may further enhance PGS for cardiometabolic traits. We developed PGS for 14 cardiometabolic traits in the UK Biobank using SBayesRC. We integrated GWAS summary statistics from FinnGen and the Global Lipids Genetics Consortium (GLGC) with three annotation sources: (1) Baseline-LD model version 2.2 (general annotations), (2) cell-type-specific snATAC-seq peaks, and (3) tissue-specific eQTLs/sQTLs. We created PGS using two European (EUR) LD reference panels (1.2 million [1.2M] HapMap3 variants and 7M imputed variants). Tissue- and cell-type-specific annotations showed stronger heritability enrichment than Baseline-LD annotations on average, particularly coronary snATAC-seq peaks and fine-mapped eQTLs. Without annotations, HapMap3 and 7M variant PGS performed similarly. However, with all annotations, 7M variant PGS outperformed HapMap3 variant PGS (8% average increase in relative performance in EUR). Compared to using no annotations, modeling Baseline-LD annotations improved performance by 5% for HapMap3 and 11% for 7M variant PGS, while modeling all annotations yielded improvements of 5% and 13%, respectively. Although annotations provided greater relative improvement for cross-ancestry prediction, they did not decrease the disparity in PGS performance between genetic ancestry groups. In conclusion, functional annotations improved PGS for cardiometabolic traits. Despite strong heritability enrichment, tissue- and cell-type-specific snATAC-seq and eQTL annotations provided marginal performance gains beyond general genomic annotations.

Keywords: polygenic scores, cardiometabolic traits, functional annotations, UK Biobank


Functional annotations improve polygenic scores (PGS) for cardiometabolic traits. Tissue- and cell-type-specific annotations such as snATAC-seq peaks and eQTLs show heritability enrichment but offer modest PGS performance gains when added to general annotations. These findings provide insight into functional annotation as a means of enhancing PGS accuracy across populations.

Introduction

Genetic variants associated with complex traits in genome-wide association studies (GWASs) are enriched for functional genomic annotations.1,2,3 Prioritizing variants using trait-relevant functional annotations can enhance polygenic score (PGS) performance, likely by reducing the confounding effects of linkage disequilibrium (LD). This approach has proven particularly effective for cross-ancestry prediction because LD patterns diverge between continental genetic ancestry groups,4,5 whereas heritability enrichment of functional annotations is consistent across diverse populations.4,6,7,8

Clinical risk algorithms for cardiometabolic traits, including coronary heart disease (CHD), atrial fibrillation, and type 2 diabetes (T2D), show improved accuracy when incorporating PGS.9,10 However, substantial gaps remain in PGS performance and cross-ancestry portability, especially for traits lacking GWAS data from diverse populations.9 As PGS approach clinical implementation, it is crucial to optimize their performance and portability across diverse populations. While conducting GWASs in diverse populations is the most direct solution, leveraging functional annotations offers a complementary approach that may partially mitigate PGS performance disparities in populations underrepresented in GWASs. Specifically, incorporating biologically relevant functional annotations for cardiometabolic traits could enhance PGS performance when integrated with existing GWAS data.

The Baseline-LD model, a comprehensive database of functional genomic annotations derived from Roadmap and ENCODE data,11 is commonly used in PGS development. It includes, for example, annotations related to histone modifications, chromatin states, and minor allele frequency (MAF)/LD features. However, it lacks more recent tissue- and cell-type-specific annotations from RNA sequencing (RNA-seq) or single-nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq) experiments. Such annotations can be derived from tissue-specific expression quantitative trait locus (eQTL) datasets like GTEx12 or STARNET.13 Open chromatin peaks across a broad range of tissues and cell types are available in published chromatin accessibility atlases.14 For cardiometabolic traits specifically, datasets of cell-type-specific snATAC-seq peaks in atherosclerotic tissue provide particularly relevant annotations.15,16 While previous studies have demonstrated that tissue-/cell-type-specific annotations can improve PGS for complex diseases,4,15 it remains unclear whether they confer additional improvements when modeled jointly with more general genomic annotations.

The selection of LD reference panels is important when developing PGS with Bayesian methods.17 The HapMap3 reference panel, comprising 1.2 million (1.2M) SNPs, is widely used as its variants have good coverage and can be reliably imputed. However, this panel may lack sufficient density for effectively modeling functional annotations. Causal variants absent from the HapMap3 panel may have different annotations than their tagging variants that are included in the panel, potentially leading to the prioritization of irrelevant annotations. A denser reference panel could yield more accurate results by better capturing the functional characteristics of true causal variants.

In this study, we developed functionally informed PGS for 14 cardiometabolic traits with SBayesRC, integrating both general genomic annotations and recent tissue-/cell-type-specific annotations of open chromatin and gene expression. We evaluated PGS performance across different genetic ancestry groups in the UK Biobank (UKB), addressing both predictive accuracy and cross-ancestry portability.

Material and methods

Datasets

We used 487,253 UKB participants in our analyses.18 We estimated genetic ancestry using ADMIXTURE with reference populations from the 1000 Genomes Project.19 Based on genetic similarity patterns, we classified participants into four groups of genetic ancestry: African (AFR; n = 9,000), East Asian (EAS; n = 2,804), European (EUR; n = 465,523), and South Asian (SAS; n = 9,760). These classifications represent genetic similarity to reference populations rather than self-identified race/ethnicity or geographic origin.20 We analyzed these genetic similarity groups separately because differences in LD patterns and allele frequency distributions can affect PGS performance and portability.21

We studied 14 cardiometabolic traits and diseases: CHD, peripheral artery disease (PAD), abdominal aortic aneurysm (AAA), ischemic stroke, hypertension, T2D, atrial fibrillation, heart failure, calcific aortic valve stenosis (CAVS), body mass index (BMI), and lipids (total cholesterol, high-density lipoprotein-cholesterol, low-density lipoprotein-cholesterol, and triglycerides). Disease definitions are provided in Table S1. For continuous traits, we used the earliest measurement (first assessment). To develop PGS, we used GWAS summary statistics from FinnGen for binary traits and BMI and multi-ancestry meta-analysis GWAS summary statistics from the Global Lipids Genetics Consortium22 for lipid traits (excluding the UKB samples). We conducted this research under UKB application ID 79990.

Functional genomic annotations

We used various functional genomic annotation datasets (Tables 1 and S2). We used the Baseline-LD model version 2.2 (Baseline-LD) annotation database from the SBayesRC authors, which includes 96 general annotations related to regional, histone, chromatin, and MAF/LD features.11

Table 1.

Functional genomic annotation datasets used for PGS development

| Dataset | Description | Annotations | Reference |
|---|---|---|---|
| Baseline-LD version 2.2 | comprehensive set of general genomic annotations | 96 annotations, including coding variants, promoters, histone marks, and conserved elements | Gazal et al., 2017 (ref. 11) |
| snATAC-seq (Örd et al.) | cell-type-specific chromatin accessibility in atherosclerotic tissue | 9 cell-type-specific peak annotations | Örd et al., 2021 (ref. 23) |
| snATAC-seq (Turner et al.) | cell-type-specific chromatin accessibility in CAD | 12 cell-type-specific peak annotations | Turner et al., 2022 (ref. 16) |
| Human Enhancer Atlas | comprehensive atlas of tissue- and cell-type-specific regulatory elements | 47 cell-type-specific peak annotations relevant to cardiometabolic traits | Zhang et al., 2021 (ref. 14) |
| GTEx version 8 | tissue-specific eQTLs/sQTLs | eQTLs and sQTLs from 15 tissues relevant to cardiometabolic traits | GTEx Consortium, 2020 (ref. 24) |
| STARNET | CAD-specific eQTLs | eQTLs from 8 tissues | Franzén et al., 2016 (ref. 13) |

CAD, coronary artery disease; eQTL, expression quantitative trait locus; PGS, polygenic score; sQTL, splicing quantitative trait locus.

For chromatin accessibility data, we used an atherosclerotic lesion snATAC-seq dataset from Örd et al.23 We called cell-type-specific peaks by integrating these data with a published human coronary artery single-cell RNA-seq dataset25 using the R packages Signac and Seurat following standard protocols. We also used coronary artery snATAC-seq peaks from Turner et al. (“reproducible peaks” generated with the R package ArchR).16 Additionally, we used 47 annotations that we considered the most relevant for cardiometabolic traits from the Human Enhancer Atlas, a broad atlas of snATAC-seq peaks from 222 cell types from 30 adult and 15 fetal tissues (Table S2).14 To create binary annotations from snATAC-seq BED files, we converted all files to hg19 using snp_modifyBuild from the bigsnpr R package and created annotations using make_annot.py from ldsc.
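The step from peak BED files to binary per-variant annotations amounts to an interval-overlap test. The sketch below is not the ldsc make_annot.py implementation; it is a minimal illustration assuming that peaks and variants are already on the same build (hg19) with the same chromosome naming, and the function names load_bed_intervals and annotate_variants are hypothetical.

```python
import bisect


def load_bed_intervals(lines):
    """Parse BED records into per-chromosome sorted, merged (starts, ends) lists."""
    by_chrom = {}
    for line in lines:
        # Skip blank, comment, and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            # Merge overlapping or touching peaks so the intervals stay disjoint.
            if out and start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def annotate_variants(variants, peaks):
    """Return 1 for each (chrom, 1-based pos) variant inside a peak, else 0."""
    annot = []
    for chrom, pos in variants:
        starts, ends = peaks.get(chrom, ([], []))
        # BED intervals are 0-based, half-open: a 1-based pos lies in [start, end)
        # when start <= pos - 1 < end. Find the last peak starting at or before pos - 1.
        i = bisect.bisect_right(starts, pos - 1) - 1
        annot.append(1 if i >= 0 and pos - 1 < ends[i] else 0)
    return annot
```

Merging first keeps the intervals disjoint, so a single binary search per variant is enough; the main pitfall this guards against is the off-by-one between 0-based BED coordinates and 1-based variant positions.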

For eQTLs, we used GTEx version 8 single-tissue cis-eQTL data from 15 cardiometabolic-relevant tissues (Table S2),24 including both significant variant-gene associations and the CAVIAR fine-mapping results. We also used significant variant-gene splicing QTLs (sQTLs) in the same tissues. Furthermore, we included STARNET eQTLs from eight CHD-related tissues (false discovery rate <0.1; Table S2).13 For each QTL dataset, we annotated variants associated with the expression (or, for sQTLs, the splicing) of at least one gene as “1” and others as “0”.
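The QTL annotation rule (“1” if a variant is associated with at least one gene, else “0”) can be sketched as below. The (variant_id, gene_id, q_value) record layout and the function name qtl_annotation are hypothetical; the optional q-value filter mirrors the STARNET false discovery rate threshold of 0.1 stated above, and it can be left off when the input already contains only significant pairs.

```python
def qtl_annotation(variant_ids, qtl_records, q_threshold=None):
    """Binary QTL annotation: 1 if a variant is associated with >= 1 gene, else 0.

    qtl_records: iterable of (variant_id, gene_id, q_value) tuples (hypothetical layout).
    q_threshold: optional FDR cutoff, e.g. 0.1; None keeps every record as given.
    """
    associated = set()
    for variant_id, _gene_id, q_value in qtl_records:
        if q_threshold is None or q_value < q_threshold:
            associated.add(variant_id)
    # Variants absent from the QTL records (or failing the filter) are annotated 0.
    return {v: int(v in associated) for v in variant_ids}
```

A variant associated with several genes still gets a single “1”, so the annotation records QTL status rather than the number of target genes.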

Statistical methods

We used SBayesRC version 0.2.3 to develop all PGS.26 SBayesRC jointly models GWAS summary statistics and functional annotations, and SBayesR (i.e., SBayesRC without annotations) performs similarly to other Bayesian PGS methods, such as polygenic prediction via Bayesian regression and continuous shrinkage (PRS-CS).17 We ran SBayesRC with default parameters using a Docker image provided by the authors. We developed PGS using both the 1.2M variant HapMap3 and the 7M imputed variant EUR LD reference panels provided by the authors. For each trait, we created PGS with each of seven annotation sets: (1) none (standard PGS), (2) only snATAC-seq, (3) only eQTL/sQTL, (4) Baseline-LD, (5) Baseline-LD + snATAC-seq, (6) Baseline-LD + eQTL/sQTL, and (7) all annotations. In total, we developed 196 PGS (14 traits × 7 annotation sets × 2 LD reference panels). We computed PGS with plink2, restricted to sequence variants with MAF ≥1% and imputation information ≥0.6 (Figure S1).
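The design grid and the scoring step can be sketched as follows. This is not a reimplementation of plink2; it only shows that a PGS is a weighted sum of effect-allele dosages over the variants that pass the stated MAF and imputation-INFO filters. The in-memory array layout and the names DESIGN and polygenic_score are assumptions for illustration.

```python
import itertools

import numpy as np

# Study design grid described in the text: 14 traits x 7 annotation sets x 2 LD panels.
TRAITS = range(14)
ANNOTATION_SETS = [
    "none", "snATAC-seq", "eQTL/sQTL", "Baseline-LD",
    "Baseline-LD + snATAC-seq", "Baseline-LD + eQTL/sQTL", "all",
]
LD_PANELS = ["HapMap3 (1.2M)", "imputed (7M)"]
DESIGN = list(itertools.product(TRAITS, ANNOTATION_SETS, LD_PANELS))


def polygenic_score(dosages, weights, maf, info, min_maf=0.01, min_info=0.6):
    """Weighted sum of effect-allele dosages over variants passing MAF/INFO filters.

    dosages: (n_individuals, n_variants) effect-allele dosages in [0, 2]
    weights: (n_variants,) posterior effect sizes aligned to the same effect allele
    maf, info: (n_variants,) minor allele frequency and imputation INFO score
    """
    keep = (maf >= min_maf) & (info >= min_info)
    return dosages[:, keep] @ weights[keep]
```

Because every PGS is later standardized within each genetic ancestry group, rescaling the score by a constant (for example, reporting an average instead of a sum over the same variants) does not change the standardized values.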

For measuring correlations between annotations, we used the Pearson correlation coefficient, except when comparing continuous and binary annotations, for which we used the biserial correlation. We used per-SNP heritability estimates calculated from the GWAS training data by SBayesRC during PGS development. For a binary annotation c, the per-SNP heritability enrichment was defined as $\frac{\sigma_c^2 / m_c}{\sigma_g^2 / m}$, where $\sigma_g^2$ represents SNP heritability, $\sigma_c^2$ the SNP heritability explained by variants in annotation c, $m$ the total number of variants, and $m_c$ the number of variants in annotation c.
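Both quantities in this paragraph can be computed directly from per-variant arrays. In the minimal sketch below, per_snp_enrichment sums per-SNP heritability inside and outside annotation c to obtain $\sigma_c^2$ and $\sigma_g^2$, assuming per-SNP estimates such as those SBayesRC reports (no particular output file format is assumed). biserial_correlation uses the textbook biserial formula, which treats the binary annotation as a dichotomized latent normal variable; whether the authors used exactly this estimator is not stated, and both function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def biserial_correlation(y, x_binary):
    """Biserial correlation between a continuous y and a 0/1 annotation x.

    Assumes x reflects a latent normal variable split at the observed proportion p.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x_binary, dtype=bool)
    p = x.mean()
    q = 1.0 - p
    # Standard normal density at the threshold splitting the latent variable into p and q.
    h = NormalDist().pdf(NormalDist().inv_cdf(p))
    # y.std() uses ddof=0, the population-SD form of the formula.
    return (y[x].mean() - y[~x].mean()) / y.std() * p * q / h


def per_snp_enrichment(per_snp_h2, annot):
    """(sigma_c^2 / m_c) / (sigma_g^2 / m) for one binary annotation c."""
    h2 = np.asarray(per_snp_h2, dtype=float)
    in_c = np.asarray(annot, dtype=bool)
    sigma2_c, m_c = h2[in_c].sum(), in_c.sum()  # heritability and variant count in c
    sigma2_g, m = h2.sum(), h2.size             # total SNP heritability and variant count
    return (sigma2_c / m_c) / (sigma2_g / m)
```

An enrichment of 1 means variants in c carry exactly their proportional share of heritability; for example, if half the variants carry all of the heritability, their enrichment is 2.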

For PGS performance evaluation, we fitted linear and logistic regression models for continuous and binary traits, respectively. All models included age, sex, and 10 genetic principal components as covariates. We standardized all PGS to have a zero mean and unit variance within each genetic ancestry group. We applied a rank-based inverse normal transformation to the quantitative traits. We reported the regression coefficient per 1-SD increase in PGS and $R^2$ for all traits (Lee liability-scale $R^2$ for binary traits), and the area under the receiver operating characteristic curve for binary traits. When computing $R^2$ on the liability scale, we used the prevalence from the full UKB dataset. We evaluated PGS performance for binary traits with ≥50 cases within each non-EUR group (Table S3). We defined $R^2_{\mathrm{PGS}}$ (added $R^2$) as the difference between $R^2$ from a full model (including PGS and covariates) and a covariate-only model. We calculated the relative performance of PGS 2 compared with PGS 1 as $R^2_{\mathrm{PGS},2} / R^2_{\mathrm{PGS},1} - 1$. When exploring the relative performance of standard versus functionally informed PGS in non-EUR groups, we restricted analyses to traits where the standard PGS had added $R^2$ ≥0.005. To compare the difference in performance between two PGS, we used the pgsmetrics R package with 1,000 bootstrap replications.
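For a continuous trait, the evaluation steps above can be sketched with ordinary least squares. This is not the pgsmetrics implementation and does not cover the liability-scale $R^2$ for binary traits. Assumptions: Blom's offset ($c = 3/8$) for the inverse normal transform, since the text does not name one; average ranks for ties; and a percentile bootstrap interval for the difference in added $R^2$. All function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def standardize(pgs):
    """Zero mean, unit variance; apply within each genetic ancestry group."""
    pgs = np.asarray(pgs, dtype=float)
    return (pgs - pgs.mean()) / pgs.std()


def rank_inverse_normal(y, c=3 / 8):
    """Rank-based inverse normal transform with Blom's offset; ties get average ranks."""
    y = np.asarray(y, dtype=float)
    n = y.size
    order = np.argsort(y, kind="mergesort")
    ranks = np.empty(n)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y[order[j + 1]] == y[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks])


def r_squared(X, y):
    """Ordinary least squares R^2 with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def added_r2(pgs, covariates, y):
    """R^2(covariates + PGS) minus R^2(covariates only)."""
    return r_squared(np.column_stack([covariates, pgs]), y) - r_squared(covariates, y)


def relative_performance(r2_new, r2_ref):
    """R^2_PGS,2 / R^2_PGS,1 - 1; e.g. 0.08 means 8% higher added R^2."""
    return r2_new / r2_ref - 1.0


def bootstrap_r2_difference(pgs_a, pgs_b, covariates, y, n_boot=1000, seed=0):
    """Percentile bootstrap of added R^2 (PGS a) minus added R^2 (PGS b)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        diffs[b] = (added_r2(pgs_a[idx], covariates[idx], y[idx])
                    - added_r2(pgs_b[idx], covariates[idx], y[idx]))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return diffs.mean(), lo, hi
```

Both PGS are evaluated on the same resampled individuals in each replicate, so the bootstrap distribution reflects the paired difference rather than two independent uncertainties.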

Results

We compiled various genomic functional annotations relevant to cardiometabolic traits: general annotations (Baseline-LD), cell-type-specific snATAC-seq peaks, and tissue-specific eQTL/sQTL annotations (Table 1). Correlation analysis showed that annotations of similar types (e.g., eQTLs, snATAC-seq peaks) were moderately correlated (Figure S3). The Baseline-LD annotations were generally weakly correlated with the tissue- and cell-type-specific annotations; the strongest correlations were for transcription start sites (TSS_Hoffman, r = 0.39), background selection (Backgrd_Selection_Stats, r = −0.37), and promoters (Human_Promoter_Villar, r = 0.33).

We explored which annotations showed the greatest per-SNP heritability enrichment, as this metric strongly correlates with prediction improvement.26 Functional annotations were more enriched with the 7M reference panel compared to the HapMap3 panel (Figures S3 and S4). GTEx fine-mapped eQTL annotations had the highest mean enrichment across cardiometabolic traits: 5.4 (7M panel) versus 3.7 (HapMap3 panel), followed by snATAC-seq peaks from Turner: 3.9 (7M panel) versus 2.8 (HapMap3 panel) (Figure 1). The Baseline-LD annotations showed lower mean enrichment of 1.9 (7M panel) and 1.6 (HapMap3 panel), although some of its annotations ranked among the top 20 most enriched annotations averaged across traits: non-synonymous variants, human or ancient enhancers/promoters, and conserved variants, consistent with previous reports (Figures 1 and S5). Notably, fine-mapped eQTLs in arteries, liver, pancreas, and skeletal muscle, as well as snATAC-seq peaks in pericytes, also ranked highly in enrichment.

Figure 1.


Heritability enrichment patterns across functional annotations

(A) Annotations with the highest mean per-SNP heritability enrichment across all cardiometabolic traits, comparing results from 7M (orange) and HapMap3 (blue) variant reference panels. Error bars represent 95% CIs for the mean enrichment.

(B) Distribution of heritability enrichment across all traits, grouped by annotation source (y axis) and reference panel (color).

We observed trait-dependent variability in per-SNP heritability enrichment. For CHD, the top annotations were fine-mapped eQTLs in coronary artery (enrichment = 12.1), aorta (enrichment = 11.3), and tibial artery (enrichment = 8.7) tissue, as well as snATAC-seq peaks in relevant cell types, including fibroblasts (enrichment = 7), smooth muscle cells (enrichment = 6.9), and endothelial cells (enrichment = 5.6) (Figure 2). Total cholesterol showed more enrichment for fine-mapped eQTLs, particularly in the liver (enrichment = 48.9) and pancreas (enrichment = 23.8), compared to snATAC-seq peaks. Many coronary snATAC-seq annotations were enriched for traits such as AAA, CAVS, and PAD. For atrial fibrillation, hypertension, and T2D, fine-mapped eQTLs in relevant tissues (atrial appendage, adrenal gland, pituitary gland) ranked among the top enriched annotations (Figure S5).

Figure 2.


Tissue-specific patterns of heritability enrichment for CHD and total cholesterol

Top annotations ranked by per-SNP heritability enrichment for coronary heart disease (CHD, A) and total cholesterol (B). We ordered the annotations by their enrichment values in the 7M reference panel (orange). Error bars represent 95% CIs for the enrichment value.

PGS developed using the HapMap3 and 7M reference panels performed similarly without annotations (average relative performance in EUR, 0%; range, −10% to +11%). However, when modeling all annotations, the 7M variant PGS outperformed HapMap3 variant PGS (average relative performance, 8%; range, 0%–21%, Table S4). The greatest relative improvements were for CAVS (21%), PAD (20%), and heart failure (14%). Notably, modeling annotations improved 7M variant PGS more than HapMap3 variant PGS (Figure 3). Compared to using no annotations, Baseline-LD annotations increased average relative performance by 5% for HapMap3 variant PGS and 11% for 7M variant PGS (Table S5). Further inclusion of snATAC-seq and eQTL/sQTL annotations additionally improved the 7M variant PGS (13% average increase relative to no annotations) but not the HapMap3 variant PGS. Excluding Baseline-LD annotations consistently resulted in reduced PGS performance. For subsequent analyses, we focused on 7M variant PGS due to their superior performance and greater sensitivity to functional genomic annotations.

Figure 3.


Performance of PGS across reference panels and annotation sets in EUR

(A) Comparison of added R2 between HapMap3 variant PGS (x axis) and 7M variant PGS (y axis) across different annotation sets. The dashed line represents equal performance.

(B) Comparison of added R2 between 7M variant standard PGS (x axis) and functionally informed PGS (y axis).

(C) Distribution of added R2 for different annotation sets, stratified by reference panel (HapMap3 left, 7M right).

We observed greater improvements in functionally informed PGS relative to standard PGS in non-EUR groups (cross-ancestry prediction). In AFR, modeling Baseline-LD annotations increased PGS performance by 21% on average, while incorporating all annotations improved performance by 36% (relative to standard PGS). In EAS and SAS, Baseline-LD annotations improved performance by 23% and 14% on average, respectively. Additional annotations beyond Baseline-LD did not yield further improvements in EAS and SAS. Despite these relative improvements, the baseline performance of standard PGS remained lower in non-EUR compared to EUR, and modeling annotations did not reduce the absolute performance disparity across genetic ancestry groups (Figures S6 and S7).

Functionally informed PGS demonstrated larger effects per 1-SD increase in PGS compared to standard PGS for most traits (Figure 4). For example, including Baseline-LD annotations improved PGS for CHD (odds ratio [OR] = 1.71, 95% confidence interval [CI] 1.68–1.73 versus OR = 1.61, 95% CI 1.59–1.63) and CHD risk factors, including hypertension (OR = 1.60, 95% CI 1.59–1.61 versus OR = 1.56, 95% CI 1.55–1.57) and T2D (OR = 1.79, 95% CI 1.77–1.82 versus OR = 1.76, 95% CI 1.74–1.78), as well as related traits such as atrial fibrillation (OR = 1.68, 95% CI 1.66–1.70 versus OR = 1.62, 95% CI 1.60–1.63). We observed similar improvements for other atherosclerotic cardiovascular diseases (ASCVD), including PAD and AAA. Adding snATAC-seq and eQTL annotations to Baseline-LD annotations yielded comparable estimates for most traits (Figure 4). We observed similar association patterns in non-EUR groups, albeit with wider confidence intervals (Figures S6 and S8).
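The ORs above are per 1-SD increase in a standardized PGS, that is, $\exp(\beta)$ for the PGS coefficient of a covariate-adjusted logistic regression. As an illustration of that quantity only (the authors' code is not shown), the sketch below fits the model by Newton-Raphson (iteratively reweighted least squares) and reports a Wald 95% CI. The Wald interval and the function name or_per_sd are assumptions; the text does not state how the CIs were computed.

```python
import numpy as np


def or_per_sd(pgs, covariates, y, n_iter=50, tol=1e-10):
    """Odds ratio per 1-SD PGS increase with a Wald 95% CI, via Newton-Raphson (IRLS)."""
    z = (pgs - pgs.mean()) / pgs.std()                     # per-SD scaling of the PGS
    X = np.column_stack([np.ones(len(y)), covariates, z])  # intercept, covariates, PGS last
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information
        step = np.linalg.solve(hessian, X.T @ (y - p))      # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate for the standard errors.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov_beta = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    b, se = beta[-1], np.sqrt(cov_beta[-1, -1])
    return np.exp(b), np.exp(b - 1.96 * se), np.exp(b + 1.96 * se)
```

Standardizing inside the function makes the returned OR directly comparable across PGS built with different annotation sets, which is the comparison Figure 4 makes.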

Figure 4.


Effect sizes of PGS with different annotation sets in EUR

Regression estimates per 1-SD increase in PGS ordered by the magnitude of effect for the standard PGS (without annotations). Colors indicate the annotation set used for PGS development. We standardized all PGS to have a zero mean and unit variance. We applied a rank-based inverse normal transformation to the quantitative traits. Error bars represent 95% CIs.

We examined which annotations produced the greatest relative increase in R2 across genetic ancestry groups compared to standard PGS. Certain traits, such as atrial fibrillation, CHD, T2D, and total cholesterol, showed more substantial improvements across all genetic ancestry groups, although the magnitude varied between groups (Figures 5 and S9). Including at least the Baseline-LD annotations consistently yielded the best performance for most traits, while using only eQTL/sQTL annotations produced the smallest improvements.

Figure 5.


Relative performance improvements of functionally informed PGS across genetic ancestry groups

Heatmaps showing the relative (Rel.) improvement in added R2 of functionally informed PGS relative to standard PGS across genetic ancestry groups for 7M variant PGS. We included only traits where standard PGS achieve added R2 ≥ 0.005. Rows represent different annotation combinations used in PGS development. AFR, African genetic ancestry; EAS, East Asian genetic ancestry; EUR, European genetic ancestry; SAS, South Asian genetic ancestry.

Discussion

In this study, we developed functionally informed PGS for 14 cardiometabolic traits using SBayesRC with two EUR LD reference panels (1.2M HapMap3 and 7M imputed variants) and evaluated them in the UKB. We integrated recent tissue- and cell-type-specific annotations (snATAC-seq peaks and eQTLs/sQTLs) alongside general genomic annotations from the Baseline-LD dataset. We found that fine-mapped eQTLs and coronary snATAC-seq peak annotations showed higher per-SNP heritability enrichment than the Baseline-LD annotations on average, particularly when using the 7M variant LD reference panel. While HapMap3 and 7M variant PGS performed comparably without annotations, the 7M variant PGS showed better performance when all annotations were incorporated (8% average increase in relative performance in EUR). Modeling tissue- and cell-type-specific annotations provided minimal improvements over Baseline-LD annotations alone (13% versus 11% increase in relative performance compared to standard PGS, respectively). However, for cross-ancestry prediction, these specific annotations yielded larger improvements, particularly in AFR (36% versus 21% increase in relative performance compared to standard PGS, respectively).

Several previous studies have shown that accounting for functional genomic annotations can improve PGS performance.5,27,28 The commonly used Baseline-LD annotation dataset provides a foundation of general annotations but lacks tissue- or cell-type-specific information.11 Amariuta et al. demonstrated that cell-type-specific annotations of predicted transcription factor binding capture high proportions of complex disease heritability and enhance cross-ancestry portability of PGS.4 Similarly, coronary artery cell-type-specific snATAC-seq peaks have been shown to improve standard PGS for CHD.15 Our study extends this work by employing an existing Bayesian method to develop PGS for cardiometabolic traits, jointly modeling both general and tissue-/cell-type-specific annotations. Among Baseline-LD annotations, non-synonymous variants and ancient promoters/enhancers showed the highest per-SNP heritability enrichment, consistent with previous studies.2,26 Notably, annotations were more enriched for per-SNP heritability when we used a denser 7M variant LD reference panel than a 1.2M HapMap3 variant reference panel commonly used in PGS development. We observed high enrichment for fine-mapped eQTL annotations in cardiometabolic-relevant tissues (arteries, liver, and skeletal muscle) and for coronary snATAC-seq peaks in key cell types for ASCVD (fibroblasts, endothelial, and smooth muscle cells).

Despite high heritability enrichment, adding cell-type-specific and tissue-specific annotations to the Baseline-LD annotations resulted in minimal improvements in PGS performance. The Baseline-LD dataset already includes general (non-tissue- and non-cell-type-specific) annotations of enhancers, promoters, and DNase I hypersensitive site peaks, as well as one eQTL annotation (GTEx_eQTL_MaxCPP), although these annotations were not strongly correlated with our additional annotations (correlations ranging from −0.4 to 0.4). However, biologically relevant functional annotations may still offer value in PGS development. Tissue- and cell-type-specific annotations can highlight disease-relevant tissues and cell types via heritability/predictability enrichment. Although not explored in our study, tissue- or cell-type-specific “pathway PGS” could aid in the interpretation of disease subtypes or risk heterogeneity, even when these approaches do not enhance overall prediction performance.29,30

Incorporating functional annotations into PGS development can improve PGS portability by prioritizing likely causal variants and reducing population-specific LD confounding.4 In our study, we observed greater relative improvements in non-EUR groups when modeling functional annotations. However, because we used EUR GWAS training data, the baseline PGS performance was substantially lower in these groups, and modeling functional annotations did not meaningfully reduce performance disparities across genetic ancestry groups. This observation aligns with recent studies on PGS portability. Hu et al. demonstrated that differences in local LD and allele frequencies, rather than differences in causal effects, are the main driver of low PGS portability in the UKB.31 Similarly, using gene expression data from AFR and EUR individuals, Saitou et al. found that allele frequency differences of causal variants significantly impact prediction portability, even when controlling for LD.32 These findings suggest that while functional annotations can improve cross-ancestry PGS prediction by prioritizing likely causal variants, they cannot fully solve the portability problem. Population-specific differences in allele frequencies and LD patterns between EUR and other populations, particularly AFR, continue to limit PGS transferability, even when causal variants are correctly identified. If GWAS results from diverse populations are available, then alternative approaches that integrate diverse GWAS data, such as PROSPER,33 represent a more promising strategy for creating portable PGS across populations.

Our study has several limitations. First, we focused on cell-type-specific open chromatin peaks (snATAC-seq) and tissue-specific eQTLs/sQTLs. While we included numerous annotations relevant to cardiometabolic traits, additional recent cell-type-specific epigenetic annotations34,35 and other QTLs, such as protein QTLs,36 could further improve PGS performance. Second, we created binary annotations of snATAC-seq peaks and eQTLs; more sophisticated integration methods may confer additional improvements. Third, smaller sample sizes in non-EUR groups limited some analyses, particularly for traits with low prevalence. The EAS group (≈3,000 people) is smaller than the AFR and SAS groups (≈9,000–10,000 people) in the UKB, and we restricted our analyses to binary traits with ≥50 cases. Fourth, we focused specifically on cardiometabolic traits, so the generalizability of our results to other trait categories requires further investigation. Finally, for most traits, we used GWAS data from a single EUR cohort (FinnGen) as training data for PGS.

In conclusion, functional genomic annotations improved PGS for cardiometabolic traits, including CHD and lipid levels. Cell-type-specific snATAC-seq and tissue-specific eQTL annotations showed high per-SNP heritability enrichment for cardiometabolic traits but provided only marginal improvements in PGS performance when added to more general functional annotations.

Data and code availability

Access to the UKB is available through an application: https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. The corresponding author may provide the custom code essential to the conclusions upon reasonable request.

Acknowledgments

This study was funded by grant U01HG011710 from the National Human Genome Research Institute.

Declaration of interests

The authors declare no competing interests.

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xhgg.2025.100427.

Web resources

Supplemental information

Document S1. Figures S1–S9
Data S1. Tables S1–S5
Document S2. Article plus supplemental information

References

  • 1.Schaid D.J., Chen W., Larson N.B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 2018;19:491–504. doi: 10.1038/s41576-018-0016-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Cano-Gamez E., Trynka G. From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front. Genet. 2020;11:424. doi: 10.3389/fgene.2020.00424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Amariuta T., Ishigaki K., Sugishita H., Ohta T., Koido M., Dey K.K., Matsuda K., Murakami Y., Price A.L., Kawakami E., et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 2020;52:1346–1354. doi: 10.1038/s41588-020-00740-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Weissbrod O., Kanai M., Shi H., Gazal S., Peyrot W.J., Khera A.V., Okada Y., Biobank Japan Project. Martin A.R., Finucane H.K., Price A.L. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat. Genet. 2022;54:450–458. doi: 10.1038/s41588-022-01036-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Kichaev G., Pasaniuc B. Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies. Am. J. Hum. Genet. 2015;97:260–271. doi: 10.1016/j.ajhg.2015.06.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Kanai M., Akiyama M., Takahashi A., Matoba N., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K., et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 2018;50:390–400. doi: 10.1038/s41588-018-0047-6. [DOI] [PubMed] [Google Scholar]
  • 8.Luo Y., Li X., Wang X., Gazal S., Mercader J.M., Auton A., 23 and Me Research Team. SIGMA Type 2 Diabetes Consortium. Neale B.M., Florez J.C., et al. Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations. Hum. Mol. Genet. 2021;30:1521–1534. doi: 10.1093/hmg/ddab130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.O’Sullivan J.W., Raghavan S., Marquez-Luna C., Luzum J.A., Damrauer S.M., Ashley E.A., O’Donnell C.J., Willer C.J., Natarajan P., American Heart Association Council on Genomic and Precision Medicine; Council on Clinical Cardiology; Council on Arteriosclerosis Thrombosis and Vascular Biology; Council on Cardiovascular Radiology and Intervention; Council on Lifestyle and Cardiometabolic Health; and Council on Peripheral Vascular Disease Council on Arteriosclerosis, Thrombosis and Vascular Biology; Council on Cardiovascular Radiology and Intervention; Council on Lifestyle and Cardiometabolic Health; and Council on Peripheral Vascular Disease. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation. 2022;146:e93–e118. doi: 10.1161/CIR.0000000000001077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Naderian M., Norland K., Schaid D.J., Kullo I.J. Development and Evaluation of a Comprehensive Prediction Model for Incident Coronary Heart Disease Using Genetic, Social, and Lifestyle-Psychological Factors: A Prospective Analysis of the UK Biobank. Ann. Intern. Med. 2025;178:1–10. doi: 10.7326/ANNALS-24-00716.
  • 11. Gazal S., Finucane H.K., Furlotte N.A., Loh P.-R., Palamara P.F., Liu X., Schoech A., Bulik-Sullivan B., Neale B.M., Gusev A., Price A.L. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954.
  • 12. Aguet F., Brown A.A., Castel S.E., Davis J.R., He Y., Jo B., Mohammadi P., Park Y., Parsana P., Segrè A.V., et al. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
  • 13. Franzén O., Ermel R., Cohain A., Akers N.K., Di Narzo A., Talukdar H.A., Foroughi-Asl H., Giambartolomei C., Fullard J.F., Sukhavasi K., et al. Cardiometabolic risk loci share downstream cis- and trans-gene regulation across tissues and diseases. Science. 2016;353:827–830. doi: 10.1126/science.aad6970.
  • 14. Zhang K., Hocker J.D., Miller M., Hou X., Chiou J., Poirion O.B., Qiu Y., Li Y.E., Gaulton K.J., Wang A., et al. A single-cell atlas of chromatin accessibility in the human genome. Cell. 2021;184:5985–6001.e19. doi: 10.1016/j.cell.2021.10.024.
  • 15. Örd T., Lönnberg T., Nurminen V., Ravindran A., Niskanen H., Kiema M., Õunap K., Maria M., Moreau P.R., Mishra P.P., et al. Dissecting the polygenic basis of atherosclerosis via disease-associated cell state signatures. Am. J. Hum. Genet. 2023;110:722–740. doi: 10.1016/j.ajhg.2023.03.013.
  • 16. Turner A.W., Hu S.S., Mosquera J.V., Ma W.F., Hodonsky C.J., Wong D., Auguste G., Song Y., Sol-Church K., Farber E., et al. Single-nucleus chromatin accessibility profiling highlights regulatory mechanisms of coronary artery disease risk. Nat. Genet. 2022;54:804–816. doi: 10.1038/s41588-022-01069-0.
  • 17. Pain O., Glanville K.P., Hagenaars S.P., Selzam S., Fürtjes A.E., Gaspar H.A., Coleman J.R.I., Rimfeld K., Breen G., Plomin R., et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 2021;17. doi: 10.1371/journal.pgen.1009021.
  • 18. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
  • 19. Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
  • 20. National Academies of Sciences, Engineering, and Medicine; Division of Behavioral and Social Sciences and Education; Health and Medicine Division; Committee on Population; Board on Health Sciences Policy; Committee on the Use of Race, Ethnicity, and Ancestry as Population Descriptors in Genomics Research. Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field. National Academies Press (US); 2023.
  • 21. Kachuri L., Chatterjee N., Hirbo J., Schaid D.J., Martin I., Kullo I.J., Kenny E.E., Pasaniuc B., Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium Methods Working Group, Witte J.S., Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat. Rev. Genet. 2024;25:8–25. doi: 10.1038/s41576-023-00637-2.
  • 22. Graham S.E., Clarke S.L., Wu K.-H.H., Kanoni S., Zajac G.J.M., Ramdas S., Surakka I., Ntalla I., Vedantam S., Winkler T.W., et al. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600:675–679. doi: 10.1038/s41586-021-04064-3.
  • 23. Örd T., Õunap K., Stolze L.K., Aherrahrou R., Nurminen V., Toropainen A., Selvarajan I., Lönnberg T., Aavik E., Ylä-Herttuala S., et al. Single-Cell Epigenomics and Functional Fine-Mapping of Atherosclerosis GWAS Loci. Circ. Res. 2021;129:240–258. doi: 10.1161/CIRCRESAHA.121.318971.
  • 24. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
  • 25. Wirka R.C., Wagh D., Paik D.T., Pjanic M., Nguyen T., Miller C.L., Kundu R., Nagao M., Coller J., Koyano T.K., et al. Atheroprotective roles of smooth muscle cell phenotypic modulation and the TCF21 disease gene as revealed by single-cell analysis. Nat. Med. 2019;25:1280–1289. doi: 10.1038/s41591-019-0512-5.
  • 26. Zheng Z., Liu S., Sidorenko J., Wang Y., Lin T., Yengo L., Turley P., Ani A., Wang R., Nolte I.M., et al. Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat. Genet. 2024;56:767–777. doi: 10.1038/s41588-024-01704-y.
  • 27. Hu Y., Lu Q., Powles R., Yao X., Yang C., Fang F., Xu X., Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 2017;13. doi: 10.1371/journal.pcbi.1005589.
  • 28. Márquez-Luna C., Gazal S., Loh P.-R., Kim S.S., Furlotte N., Auton A., 23andMe Research Team, Price A.L. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 2021;12:6052. doi: 10.1038/s41467-021-25171-9.
  • 29. Choi S.W., García-González J., Ruan Y., Wu H.M., Porras C., Johnson J., Bipolar Disorder Working Group of the Psychiatric Genomics Consortium, O’Reilly P.F. PRSet: Pathway-based polygenic risk score analyses and software. PLoS Genet. 2023;19. doi: 10.1371/journal.pgen.1010624.
  • 30. Yang H.-S., Teng L., Kang D., Menon V., Ge T., Finucane H.K., Schultz A.P., Properzi M., Klein H.-U., Chibnik L.B., et al. Cell-type-specific Alzheimer’s disease polygenic risk scores are associated with distinct disease processes in Alzheimer’s disease. Nat. Commun. 2023;14:7659. doi: 10.1038/s41467-023-43132-2.
  • 31. Hu S., Ferreira L.A.F., Shi S., Hellenthal G., Marchini J., Lawson D.J., Myers S.R. Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits. Nat. Genet. 2025;57:379–389. doi: 10.1038/s41588-024-02035-8.
  • 32. Saitou M., Dahl A., Wang Q., Liu X. Allele frequency impacts the cross-ancestry portability of gene expression prediction in lymphoblastoid cell lines. Am. J. Hum. Genet. 2024;111:2814–2825. doi: 10.1016/j.ajhg.2024.10.009.
  • 33. Zhang J., Zhan J., Jin J., Ma C., Zhao R., O’Connell J., Jiang Y., 23andMe Research Team, Koelsch B.L., Zhang H., Chatterjee N. An ensemble penalized regression method for multi-ancestry polygenic risk prediction. Nat. Commun. 2024;15:3238. doi: 10.1038/s41467-024-47357-7.
  • 34. ENCODE Project Consortium, Moore J.E., Purcaro M.J., Pratt H.E., Epstein C.B., Shoresh N., Adrian J., Kawli T., Davis C.A., Dobin A., et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583:699–710. doi: 10.1038/s41586-020-2493-4.
  • 35. Breeze C.E., Haugen E., Reynolds A., Teschendorff A., van Dongen J., Lan Q., Rothman N., Bourque G., Dunham I., Beck S., et al. Integrative analysis of 3604 GWAS reveals multiple novel cell type-specific regulatory associations. Genome Biol. 2022;23:13. doi: 10.1186/s13059-021-02560-3.
  • 36. Yao C., Chen G., Song C., Keefe J., Mendelson M., Huan T., Sun B.B., Laser A., Maranville J.C., Wu H., et al. Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nat. Commun. 2018;9:3268. doi: 10.1038/s41467-018-05512-x.

Associated Data

Supplementary Materials

Document S1. Figures S1–S9 (mmc1.pdf, 618.5 KB)
Data S1. Tables S1–S5 (mmc2.xlsx, 88 KB)
Document S2. Article plus supplemental information (mmc3.pdf, 2.5 MB)

Data Availability Statement

Access to the UKB is available through an application: https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. The corresponding author may provide the custom code essential to the conclusions upon reasonable request.
