[Preprint]. 2023 Dec 14:2023.12.13.23299909. [Version 1] doi: 10.1101/2023.12.13.23299909

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.

Figure 1: Panel a: the proposed ensemble model framework. The ensemble is composed of two models. The baseline model was trained on covariates $(X_{b})$ only, to predict SBP and DBP $({\hat{y}}_{b})$. To assess the accuracy of the baseline model, we calculated the baseline residuals $r_{b}$ by subtracting the predicted value of SBP/DBP from the actual value. The genetic model was trained on a subset of the covariates together with genetic components (global PRSs) to predict the baseline model residuals $r_{b}$. We measured the accuracy of the genetic model by subtracting the predicted genetic residuals ${\hat{r}}_{g}$ from the baseline residuals $r_{b}$. The overall BP prediction of the ensemble model is the sum of the BP predicted by the baseline model, ${\hat{y}}_{b}$, and the baseline residuals predicted by the genetic model, ${\hat{r}}_{b}$. The accuracy of the ensemble model was assessed by calculating the percent variance explained (PVE) by the two models jointly. Panel b: the split of the primary TOPMed dataset into training and testing sets, followed by 5-fold cross-validation, in which the training dataset is further split into 5 equal parts and, in each of 5 iterations, one randomly designated part (1/5 of the training data) is held out for testing. Panel c: increasing levels of genetic model complexity, where each new model included additional PRSs. Panel d: the process of calculating local PRSs per LD block (secondary analysis).
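
The arithmetic described for panel a can be illustrated with a minimal Python sketch. It does not fit any model: the baseline predictions and the genetic model's residual predictions are supplied as plain lists, and all numbers are made-up toy values, not data from the study. The function names are hypothetical, and the PVE formula used here (100 × (1 − residual sum of squares / total sum of squares)) is an assumption, since the caption does not state how PVE was computed.

```python
from statistics import fmean


def residuals(actual, predicted):
    """Residuals as described in the caption: actual value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def ensemble_prediction(baseline_pred, genetic_resid_pred):
    """Ensemble prediction: baseline prediction plus the genetic model's
    prediction of the baseline residuals."""
    return [b + g for b, g in zip(baseline_pred, genetic_resid_pred)]


def percent_variance_explained(actual, predicted):
    """Assumed PVE definition: 100 * (1 - SS_res / SS_tot)."""
    mean_y = fmean(actual)
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)


# Toy values for illustration only (e.g. SBP in mmHg); not taken from the paper.
y = [120.0, 135.0, 110.0, 150.0]          # observed BP
y_hat_b = [122.0, 130.0, 114.0, 144.0]    # baseline model predictions
r_b = residuals(y, y_hat_b)               # baseline residuals
r_hat_g = [-1.0, 3.0, -2.0, 4.0]          # genetic model's predictions of r_b

y_hat = ensemble_prediction(y_hat_b, r_hat_g)
genetic_error = residuals(r_b, r_hat_g)   # r_b minus predicted residuals

pve_baseline = percent_variance_explained(y, y_hat_b)
pve_ensemble = percent_variance_explained(y, y_hat)
print(f"baseline PVE: {pve_baseline:.2f}%, ensemble PVE: {pve_ensemble:.2f}%")
```

In this toy case the genetic model's residual predictions point in the same direction as the true baseline residuals, so adding them to the baseline predictions raises the PVE; that outcome is a property of the chosen numbers, not a general guarantee.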

BBJ: BioBank Japan. BP: blood pressure. DBP: diastolic blood pressure. GWAS: genome-wide association study. LD: linkage disequilibrium. Level: model complexity level. MVP: Million Veteran Program. P: p-value threshold. PRS: polygenic risk score. SBP: systolic blood pressure. SNPs: single-nucleotide polymorphisms. TOPMed: Trans-Omics for Precision Medicine project. UKBB+ICBP: UK Biobank and International Consortium for Blood Pressure.
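
The 5-fold cross-validation in panel b can be sketched as follows. This is a minimal illustration that assumes the standard scheme in which the training set is shuffled once and partitioned into five disjoint folds, each held out in turn; the caption does not specify the exact procedure. The function name `five_fold_splits`, the seed, and the integer sample IDs are hypothetical.

```python
import random


def five_fold_splits(sample_ids, n_folds=5, seed=0):
    """Yield (fit_ids, held_out_ids) pairs for k-fold cross-validation.

    The IDs are shuffled once, partitioned into n_folds disjoint folds,
    and each fold is held out for testing in exactly one iteration.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        held_out = folds[k]
        fit = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield fit, held_out


# Hypothetical training-set sample IDs (the TOPMed test set is set aside beforehand).
training_ids = list(range(20))
splits = list(five_fold_splits(training_ids))

for k, (fit, held_out) in enumerate(splits, start=1):
    print(f"iteration {k}: fit on {len(fit)} samples, test on {len(held_out)}")
```

Each sample is held out exactly once across the five iterations, so every training sample contributes to one out-of-fold evaluation.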