Author manuscript; available in PMC: 2024 Mar 18.
Published in final edited form as: Pac Symp Biocomput. 2024;29:611–626.

Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine

Rachel L Kember 1, Shefali S Verma 2, Anurag Verma 3, Brenda Xiao 4, Anastasia Lucas 5, Colleen M Kripke 6, Renae Judy 7, Jinbo Chen 8, Scott M Damrauer 9, Daniel J Rader 10, Marylyn D Ritchie 11
PMCID: PMC10947742  NIHMSID: NIHMS1971012  PMID: 38160310

Abstract

Polygenic risk scores (PRS) have predominantly been derived from genome-wide association studies (GWAS) conducted in European ancestry (EUR) individuals. In this study, we present an in-depth evaluation of PRS based on multi-ancestry GWAS for five cardiometabolic phenotypes in the Penn Medicine BioBank (PMBB) followed by a phenome-wide association study (PheWAS). We examine the PRS performance across all individuals and separately in African ancestry (AFR) and EUR ancestry groups. For AFR individuals, PRS derived using the multi-ancestry LD panel showed a higher effect size for four out of five PRSs (DBP, SBP, T2D, and BMI) than those derived from the AFR LD panel. In contrast, for EUR individuals, the multi-ancestry LD panel PRS demonstrated a higher effect size for two out of five PRSs (SBP and T2D) compared to the EUR LD panel. These findings underscore the potential benefits of utilizing a multi-ancestry LD panel for PRS derivation in diverse genetic backgrounds and demonstrate overall robustness in all individuals. Our results also revealed significant associations between PRS and various phenotypic categories. For instance, CAD PRS was linked with 18 phenotypes in AFR and 82 in EUR, while T2D PRS correlated with 84 phenotypes in AFR and 78 in EUR. Notably, associations like hyperlipidemia, renal failure, atrial fibrillation, coronary atherosclerosis, obesity, and hypertension were observed across different PRSs in both AFR and EUR groups, with varying effect sizes and significance levels. However, in AFR individuals, the strength and number of PRS associations with other phenotypes were generally reduced compared to EUR individuals. Our study underscores the need for future research to prioritize 1) conducting GWAS in diverse ancestry groups and 2) creating a cosmopolitan PRS methodology that is universally applicable across all genetic backgrounds. Such advances will foster a more equitable and personalized approach to precision medicine.

Keywords: Polygenic risk scores, multi-ancestry GWAS, cardiometabolic phenotypes, precision medicine

1. Introduction

The era of precision medicine has been marked by significant efforts to identify the genetic and environmental factors that influence the risk of disease as well as the disease prognosis and treatment. Advance knowledge of these factors can provide a major health benefit to individuals, as preventative strategies and tailored therapies can be targeted toward individuals at higher risk. Results from genome-wide association studies (GWAS) have highlighted the polygenic nature of most common, complex diseases in that they have identified a large number of loci with small genetic effects1,2. The polygenic risk score (PRS) has thus emerged as a promising factor for predicting disease risk. PRS is the cumulative, mathematical aggregation of risk derived from the contributions of many DNA variants across the genome3.

Recent studies have shown the high prevalence of cardiometabolic conditions among adults in the United States4, and together they are the leading cause of mortality around the world5,6. GWAS have identified hundreds of loci associated with common diseases such as coronary artery disease (CAD)7, obesity8, hypertension9 (measured using systolic blood pressure [SBP] and diastolic blood pressure [DBP]), and type 2 diabetes (T2D)10. Among individuals diagnosed with one disease (for example, T2D), the prevalence of comorbidities such as hypertension, CAD, heart failure, and chronic kidney disease is also increased. To fully evaluate disease risk in an individual, it is therefore essential to also consider comorbid or secondary conditions related to the primary disease. Several GWAS have identified shared genetic associations between cardiometabolic conditions, demonstrating similarity in the underlying genetic architecture11,12. The pathophysiology of these conditions also shows cross-talk between organ systems and its effect on disease progression, such as the hemodynamic interaction between heart and kidney in heart failure13. With PRS, it is possible to derive an individual’s disease risk for each cardiometabolic condition using GWAS summary statistics. PRS represents an aggregate measure of the cumulative effect of numerous genetic variants on a particular disease, capturing an individual’s genetic predisposition. As such, PRS can be instrumental in assessing the genetic interplay among coexisting or comorbid conditions.

Numerous methodologies exist for constructing PRS targeted at specific diseases. Conventionally, genetic risk scores (GRS) were derived using the genome-wide significant SNPs from a GWAS; however, recent studies show that including association results at a much more lenient p-value threshold (p<0.05) segregates individuals’ risk with better accuracy1. The development and clinical utility of PRS is under active investigation, especially in globally diverse populations14–16. Most large-scale GWAS have been conducted in individuals from European ancestry populations and most PRS are derived from these studies. Consequently, the majority of PRS investigations published to date have been conducted in populations of European ancestry17. Differences between populations in linkage disequilibrium (LD) structure and variant allele frequencies can lead to inaccurate PRS for non-European populations17. This is not unique to PRS studies, and the majority of human genetic research suffers from this same phenomenon18. To ensure the successful clinical implementation of PRS, it is imperative to evaluate its performance in diverse global populations that closely reflect the healthcare population being treated. Moreover, for PRS to become a truly inclusive and effective tool for precision medicine, they must be applicable to individuals of all genetic backgrounds, including those with mixed ancestral backgrounds. Achieving this level of equity and broad usability will contribute significantly to the advancement of personalized healthcare practices.

In this study, we investigated the implementation of PRS for cardiometabolic conditions in individuals in the Penn Medicine BioBank (PMBB). PMBB is a cohort of >250,000 individuals established for genomic and precision medicine research. Approximately 45,000 of the individuals have genetic data imputed using the Trans-Omics for Precision Medicine (TOPMed) v2 dataset19. 20% of the PMBB study population is classified as African (AFR) ancestry based on genetic similarity to the 1000 Genomes Project (1KGP)20 AFR superpopulation group. We calculated PRS in the PMBB based on GWAS summary statistics generated in multi-ancestry data to evaluate 1) risk prediction accuracy among all individuals, and among AFR and European (EUR) subpopulations; and 2) the utility of PRS in determining genetic overlap among cardiometabolic conditions.

2. Methods

2.1. Penn Medicine BioBank

The Penn Medicine BioBank (PMBB) recruits participants through the University of Pennsylvania Health System by enrolling at the time of appointment21. Patients participate by donating either blood or a tissue sample and allowing researchers access to their electronic health record (EHR) information. This academic biobank provides researchers with centralized access to a large number of blood and tissue samples with extensive health information from the EHR. The facility banks both blood specimens (i.e., whole blood, plasma, serum, buffy coat, and DNA isolated from leukocytes) and tissues (i.e., formalin-fixed paraffin-embedded, fresh, and flash frozen).

2.2. Genotyping and Quality Control and Imputation

The DNA extracted from blood samples was genotyped using the Illumina Global Screening Array. To ensure data integrity, we conducted quality control measures, excluding SNPs with a marker call rate of less than 95% and samples with a call rate of less than 90%. Additionally, individuals with sex discrepancies were removed from the analysis. Imputation was carried out using the Michigan Imputation server, leveraging the TOPMed Reference panel19. To determine genetic ancestry, we employed principal component analysis (PCA) using the smartpca tool22 and the 1KGP dataset20. Genetic ancestry was inferred through a k-means clustering approach, utilizing the 1KGP super populations as genetic ancestry labels.
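The call-rate filters described above can be illustrated with a minimal sketch. It assumes a toy genotype matrix (rows are samples, columns are SNPs, `None` marks a missing call) and applies the SNP filter before the sample filter; the function names, the toy data, and the filter order are assumptions for illustration, not the pipeline used in the study.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls in a sequence."""
    return sum(c is not None for c in calls) / len(calls)


def apply_qc(genotypes, snp_min=0.95, sample_min=0.90):
    """Return (kept_sample_indices, kept_snp_indices).

    SNPs with a call rate below snp_min are excluded first; samples with a
    call rate below sample_min over the remaining SNPs are excluded next.
    The order of the two filters is an assumption.
    """
    n_snps = len(genotypes[0])
    kept_snps = [j for j in range(n_snps)
                 if call_rate([row[j] for row in genotypes]) >= snp_min]
    kept_samples = [i for i, row in enumerate(genotypes)
                    if call_rate([row[j] for j in kept_snps]) >= sample_min]
    return kept_samples, kept_snps


# Hypothetical toy data: 20 samples x 20 SNPs, all genotypes called as 0.
genotypes = [[0] * 20 for _ in range(20)]
genotypes[0][3] = None          # SNP 3 is missing in two samples (call rate 0.90)
genotypes[1][3] = None
genotypes[2][5] = None          # SNP 5 is missing in one sample (call rate 0.95)
for j in (0, 1, 2):
    genotypes[4][j] = None      # sample 4 is missing three calls

kept_samples, kept_snps = apply_qc(genotypes)
print(len(kept_samples), len(kept_snps))
```

In this toy example SNP 3 falls below the 95% marker threshold and sample 4 falls below the 90% sample threshold, so both are excluded.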

2.3. Polygenic Risk Scores

To derive PRS, we used the multi-ancestry summary statistics from the largest and/or most recent GWAS for each trait (see Table 1).

Table 1.

Multi-ancestry GWAS

| Phenotype | Sample size (N cases) | PMID | Reference |
|---|---|---|---|
| BMI | 241,258 | 28443625 | 8 |
| CAD | 547,261 (122,733) | 29212778 | 7 |
| Hypertension (DBP, SBP) | 318,891 | 30578418 | 9 |
| T2D | 1,407,282 (228,499) | 32541925 | 10 |

Weights for each SNP were calculated using PRS-CS23 (version from April 24, 2020), a method that performs polygenic prediction via Bayesian regression and continuous shrinkage priors. PRS-CS requires a reference panel that matches the ancestry distribution of the target data set. We generated multiple reference panels for analyses: a multi-ancestry LD reference panel using the HapMap SNPs from the entire 1KGP populations (2504 individuals), an African-only reference panel from the 1KGP African ancestry population, and a European-only reference panel from the 1KGP European ancestry population. We identified LD patterns within the 1KGP population by using PLINK (version 1.90) to determine LD blocks and calculate the LD between the SNPs in each block. For PRS-CS, the global shrinkage parameter φ was fixed to 0.01, and default values were selected for all other parameters. PRSs were then calculated using the weights with PLINK. Only SNPs present in all three of the target data set, the summary statistics, and the LD reference panel were included in the PRSs.
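As a rough illustration of the scoring step, the sketch below restricts to SNPs shared by the target data, the summary statistics, and the LD panel, then forms a weighted sum of effect-allele dosages. The SNP identifiers, weights, and dosages are hypothetical, and the plain weighted sum is an assumed scoring convention rather than a description of the exact PLINK invocation used in the study.

```python
def shared_snps(target_snps, sumstat_snps, ld_panel_snps):
    """SNPs present in the target data, summary statistics, and LD panel."""
    return set(target_snps) & set(sumstat_snps) & set(ld_panel_snps)


def polygenic_score(dosages, weights, snps):
    """Weighted sum of effect-allele dosages over the given SNPs."""
    return sum(weights[s] * dosages[s] for s in sorted(snps))


# Hypothetical per-SNP weights (e.g. posterior effect sizes from a shrinkage method).
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
# Hypothetical LD reference panel SNP list.
ld_panel = {"rs1", "rs2", "rs4"}
# Hypothetical effect-allele dosages (0, 1, or 2) for one individual.
person = {"rs1": 2, "rs2": 1, "rs4": 0}

snps = shared_snps(person, weights, ld_panel)   # rs3 and rs4 drop out
score = polygenic_score(person, weights, snps)
print(sorted(snps), round(score, 3))
```

Here rs3 is absent from the target data and the LD panel, and rs4 has no weight, so only rs1 and rs2 contribute to the score.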

2.4. Phenotypes

We focused on four primary phenotypes to derive and evaluate the PRS association: CAD, hypertension (for DBP and SBP PRS), T2D, and BMI. Cases and controls for each binary phenotype were defined using International Classification of Diseases (ICD-9 and ICD-10) diagnosis codes (CAD: 414.0*, I25.1*; T2D: 250*, E11*; hypertension: 401*, I10*). Participants were coded as cases of a given phenotype if their records contained at least 1 of the corresponding ICD-9 or ICD-10 codes. The median value for BMI was extracted from the EHR.
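The case definition above can be sketched as a prefix match on diagnosis codes. The sketch assumes the trailing `*` in each listed code acts as a prefix wildcard and that codes are stored as dotted strings; the function name and example patients are hypothetical.

```python
# ICD-9 / ICD-10 code prefixes taken from the case definitions in the text.
CASE_PREFIXES = {
    "CAD": ("414.0", "I25.1"),
    "T2D": ("250", "E11"),
    "hypertension": ("401", "I10"),
}


def is_case(patient_codes, phenotype):
    """True if at least one of the patient's codes matches a listed prefix."""
    prefixes = CASE_PREFIXES[phenotype]
    return any(code.startswith(prefixes) for code in patient_codes)


# Hypothetical patients.
print(is_case(["I25.10", "J45.909"], "CAD"))   # matches the I25.1 prefix
print(is_case(["414.9"], "CAD"))               # 414.9 does not match 414.0
print(is_case(["E11.9"], "T2D"))
```

Participants with no matching code would be treated as controls under this reading of the definition.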

For Phenome-wide Association Study (PheWAS) analysis, we derived phenotypes using ICD-9 and ICD-10 data from individuals from the Penn Medicine EHR. ICD-9 codes were aggregated to phecodes using the phecode ICD-9 map 1.224,25; ICD-10 codes were aggregated to phecodes using the phecode ICD-10 map 1.2 (beta)26. Individuals are considered cases for the phenotype if they have at least 2 instances of the phecode on unique dates, controls if they have no instance of the phecode, and ‘other/missing’ if they have one instance of the phecode or a related phecode.
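The case, control, and ‘other/missing’ rule for a single phecode can be written as a short function. This sketch only counts unique dates for the phecode itself; the handling of related phecodes mentioned above is not modelled, and the function name and example dates are hypothetical.

```python
from datetime import date


def phecode_status(occurrence_dates):
    """Classify one individual for one phecode from its occurrence dates."""
    n_unique = len(set(occurrence_dates))
    if n_unique >= 2:
        return "case"            # at least 2 instances on unique dates
    if n_unique == 0:
        return "control"         # no instance of the phecode
    return "other/missing"       # exactly one unique date


print(phecode_status([date(2020, 1, 5), date(2021, 3, 9)]))
print(phecode_status([date(2020, 1, 5), date(2020, 1, 5)]))
print(phecode_status([]))
```

Two records on the same date count as a single instance, so the second call yields ‘other/missing’ rather than a case.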

2.5. Statistical Analysis

PRS were normalized (mean of 0 and standard deviation of 1) separately for each analysis (stratified by ancestry and overall). Logistic or linear regression models accounting for age, sex, and the first 5 within-ancestry principal components (PCs) were used to test for association of PRS with each of the primary phenotypes (T2D, BMI, hypertension, and CAD). The area under the receiver operating characteristic curve (AUC) and the DeLong test were computed with the R package pROC, using the full logistic regression model described above. AUC was also calculated for a reduced (null) logistic regression model including covariates alone (age, sex, and the first 5 PCs). The DeLong test27 is a non-parametric approach for comparing the AUCs of two correlated ROC curves, such as those from models applied to the same set of samples. We used this test to compare the null model with the full model that includes PRS, obtaining a p-value for the difference between the two AUCs. BMI was treated as a continuous trait, and we report the R^2 value for all BMI analyses.
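Two of the steps above, standardizing scores within a stratum and computing an AUC, can be sketched with the standard library alone. This is an illustration and not the pROC workflow: the AUC is the rank-based (Mann–Whitney) estimate on a single score vector, the DeLong comparison is not implemented, and the use of the population standard deviation is an assumption. All data are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical raw scores, standardized separately within each ancestry group.
raw_by_group = {"AFR": [0.2, 0.5, 0.9, 1.4], "EUR": [1.1, 1.3, 2.0, 2.4]}
z_by_group = {g: standardize(v) for g, v in raw_by_group.items()}

# Hypothetical predicted risks and case (1) / control (0) labels.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)
```

Standardizing within each stratum means the odds ratios in the tables are per standard deviation of the PRS in that stratum.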

A PheWAS was performed using logistic regression models with each PRS as the independent variable, phecodes as the dependent variables, and age, sex, and the first 10 PCs as covariates. A phenome-wide Bonferroni significance threshold of 4.2 × 10^−5 (0.05/1190) in AFR and 3.6 × 10^−5 (0.05/1377) in EUR was applied to account for multiple testing.
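The Bonferroni thresholds quoted above follow directly from dividing the significance level by the number of phecodes tested in each group. A short worked check, with hypothetical function and variable names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


afr_threshold = bonferroni_threshold(0.05, 1190)   # phecodes tested in AFR
eur_threshold = bonferroni_threshold(0.05, 1377)   # phecodes tested in EUR

print(f"AFR: {afr_threshold:.1e}")
print(f"EUR: {eur_threshold:.1e}")
```

An association is counted as phenome-wide significant in a group only if its p-value is below that group's threshold.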

3. Results

3.1. Penn Medicine BioBank (PMBB) Demographics

PMBB currently consists of >250,000 consented individuals. Approximately 45,000 of these participants have been genotyped to date. Demographics of the sample included in this study are shown in Table 2.

Table 2.

Demographics of PMBB sample

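| | All | AFR | EUR |
|---|---|---|---|
| Total patients | 43,530 | 11,189 | 30,094 |
| % Female | 50.1% | 62.8% | 44.9% |
| Mean age | 55.2 | 51.7 | 57.3 |
| % CAD | 23.8% | 18.8% | 26.4% |
| % Hypertension | 54.4% | 65.2% | 51.7% |
| % T2D | 23.5% | 35.1% | 19.3% |
| Patients with BMI data | 40,043 | 10,619 | 27,489 |
| % Female (patients with BMI data) | 50.4% | 63.4% | 44.9% |
| Mean age (patients with BMI data) | 55.6 | 51.9 | 57.7 |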

3.2. Determining the effect of linkage disequilibrium panel on PRS in the overall sample

Using publicly available multi-ancestry GWAS data (Table 1), we generated a PRS for each primary phenotype of interest: type 2 diabetes, body mass index, hypertension (SBP and DBP), and coronary artery disease. We assessed the impact of using a multi-ancestry LD panel, akin to the GWAS data, and compared it with an AFR LD panel (in all PMBB individuals and in AFR PMBB individuals) and an EUR LD panel (in all PMBB individuals and in EUR PMBB individuals). AUC values were computed for each binary phenotype PRS in all individuals (Table 3) and contrasted between the full model (AUC, covariates + PRS) and the model containing covariates alone (AUC Null). The addition of PRS consistently improved the covariate model for all phenotypes, showing an average AUC improvement of 0.014. Across the entire dataset, the PRS created with the multi-ancestry LD panel (DBP, BMI) or the EUR LD panel (CAD, SBP, T2D) demonstrated the strongest association with their respective primary phenotypes (Table 3).

Table 3.

Comparison of LD panel for PRS in all

| PRS | LD Panel | AUC¹ Null | AUC¹ | DeLong P | Model OR | Model P-value |
|---|---|---|---|---|---|---|
| CAD | Multi-ancestry | 0.795 | 0.808 | 1.22E-53 | 1.495 | 5.82E-186 |
| | AFR | | 0.807 | 1.22E-52 | 1.472 | 7.11E-182 |
| | EUR | | 0.807 | 2.33E-52 | 1.515 | 1.00E-184 |
| DBP | Multi-ancestry | 0.770 | 0.773 | 8.90E-06 | 1.236 | 1.65E-49 |
| | AFR | | 0.772 | 1.32E-15 | 1.219 | 1.59E-49 |
| | EUR | | 0.772 | 6.15E-14 | 1.226 | 6.32E-43 |
| SBP | Multi-ancestry | 0.770 | 0.775 | 4.47E-23 | 1.365 | 2.48E-83 |
| | AFR | | 0.775 | 3.74E-22 | 1.338 | 2.78E-80 |
| | EUR | | 0.775 | 7.40E-23 | 1.376 | 2.31E-83 |
| T2D | Multi-ancestry | 0.727 | 0.730 | 5.41E-88 | 2.223 | 1.24E-286 |
| | AFR | | 0.695 | 2.68E-79 | 2.095 | 3.18E-266 |
| | EUR | | 0.731 | 2.44E-91 | 2.263 | 1.46E-297 |

| PRS | LD Panel | R^2 Null | R^2 | R^2 difference | Model Beta | Model P-value |
|---|---|---|---|---|---|---|
| BMI | Multi-ancestry | 0.067 | 0.110 | 0.043 | 2.205 | 0 |
| | AFR | | 0.110 | 0.043 | 2.125 | 0 |
| | EUR | | 0.108 | 0.042 | 2.198 | 0 |

3.3. Determining the effect of linkage disequilibrium panel on PRS within ancestry

In both AFR (Table 4) and EUR (Table 5) individuals, the addition of PRS to the covariate model enhances model performance. However, it is noteworthy that PRS performance was relatively stronger in EUR individuals compared to AFR individuals. In AFR, the full model shows a somewhat smaller improvement over the covariate-based model (average improvement in AUC=0.011) compared to the improvement observed in EUR (average improvement in AUC=0.021).
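The average improvements quoted above can be reproduced from the AUC values in Tables 4 and 5. The sketch below assumes the averages were taken over the four binary-trait PRSs (CAD, DBP, SBP, T2D) using the multi-ancestry LD panel rows; the text does not state which rows were averaged, so this choice is an assumption.

```python
from statistics import mean


def mean_auc_gain(full_aucs, null_aucs):
    """Average difference between full-model and null-model AUC."""
    return mean(f - n for f, n in zip(full_aucs, null_aucs))


# Order: CAD, DBP, SBP, T2D (multi-ancestry LD panel rows).
afr_full = [0.770, 0.797, 0.797, 0.711]   # Table 4, AUC
afr_null = [0.764, 0.793, 0.793, 0.681]   # Table 4, AUC Null
eur_full = [0.812, 0.750, 0.753, 0.710]   # Table 5, AUC
eur_null = [0.796, 0.747, 0.747, 0.651]   # Table 5, AUC Null

afr_gain = round(mean_auc_gain(afr_full, afr_null), 3)
eur_gain = round(mean_auc_gain(eur_full, eur_null), 3)
print(afr_gain, eur_gain)
```

Under this assumption the computed gains match the reported averages for AFR and EUR, with the T2D PRS contributing most of the improvement in both groups.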

Table 4.

Comparison of LD panel for PRS in AFR individuals

| PRS | LD Panel | AUC Null | AUC | DeLong P | Model OR | Model P-value |
|---|---|---|---|---|---|---|
| CAD | AFR | 0.764 | 0.770 | 1.33E-06 | 1.261 | 2.75E-18 |
| | Multi-ancestry | | 0.770 | 4.52E-06 | 1.253 | 2.45E-17 |
| DBP | AFR | 0.793 | 0.797 | 1.72E-05 | 1.208 | 4.56E-15 |
| | Multi-ancestry | | 0.797 | 1.25E-05 | 1.214 | 2.56E-15 |
| SBP | AFR | 0.793 | 0.797 | 3.82E-06 | 1.252 | 3.00E-18 |
| | Multi-ancestry | | 0.797 | 1.11E-06 | 1.277 | 9.65E-20 |
| T2D | AFR | 0.681 | 0.710 | 3.03E-25 | 1.630 | 5.73E-77 |
| | Multi-ancestry | | 0.711 | 4.21E-26 | 1.689 | 1.73E-79 |

| PRS | LD Panel | R^2 Null | R^2 | R^2 difference | Model Beta | Model P-value |
|---|---|---|---|---|---|---|
| BMI | AFR | 0.041 | 0.065 | 0.024 | 1.449 | 1.02E-59 |
| | Multi-ancestry | | 0.063 | 0.022 | 1.462 | 6.84E-56 |

Table 5.

Comparison of LD panel for PRS in EUR individuals

| PRS | LD Panel | AUC Null | AUC | DeLong P | Model OR | Model P-value |
|---|---|---|---|---|---|---|
| CAD | EUR | 0.796 | 0.812 | 9.49E-48 | 1.533 | 5.65E-166 |
| | Multi-ancestry | | 0.812 | 2.38E-48 | 1.531 | 5.73E-165 |
| DBP | EUR | 0.747 | 0.750 | 6.17E-11 | 1.173 | 9.17E-34 |
| | Multi-ancestry | | 0.750 | 1.51E-12 | 1.158 | 9.43E-29 |
| SBP | EUR | 0.747 | 0.753 | 6.64E-21 | 1.251 | 1.49E-64 |
| | Multi-ancestry | | 0.753 | 1.61E-20 | 1.255 | 2.40E-66 |
| T2D | EUR | 0.651 | 0.708 | 8.26E-87 | 1.721 | 5.68E-243 |
| | Multi-ancestry | | 0.710 | 1.12E-82 | 1.757 | 8.59E-258 |

| PRS | LD Panel | R^2 Null | R^2 | R^2 difference | Model Beta | Model P-value |
|---|---|---|---|---|---|---|
| BMI | EUR | 0.006 | 0.076 | 0.070 | 1.637 | 0 |
| | Multi-ancestry | | 0.075 | 0.069 | 1.626 | 0 |

Notably, in AFR individuals, the PRS calculated using the multi-ancestry LD panel exhibited a higher effect size in four out of the five PRSs (DBP, SBP, T2D, and BMI) compared to the AFR LD panel (Table 4). This indicates the potential benefits of using a multi-ancestry LD panel to derive PRS in populations with diverse genetic backgrounds.

In EUR individuals, the PRS calculated using the multi-ancestry LD panel demonstrated a higher effect size in two out of the five PRSs (SBP and T2D) when compared to the EUR LD panel (Table 5). This observation highlights the potential advantages of leveraging a multi-ancestry LD panel in deriving PRS for certain phenotypes in populations with European ancestry.

3.4. PheWAS of polygenic risk scores

We conducted a PheWAS of each multi-ancestry LD panel PRS in AFR and EUR individuals, identifying additional phenotypes associated with the PRS for our primary phenotypes (Figure 1, full results in Supplemental Tables Online: https://shorturl.at/uBDSX). The results reveal significant associations between the PRS and various phenotypic categories, shedding light on the potential implications of PRS in predicting disease susceptibility. All PRS exhibited associations with other phenotypes. However, in AFR individuals, the strength and number of PRS associations with other phenotypes were generally reduced compared to EUR individuals.

Figure 1.

Phenome-wide Association Study (PheWAS) Results for Polygenic Risk Scores (PRS) for coronary artery disease (CAD), Diastolic Blood Pressure (DBP), Systolic Blood Pressure (SBP), Type 2 Diabetes (T2D), and Body Mass Index (BMI). The x-axis represents the phecode categories, and the y-axis shows the −log10 p-values, color-coded by category.

In our analysis, the CAD PRS in AFR individuals was associated with 18 distinct phenotypes, including notable associations with hyperlipidemia (OR=1.12, p=1.1×10^−6) and renal failure (OR=1.12, p=1.0×10^−5). In contrast, EUR individuals exhibited associations with a broader range of 82 phenotypes, with hyperlipidemia (OR=1.23, p=7.3×10^−45) and renal failure (OR=1.10, p=2.1×10^−8) being among them.

For the DBP and SBP PRS, AFR individuals showed associations with 9 and 20 phenotypes respectively. Specific associations of interest included atrial fibrillation for DBP (OR=1.20, p=1.4×10^−5) and both coronary atherosclerosis (OR=1.20, p=3.7×10^−7) and T2D (OR=1.12, p=3.2×10^−5) for SBP. EUR individuals, on the other hand, had DBP and SBP PRS associated with 12 and 27 phenotypes, respectively. This encompassed associations like coronary atherosclerosis for both DBP (OR=1.09, p=4.9×10^−7) and SBP (OR=1.13, p=1.6×10^−13), and T2D specifically for SBP (OR=1.17, p=1.0×10^−17).

The T2D PRS in AFR individuals was linked with a vast array of 84 phenotypes. Key associations here were hyperlipidemia (OR=1.30, p=6.0×10^−16), obesity (OR=1.20, p=6.6×10^−10), and hypertension (OR=1.22, p=4.5×10^−9). EUR individuals had a slightly lesser range with 78 phenotypes, but with significant associations like hyperlipidemia (OR=1.31, p=9.2×10^−17), obesity (OR=1.29, p=9.9×10^−57), and hypertension (OR=1.22, p=3.2×10^−38). Lastly, the BMI PRS in AFR was associated with 19 phenotypes, including T2D (OR=1.17, p=1.6×10^−8) and hypertension (OR=1.18, p=8.6×10^−8). In EUR individuals, this PRS was linked with a more extensive 72 phenotypes, with notable associations being T2D (OR=1.26, p=4.6×10^−39) and hypertension (OR=1.19, p=2.2×10^−32).

4. Discussion

We generated five polygenic risk scores representing genetic liability for cardiometabolic diseases and assessed their performance across different ancestry groups in the Penn Medicine BioBank (PMBB), a biobank including DNA linked with electronic health records. For all PRS tested, we identified a statistically significant association with the primary phenotype in both ancestry groups, as validated by the DeLong test comparing the null and the full model.

Type 2 diabetes consistently exhibited the highest effect size, reflecting the large number of cases in both the GWAS used to generate this PRS and the PMBB dataset. In contrast, the hypertension PRSs (DBP and SBP) showed a weaker effect size, even with a larger GWAS and over 50% of PMBB participants having hypertension. These observations suggest that factors beyond sample size, such as disease heterogeneity, prevalence, and non-additive effects, influence PRS associations. Consequently, understanding the interplay of these factors will be pivotal in refining and optimizing the application of PRS in disease prediction and risk assessment.

Our PheWAS analyses were conducted to explore the broader phenotypic landscape associated with each PRS in an EHR-linked biobank. Many of the identified phenotypes could be linked to broader effects of known disease risk factors and established comorbidities. For instance, risk for Type 2 diabetes was associated with hypertension, a known commonly co-occurring trait28. Similarly, the BMI PRS was associated with sleep apnea, diabetes, and hypertension, all of which are known to be more prevalent in individuals with higher BMI29–32. However, these associations do not necessarily imply causality. The high prevalence of comorbidities among these phenotypes complicates the task of discerning whether the genetic risk for one condition directly influences the onset of another.

Our findings underscore a significant challenge in the future implementation of PRS into routine clinical care. While PRS derived from multi-ancestry GWAS can be associated with phenotypes in individuals of African ancestry (AFR), these associations are not as pronounced as those observed in individuals of European ancestry (EUR). This observation, although expected, has been a topic of extensive discussion in recent years, emphasizing a notable disparity in genetic research15,17. Our results here affirm that these expectations persist even in large-scale, diverse ancestry datasets. Furthermore, our study suggests that PRS for cardiometabolic diseases based on multi-ancestry GWAS data might not perform as robustly for the primary disease and its associated secondary cardiometabolic traits.

Our utilization of a multi-ancestry LD panel to compute PRS for all individuals from multi-ancestry GWAS demonstrated robust performance across all populations. This was especially true for African ancestry individuals, emphasizing the potential advantages of leveraging a multi-ancestry reference panel in PRS generation. As the field of precision medicine continues to evolve, advocating for the adoption of such panels becomes increasingly important. By addressing these challenges, we can pave the way for more inclusive and accurate personalized healthcare strategies.

One notable limitation of our study is the modest gain in predictive performance over the null model across all categories, as reflected in the AUC values. While we observed differences in AUC between the ancestry groups, the absolute increase in AUC over the null model was relatively small. This underscores the need for further refinement in PRS methodologies to achieve more substantial improvements in predictive performance. Additionally, in our PheWAS approach, there are inherent challenges when comparing results between AFR and EUR groups. The difference in sample sizes between these groups can lead to variations in statistical power, potentially influencing the observed associations. Moreover, the generally lower PRS performance in the AFR group, as highlighted in our results, can further compound these challenges. It is essential to interpret the PheWAS results with these considerations in mind.

In conclusion, while there is considerable enthusiasm surrounding PRS in clinical care, there remains a significant amount of research to be conducted to determine its optimal implementation. It is essential to explore how PRS can be incorporated alongside other commonly used predictors33, such as family history, clinical comorbidities, and environmental/lifestyle factors. By combining PRS with established clinical guidelines, we can aim for a more comprehensive risk assessment, leading to personalized interventions. Another important issue to address is whether we will ultimately need ancestry-specific PRS models or if we can develop the statistical framework to integrate global and local LD patterns into the PRS model to produce a cosmopolitan PRS approach. For clinical implementation, a cosmopolitan PRS approach will be easier for clinicians to adopt; however, it is unclear how this can be done effectively, given the heterogeneity in LD patterns, effect sizes, and causal variants in different ancestry groups. Our work here suggests that the use of multi-ancestry GWAS and LD panels may be a step towards this goal. The ultimate success of PRS in precision medicine lies in integrating it seamlessly with published clinical guidelines and incorporating an individual’s ancestry within the PRS framework. This integration will empower clinicians to make informed decisions based on a comprehensive and personalized risk profile for each patient. By addressing these key aspects and enhancing our understanding of the role of PRS in precision medicine, we can unlock its full potential as a transformative tool in healthcare, facilitating early interventions and preventive measures that cater to each individual’s unique genetic makeup and health needs.

Supplementary Material

Supplementary Table CAD
Supplementary Table T2D
Supplementary Table BMI
Supplementary Table SBP
Supplementary Table DBP

Acknowledgements

We acknowledge the Penn Medicine BioBank (PMBB) for providing data and thank the patient-participants of Penn Medicine who consented to participate in this research program. We would also like to thank the Penn Medicine BioBank team and Regeneron Genetics Center for providing genetic variant data for analysis. The PMBB is approved under IRB protocol# 813913 and supported by Perelman School of Medicine at University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA award number UL1TR001878. The authors thank Million Veteran Program (MVP) staff, researchers, and volunteers, who have contributed to MVP, and especially participants who previously served their country in the military and now generously agreed to enroll in the study. (See https://www.research.va.gov/mvp/ for more details). The citation for MVP is Gaziano, J.M. et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70, 214–23 (2016). This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration, and was supported by the Veterans Administration (VA) Million Veteran Program (MVP) award #000. We have accessed the MVP summary statistics via dbGaP phs001672.

Footnotes

1. AUC rounded to three decimal places

Contributor Information

Rachel L. Kember, Department of Psychiatry, University of Pennsylvania, 3535 Market Street Philadelphia, PA 19104, USA

Shefali S. Verma, Department of Pathology and Laboratory Medicine, University of Pennsylvania, 3700 Hamilton Walk Philadelphia, PA 19104, USA

Anurag Verma, Department of Medicine, University of Pennsylvania, 3700 Hamilton Walk Philadelphia, PA 19104, USA.

Brenda Xiao, Graduate Program in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.

Anastasia Lucas, Graduate Program in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.

Colleen M. Kripke, Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA.

Renae Judy, Department of Surgery, Division of Vascular Surgery and Endovascular Therapy, University of Pennsylvania, Philadelphia, PA 19104, USA.

Jinbo Chen, Department of Biostatistics and Epidemiology, University of Pennsylvania, 203 Blockley Hall, Philadelphia, PA 19104, USA.

Scott M. Damrauer, Department of Surgery, Division of Vascular Surgery and Endovascular Therapy, University of Pennsylvania, Philadelphia, PA 19104, USA.

Daniel J. Rader, Department of Medicine and Genetics, Institute for Translational Medicine and Therapeutics, University of Pennsylvania, 3801 Filbert St Philadelphia, PA 19104, USA

Marylyn D. Ritchie, Department of Genetics, Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA 19104, USA

References

  • 1.Abraham G et al. Genomic prediction of coronary heart disease. Eur. Heart J. 37, 3267–3278 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Tada H et al. Risk prediction by genetic risk scores for coronary heart disease is independent of self-reported family history. Eur. Heart J. 37, 561–567 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. International Schizophrenia Consortium et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
  • 4. Arnold SV et al. Burden of cardio-renal-metabolic conditions in adults with type 2 diabetes within the Diabetes Collaborative Registry. Diabetes Obes. Metab. 20, 2000–2003 (2018).
  • 5. Wang H et al. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. The Lancet 388, 1459–1544 (2016).
  • 6. Ogurtsova K et al. IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res. Clin. Pract. 128, 40–50 (2017).
  • 7. van der Harst P & Verweij N. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease. Circ. Res. 122, 433–443 (2018).
  • 8. Justice AE et al. Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits. Nat. Commun. 8, 14977 (2017).
  • 9. Giri A et al. Trans-ethnic association study of blood pressure determinants in over 750,000 individuals. Nat. Genet. 51, 51–62 (2019).
  • 10. Vujkovic M et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691 (2020).
  • 11. Ma RC. Genetics of cardiovascular and renal complications in diabetes. J. Diabetes Investig. 7, 139–154 (2016).
  • 12. Regele F et al. Genome-wide studies to identify risk factors for kidney disease with a focus on patients with diabetes. Nephrol. Dial. Transplant. 30, iv26–iv34 (2015).
  • 13. Rangaswami J et al. Cardiorenal Syndrome: Classification, Pathophysiology, Diagnosis, and Treatment Strategies: A Scientific Statement From the American Heart Association. Circulation 139, (2019).
  • 14. Kim MS, Patel KP, Teng AK, Berens AJ & Lachance J. Genetic disease risks can be misestimated across global populations. Genome Biol. 19, (2018).
  • 15. De La Vega FM & Bustamante CD. Polygenic risk scores: a biased prediction? Genome Med. 10, (2018).
  • 16. Popejoy AB & Fullerton SM. Genomics is failing on diversity. Nature 538, 161–164 (2016).
  • 17. Martin AR et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
  • 18. Gurdasani D, Barroso I, Zeggini E & Sandhu MS. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. (2019) doi: 10.1038/s41576-019-0144-0.
  • 19. Taliun D et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
  • 20. Auton A et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
  • 21. Verma A et al. The Penn Medicine BioBank: Towards a Genomics-Enabled Learning Healthcare System to Accelerate Precision Medicine in a Diverse Population. J. Pers. Med. 12, 1974 (2022).
  • 22. Price AL et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
  • 23. Ge T, Chen C-Y, Ni Y, Feng Y-CA & Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
  • 24. Denny JC et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26, 1205–1210 (2010).
  • 25. Denny JC et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1110 (2013).
  • 26. Wu P et al. Developing and Evaluating Mappings of ICD-10 and ICD-10-CM Codes to PheCodes. bioRxiv (2019) doi: 10.1101/462077.
  • 27. DeLong ER, DeLong DM & Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
  • 28. Sun D et al. Type 2 Diabetes and Hypertension: A Study on Bidirectional Causality. Circ. Res. 124, 930–937 (2019).
  • 29. Romero-Corral A, Caples SM, Lopez-Jimenez F & Somers VK. Interactions Between Obesity and Obstructive Sleep Apnea. Chest 137, 711–719 (2010).
  • 30. Dua S, Bhuker M, Sharma P, Dhall M & Kapoor S. Body mass index relates to blood pressure among adults. North Am. J. Med. Sci. 6, 89 (2014).
  • 31. Xiang B-Y et al. Body mass index and the risk of low bone mass–related fractures in women compared with men: A PRISMA-compliant meta-analysis of prospective cohort studies. Medicine (Baltimore) 96, e5290 (2017).
  • 32. Gray N, Picone G, Sloan F & Yashkin A. Relation between BMI and Diabetes Mellitus and Its Complications among US Older Adults. South. Med. J. 108, 29–36 (2015).
  • 33. Arnett DK et al. 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation (2019) doi: 10.1161/CIR.0000000000000678.

Associated Data


Supplementary Materials

Supplementary Table CAD
Supplementary Table T2D
Supplementary Table BMI
Supplementary Table SBP
Supplementary Table DBP
