Coronary artery disease (CAD) is a common disease globally attributable to the interplay of complex genetic and lifestyle factors. Here, we review how genomic sequencing advances have broadened the fundamental understanding of the monogenic and polygenic contributions to CAD and how these insights can be utilized, in part by creating polygenic risk estimates, for improved disease risk stratification at the individual patient level.
Initial studies linking premature CAD with rare familial cases of elevated blood lipids highlighted high-risk monogenic contributions, predominantly presenting as familial hypercholesterolemia (FH). More commonly, CAD genetic risk is a function of multiple, higher-frequency variants, each imparting a lower magnitude of risk, which can be combined to form polygenic risk scores (PRSs) conveying significant risk to individuals at the extremes. However, gaps remain in the clinical validation of PRSs, most notably in non-European populations.
With improved and more broadly utilized genomic sequencing technologies, the genetic underpinnings of coronary artery disease are being unraveled. As a result, polygenic risk estimation is poised to become a widely used and powerful tool in the clinical setting. While the use of PRSs to augment current clinical risk stratification for optimization of cardiovascular disease risk by lifestyle change or therapeutic targeting is promising, we await adequately powered, prospective studies demonstrating the clinical utility of polygenic risk estimation in practice.
Keywords: heart disease, coronary artery disease, genetics, polygenic risk assessment
The tools available to address and optimize preventative strategies and medical therapies for emergent and life-changing cardiovascular events have greatly improved over time. Yet, while the number of deaths and the burden of disease due to coronary artery disease (CAD) have decreased dramatically in the past few decades, CAD remains the number one cause of death in the United States and worldwide [1]. The global prevalence of coronary heart disease (CHD) is estimated at 197.2 million individuals (20.1 million in the US in 2018), and this disease accounted for 9.1 million annual deaths. In the US alone, 8.3% of men and 6.2% of women are living with CHD, with 720,000 individuals experiencing new events and 335,000 individuals experiencing recurrent events each year [1]. Gaps in preventative strategies exist not only in structuring care delivery across oftentimes disconnected healthcare and societal systems, but also in our identification and understanding of the most predictive signals that may act as early warning signs for acute events or indicate a higher lifetime (long-term) risk of disease for an individual, for which appropriate actions should be taken.
Advances in data science have been vital to unveiling the factors related to early disease risk and prevention [2]. For CAD, there has been great interest in understanding how genetics can augment existing risk stratification tools. Insights into the biologic pathways and intricate networks involved in the development and progression of CAD, gained from the discovery of individual genes, have been expertly reviewed in the past [3–5]. Here, we aim to review how genomic sequencing advances and modeling have improved our fundamental understanding of CAD and have set the stage for an exciting future in improved risk stratification aligned with the promise of personalized preventive medicine.
Coronary Artery Disease and Clinical Risk
Coronary artery disease (CAD) results from the complex interplay of a person’s individual environmental, lifestyle and genetic factors. CAD is the accumulation of atherosclerotic plaque within the arteries that supply the heart muscle with the oxygen- and nutrient-rich blood that it requires for optimal sustained function. Symptoms of CAD manifest when the obstruction created by a plaque restricts blood flow to the point that supply no longer meets the myocardial energy demands. This can occur incrementally over a period of time (months to years), eventually leading to chest pain (angina) or shortness of breath on exertion when that supply-demand mismatch threshold is reached, or it can progress rapidly and unpredictably, as in the case of unstable plaque rupture resulting in an acute heart attack.
Disruptions to multiple pathways of healthy cellular and physiologic homeostasis created by a person’s risk factors are responsible for initial plaque formation and propagation. Co-morbid disease risks such as elevated blood pressure, dyslipidemia, diabetes mellitus, chronic kidney disease and chronic inflammatory conditions such as rheumatoid arthritis and systemic lupus erythematosus are well established. Additionally, lifestyle practices including sedentary behavior, cigarette smoking, stress and unhealthy diet have been implicated in the development of CAD. These modifiable risk factors have been included in an effort by the American Heart Association known as “Life’s Simple 7”, which serves as a general guidepost for patients to achieve optimal cardiovascular health in the areas of smoking cessation, blood pressure control, lipid management, blood glucose control, healthy diet and body weight, and physical activity [1]. The tools that medical professionals utilize, mainly to determine the need for or intensity of statin therapy, are typically more granular. These include, but are not limited to, the AHA/ACC Atherosclerotic Cardiovascular Disease (ASCVD) Pooled Cohort Equations, the Framingham Risk Score – Cardiovascular Disease (FRS-CVD), and the QRISK2 risk assessment tool, which delineate 10-year risk for disease and adverse events based on inputs of age, sex, ethnicity, blood pressure, cholesterol levels, as well as smoking and diabetes status [6, 7]. While the QRISK2 assessment tool accounts for disease in 1st degree relatives < 60 years of age, the most used tools do not account for one of the most important factors: the genetic contribution to coronary disease. As such, the current risk stratification tools do not consider the individual genetic background in which these clinical risk factors exist, and which likely also influences them, and the promise of individualized risk factor optimization is not fully actualized.
Heritability of CAD
The positive relationship between a family history of CAD and an individual’s own risk of disease has long been recognized through various epidemiologic studies, hinting at a distinct genetic component and heritability for this disease [8]. Heritability can be thought of as the proportion of variation in the disease phenotype resulting purely from genetic effects [9]. Twin studies have been helpful in the understanding of heritability as related to CAD. Results from the Swedish Twins Registry, following 21,004 monozygotic or dizygotic Swedish twins, clearly illustrated the link between genetics and premature CAD [10]. Here, if one twin died of CAD prior to reaching 55 years of age, the odds ratio for death was 8.1 for monozygotic male twins and 3.8 for dizygotic twins. The increased risk for death decreased as the age of the index case advanced. The Framingham Offspring Study, as part of the landmark Framingham Heart Study, further demonstrated that family history, beyond twins, as related to parental history of premature CAD is an independent and prospective risk factor for the future development of CAD [11]. In this study of 2,302 offspring of participants, there was ~2 times the risk of adverse cardiovascular events (using multivariable adjustment including age) for the sons and daughters of parents who had experienced premature CAD. The contribution of family history to overall risk assessment was independent of existing lifestyle and clinical factors.
Soon, our understanding of the underpinnings of family history, the molecular basis of CAD, and the influence of specific gene mutations and variants expanded. This occurred as our ability to probe the human genome progressed from classic Sanger sequencing, which is limited in scale and read length, to DNA microarrays, which enabled large-scale genome-wide association studies (GWAS), and finally to high-throughput next-generation sequencing [12]. Through these technological advances, the link between our genes and biology ushered in a new era of genomic medicine for primary risk prediction, disease diagnosis and prognostication, and therapeutic targeting and discovery.
The influence of genetics on a person’s risk of developing a disease exists along a spectrum [9]. There are specific, highly penetrant single-gene mutations that alter the function of the encoded product to such a degree that disease is of the highest likelihood, as in the case of mutations in LDLR, which encodes the low-density lipoprotein receptor, in the development of familial hypercholesterolemia (FH). These are termed monogenic diseases, as the disease is the result of just a single gene mutation. Monogenic diseases are typically rare. However, common complex diseases like coronary artery disease are most often the result of the cumulative effects of multiple gene variants, each with a small individual effect but that in concert may exert higher genetic risk of disease. These are aptly named polygenic diseases. CAD has both monogenic and polygenic origins (Figure 1). In the case of polygenic CAD, it is thought that genetics explains roughly 50% of an average individual’s overall risk of disease, with the other half attributed to lifestyle and environment, though significant gaps remain in accounting for the entirety of the genetic contribution.
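As a rough formalization of the heritability definition above, the following uses standard quantitative-genetics notation that is not taken from the cited studies. It assumes phenotypic variance splits cleanly into a genetic and an environmental part, ignoring gene-environment covariance and interaction. The second line is Falconer's classical twin-based approximation, where $r_{MZ}$ and $r_{DZ}$ denote phenotypic correlations within monozygotic and dizygotic twin pairs.

```latex
\begin{align}
  H^2 &= \frac{\sigma_G^2}{\sigma_P^2}, \qquad \sigma_P^2 = \sigma_G^2 + \sigma_E^2 \\
  h^2 &\approx 2\,\bigl(r_{MZ} - r_{DZ}\bigr)
\end{align}
```

Under these assumptions, a larger gap between monozygotic and dizygotic twin resemblance implies a larger heritable component, which is the intuition behind the twin-study results described above.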
Figure 1. Basic overview of CAD Genetics.
Genetic risk for CAD, and many common diseases, can be simplified into a monogenic (left) and polygenic (right) component. Monogenic risk variants tend to be rare, with small numbers of genetic mutations having large impacts on genetic risk. Polygenic risk variants tend to be more common, with more modest impacts on risk individually, and with cumulative effects that can be significant.
Familial Hypercholesterolemia and Monogenic Contributors to CAD
One of the most common genetic diseases in mankind, and the main monogenic driver of CAD, is familial hypercholesterolemia (FH) [13]. FH is typically an autosomal dominant disease that leads to massively elevated levels of low-density lipoprotein (LDL) cholesterol and results in increased risk of premature CAD and myocardial infarction (men < 55 years, women < 60 years). Strikingly, if left untreated, 50% of men and 15% of women with FH will succumb to this disease in their early years [14]. Heterozygous FH, which results from carrying a defect in only one allele, is far more common (~1:250 persons) though it presents with a less severe phenotype, while homozygous FH is rare (1:150,000–300,000 persons) and can present dramatically with heart disease at very early ages [15, 16]. Patients were first diagnosed with FH based on laboratory values, family history and physical exam findings of cholesterol deposits in several tissues: tendon xanthomas, eyelid or skin xanthelasmata and arcus cornealis. It wasn’t until the LDL receptor gene (LDLR) from a patient with FH was sequenced in 1985, showing a large deletion, that the first insights into the molecular defects responsible for this disease were made known [17]. The disruption in receptor-mediated clearance as well as additional mechanisms leading to elevations in LDL cholesterol in FH have been elegantly reviewed elsewhere [18].
Known pathogenic mutations within LDLR now number over a thousand and account for the vast majority of FH cases (~95%). Shortly after the report of the LDLR truncation mutant, mutations in APOB affecting apolipoprotein-B100 and leading to defective binding of LDL to LDLR and elevated cholesterol were described in six unrelated individuals [19]. There are currently over 30 individual mutations within APOB identified in FH, though they account for roughly 5% of the disease [20]. Gain-of-function mutations in proprotein convertase subtilisin/kexin type 9 (PCSK9) responsible for FH were later identified by positional cloning in 23 French families [21]. Mutations in PCSK9, however, are rare and account for less than 1% of FH. The discovery of PCSK9’s role in cholesterol metabolism through well-designed family studies illustrating both gain-of-function and loss-of-function variants, which subsequently led to an entirely new class of hypolipidemic agents, has been heralded as a pivotal moment in translational medicine – moving from bedside to benchtop [22]. Several other genes have been implicated in monogenic cholesterol disorders culminating in premature CAD, including LDLRAP1 (LDL receptor adapter protein), APOE (apolipoprotein E), STAP1 (signal-transducing adapter protein 1), ABCG5 (sterolin 1), and ABCG8 (sterolin 2) [23–26]. The identification of causal mutations in individuals with suspected FH is critical for downstream cascade testing of family members as well as for determination of the timing and intensity of therapeutics [13, 27]. As sequencing efforts continue in large, multi-ethnic populations, additional causal mutations in other genes are expected, and multiple genes (polygenic hypercholesterolemia) are estimated to account for ~15% of this disease [28].
Given the profoundly increased risk of adverse cardiovascular outcomes for individuals with FH and the beneficial impact of starting lipid-lowering therapy in these patients, identifying affected individuals and cascade testing of relatives is increasingly important [29]. Unfortunately, when relying solely on clinical factors, many cases of FH go undetected and undiagnosed. A study of 50,726 patients with linked electronic health records and exome sequencing data found that only 24% of carriers of FH mutations would have been flagged as having FH based on current clinical criteria for a probable or definite diagnosis [30]. Implementing standardized genetic testing in affected individuals and their family members, as was done in an Estonian population, has been shown to increase the diagnosis of FH and successfully guide patients to initiating appropriate treatment [31]. As such, a recent Expert Consensus Panel recommended genetic testing for causal mutations in LDLR, APOB and PCSK9 in individuals with known or probable FH, with subsequent cascade testing of family members [32].
The Polygenic Nature of Coronary Artery Disease
Unlike monogenic Mendelian diseases, the majority of CAD within the population is influenced by polygenic inheritance attributed to numerous common and rare variants with small effects throughout the genome [5]. As DNA sequencing technologies and analytic platforms improved, and in turn became more cost-effective and scalable, research efforts using genomics broadened greatly and expanded in size. This enabled the proliferation of GWAS [33]. GWAS is an experimental design aimed at identifying relationships between traits or diseases and DNA variants, and its power to do so improves with increasing sample size. The first GWAS in CAD was reported simultaneously by three independent groups in 2007 and identified a region on chromosome 9p21 associated with CAD [34–36]. Taken together, these three reports included over 11,000 individuals as cases with CAD and 37,000 individuals as controls without heart disease for discovery and validation. Despite these impressive numbers, the population studied was predominantly of European ancestry. Interestingly, the variants at 9p21 are found in a region that does not encode a specific protein product or regulatory region, which set off a hunt for the biologic role the variants could play in increasing the risk of CAD by ~50% for persons with 2 copies of the risk block. While the definitive role of 9p21 in CAD is controversial, this risk region is known to act in vascular smooth muscle cells via increased expression of the long non-coding RNA ANRIL, leading to altered cell adhesion, contraction and proliferation associated with CAD and other 9p21-associated disease phenotypes [37].
Based on the clinically observed phenotypic relationships between dyslipidemias and CAD, the next wave of studies assessed genetic associations not only with CAD but also with lipid traits using a variety of approaches. Several GWAS identified and validated the significance of a growing number of loci, including SORT1, LPA, MRAS and PHACTR1, as well as novel variants within genes already known to be involved in the disease process, such as LDLR and PCSK9 [38–43]. Interestingly, results from a Mendelian randomization study illustrated a disconnect between genetic mechanisms that increase circulating HDL and any protective effect against myocardial infarction, a presage of the lack of benefit observed in subsequent clinical trials focused on therapies raising HDL [44]. In contrast, several studies used whole-genome or exome sequencing to identify rare and low-frequency variants affecting non-LDL pathways that highlight the role of triglycerides in the development of CAD [45–49]. Despite these well-designed studies in select populations, the portion of disease potentially explained by the collection of risk variants remained low, and each newly discovered variant imparted only a small increase in risk over that seen in control populations.
Recognizing that the strength of GWAS and the ability to identify variants with genome-wide significance come in part from larger sample sizes and increasing diversity, new consortia were formed to combine their data sets in meta-analyses. From these efforts came nearly 20 additional novel variants associated with CAD, plus independent validation of previously associated variants [50, 51]. In 2013, taking this even further with the inclusion of 63,746 CAD cases and 130,681 controls, the largest CAD GWAS meta-analysis performed up to that time reported an additional 15 loci associated with CAD (bringing the total number of observed loci associated with CAD to 46) [52]. Given that less than half of these variants (17 of 46) were known to be involved in pathways related to blood pressure and blood lipids, these data further supported the impact of genomics on CAD risk even outside of traditional clinical risk factors and helped broaden the understanding of the complex biology of CAD. The success of these early meta-GWAS in CAD risk variant discovery and validation, and the entry of new datasets, namely the UK Biobank, soon grew the number of CAD-associated variants to more than 160 independent loci [53–57]. Most recently, a trans-ancestry meta-analysis combining an additional 25,892 CAD cases and 142,336 controls from a Japanese population with existing CAD cohorts from CARDIoGRAMplusC4D and UK Biobank identified an additional 40 new loci associated with CAD [58]. However, the addition of these 40 new loci increased the known heritability of CAD by only 1.12%.
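To make the idea of a case-control association concrete, the following is a minimal sketch of an allelic odds ratio with a 95% Wald confidence interval, computed from a 2x2 table of allele counts. The function name and the counts are hypothetical and are not taken from any cited study. Real GWAS pipelines typically fit regression models with covariates such as age, sex and ancestry components, and apply a genome-wide significance threshold, so this illustrates only the underlying comparison.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 allele-count table.

    Each argument is a count of alleles, not of people:
    risk vs. other alleles observed among cases and among controls.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) from the four cell counts (Wald method)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - 1.96 * se_log_or)
    upper = math.exp(log_or + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts chosen only for illustration
odds_ratio, ci_lower, ci_upper = allelic_odds_ratio(
    case_risk=600, case_other=400, control_risk=500, control_other=500
)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

Because the standard error shrinks as the cell counts grow, the same odds ratio yields a narrower interval in a larger study, which is one way to see why GWAS power improves with sample size.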
Development of Polygenic Risk Scores for CAD
In an effort to utilize the combined disease risk associations of the newly appreciated genetic variants discovered from GWAS and other studies discussed above, the use of polygenic risk scores for CAD (CAD PRS) became increasingly popular (Table 1). A PRS is created as the weighted sum of detected risk loci and aims to quantify an individual’s underlying genetic predisposition to disease [9, 59]. PRSs have been developed and have shown clinical utility for various diseases such as atrial fibrillation, diabetes mellitus, breast cancer and prostate cancer [60, 61]. The CAD PRS emerged as a promising genetic CAD metric with personal and clinical utility, especially given the shortcomings of traditional risk assessment tools that do not account for any genetic contribution to disease [62, 63]. Building on the early CAD GWAS efforts, geneticists derived the prototype CAD PRSs from dozens of common variants, explaining ~10% of heritability overall [51, 52]. Initial CAD PRSs utilized a limited number of variants, and the work focused on specific populations, mainly of European ancestry, but despite only modest gains in risk prediction, the association of CAD PRS with adverse cardiovascular events showed promise [64, 65]. Despite moderate statistical power due to limited cohort size and the use of a genotyping array targeting pre-selected loci, the re-analysis of statin prevention trials illustrated the value of CAD PRSs with 27 or 57 SNPs as an orthogonal predictor capable of stratifying high-risk groups with greater benefit from initiation of statin therapy and adherence to favorable lifestyle practices [66–69].
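The weighted-sum construction described above can be sketched in a few lines. This is an illustrative toy, not the method of any cited score. The variant identifiers, weights and genotypes are hypothetical, weights are assumed to be per-allele effect sizes such as log odds ratios from a discovery GWAS, and missing genotypes are treated as zero risk alleles for simplicity, whereas real pipelines would usually impute them.

```python
from statistics import mean, pstdev


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies) over the scored variants.

    Variants missing from `dosages` are treated as dosage 0 (a simplifying assumption).
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


def standardize(scores):
    """Convert raw scores to z-scores relative to the supplied reference cohort."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(score - mu) / sd for score in scores]


# Hypothetical per-allele weights (for example, log odds ratios)
weights = {"variant_a": 0.10, "variant_b": 0.05, "variant_c": 0.20}

# Hypothetical genotypes: number of risk alleles carried at each variant
cohort = {
    "person_1": {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    "person_2": {"variant_a": 0, "variant_b": 0, "variant_c": 1},
    "person_3": {"variant_a": 1, "variant_b": 2, "variant_c": 2},
}

raw_scores = {person: polygenic_risk_score(d, weights) for person, d in cohort.items()}
z_scores = dict(zip(raw_scores, standardize(list(raw_scores.values()))))

for person in cohort:
    print(person, round(raw_scores[person], 3), round(z_scores[person], 2))
```

Standardizing against a reference cohort is what allows an individual to be placed in a percentile or tier of the score distribution. The result depends on the reference cohort used, which is one reason scores may transfer poorly across ancestries.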
Table 1:
Description of leading studies used to develop and validate various CAD PRS models
Derivation | Target | |||||||||||||
Discovery GWAS | Training/Testing | Validation | ||||||||||||
Method | Description | Cohort | Ncase | Ncontrol | PRS (NSNP) | Cohort | Ncase | Ncontrol | Ref | Cohort | Ncase or Ntreatment | Ncontrol or Nplacebo | Performance | Ref |
P+T | pruning and p-value thresholding | CARDIoGRAMplusC4D | 22,333 | 64,762 | PRSMega (27) | community case: MDCS; 1st prevention: JUPITER, ASCOT-LLA; 2nd prevention: CARE, PROVE-IT | 3,477 | 44,944 | 67 | FOURIER | 7,163 | 7,135 | 6 | |
ODYSSEY | 9,462 | 9,462 | 72 | |||||||||||
CARDIoGRAMplusC4D | 22,333 | 64,762 | PRSTada (50) | Malmö Diet and Cancer study (MDCS) | 2,213 | 21,382 | 69 | ARIC | 1,230 | 6,584 | HR: 1.75 (1.46–2.10) | 66 | ||
WGHS | 971 | 20,251 | HR: 1.94 (1.58–2.39) | |||||||||||
MDCS | 2,903 | 19,486 | HR: 1.98 (1.76–2.23) | |||||||||||
eMERGE Phase III (European) | 5,887 | 39,758 | OR: 1.28 (1.25–1.32); HR: 1.20 (1.15–1.25) | 73 | ||||||||||
eMERGE Phase III (African) | 527 | 7,070 | OR: 1.05 (0.98–1.14); HR: 1.05 (0.94–1.17) |
eMERGE Phase III (Hispanic) | 299 | 2,194 | OR: 1.20 (1.05–1.35); HR: 1.13 (0.93–1.36) |
CARDIoGRAMplusC4D | 63,746 | 130,681 | PRSNatarajan (57) | WOSCOPS | 604 | 4,288 | 68 | ODYSSEY | 9,462 | 9,462 | 72 | |||
JUPITER | 108 | 8603 | ||||||||||||
ASCOT-LLA | 4070 | 149 | ||||||||||||
CARDIA | 1154 | |||||||||||||
BioImage | 4392 | |||||||||||||
LDpred | Bayesian PRS method adjusting marginal SNP effect size with LD reference | CARDIoGRAMplusC4D | 63,746 | 130,681 | PRSKhera (6,630,150) | UK Biobank Phase II | 8,676 | 260,302 | 61 | MHI Biobank Phase 1 (lcWGS) | 974 | 976 | CAD Prevalence OR: 1.64 (1.48–1.81) | 77 |
MHI Biobank Phase 2 (array) | 2,492 | 817 | CAD Prevalence OR: 1.55 (1.38–1.73) | |||||||||||
CARTaGENE | 173 | 5,589 | CAD Prevalence OR: 1.69 (1.44–1.99) | |||||||||||
MDCS | 4,122 | 24,434 | HR: 1.45 (1.40–1.49) | 75 | ||||||||||
eMERGE Phase III (European) | 5,887 | 39,758 | OR: 1.66 (1.62–1.71); HR: 1.50 (1.43–1.56) | 73 | ||||||||||
eMERGE Phase III (African) | 527 | 7,070 | OR: 1.30 (1.21–1.41); HR: 1.19 (1.07–1.33) |
eMERGE Phase III (Hispanic) | 299 | 2,194 | OR: 1.42 (1.25–1.61); HR: 1.16 (0.96–1.41) |
Meta-analysis | reciprocal two-stage sequential discovery and replication design with independent SNPs | CARDIoGRAMplusC4D | 88,192 | 162,544 | 161 | UK Biobank | 34,541 | 261,984 | 57 | |||||
UK Biobank | 34,541 | 261,984 | ||||||||||||
Meta-analysis | meta-score based on 3 previous scores | CARDIoGRAMplusC4D | 63,746 | 130,681 | PRSInouye (1,745,179) | UK Biobank | 22,242 | 460,387 | 70 | MHI Biobank Phase 1 (lcWGS) | 974 | 976 | CAD Prevalence OR: 1.74 (1.57–1.93) | 77 |
MHI Biobank Phase 2 (array) | 2,492 | 817 | CAD Prevalence OR: 1.60 (1.43–1.80) | |||||||||||
WTCCC-CAD | 1,926 | 2,938 | CARTaGENE | 173 | 5,589 | CAD Prevalence OR: 1.75 (1.49–2.05) | ||||||||
eMERGE Phase III (European) | 5,887 | 39,758 | OR: 1.73 (1.68–1.78); HR: 1.53 (1.46–1.60) | 73 | ||||||||||
MIGEN-Harps | 488 | 531 | eMERGE Phase III (African) | 527 | 7,070 | OR: 1.40 (1.30–1.52); HR: 1.27 (1.13–1.43) |
eMERGE Phase III (Hispanic) | 299 | 2,194 | OR: 1.93 (1.67–2.22); HR: 1.53 (1.23–1.90) |
Trans-ancestry meta-GWAS | fixed-effect and random-effect meta-analyses | CARDIoGRAMplusC4D | 60,801 | 123,504 | PRSKoyama (75,028) | Independent Japanese cohort | 1,827 | 9,172 | 58 | |||||
UK Biobank | 34,541 | 261,984 | ||||||||||||
Biobank Japan | 25,892 | 142,336 | ||||||||||||
AnnoPred | Bayesian PRS framework leveraging functional annotation | UK Biobank | 4,746 | 88,182 | PRSYe (2,994,054) | UK Biobank | 3,467 | 172,771 | 107 | |||||
SCT | stacked clumping and thresholding | UK Biobank | 7,912 | 121,941 | PRSBolli (300,238) | UK Biobank | 15,433 | 262,900 | 108 |
With the ever-increasing efficiency of DNA sequencers, genotyping technology, and statistical genetics techniques, as well as investment in national biobanks of genotypic and phenotypic information, the development of more complex genetic risk models is now possible. The development of many of these large (100,000+ individuals) datasets and prediction models relies upon imputation, which allows for the inference of whole-genome genotypes for common genetic variation from an initially sparse genotyping assay. While these statistical techniques are mature and accurate, they can lead to bias when applied across disparate genetic ancestries. With this in mind, Inouye et al. developed a genomic risk score (metaGRSCAD) with 1.7M variants, using meta-analysis to model variant effect sizes, which explained up to 26% of CAD heritability [70]. Alternatively, Khera et al. utilized a Bayesian framework to construct a genome-wide polygenic score (GPSCAD) with 6.6M variants, estimating variant effect sizes with linkage disequilibrium (LD) adjustment [61]. As a complement to conventional risk factors, both PRSs were validated in independent datasets or clinical trials to predict the fold change in disease susceptibility, stratify individuals with different trajectories of risk, and inform tailored therapeutic intervention [71–77]. Although not unexpected given the multiple shared biologic pathways across diseases, the use of CAD PRSs extends beyond coronary disease events alone, as there are independent associations with peripheral arterial disease, aortic aneurysm, stroke and carotid artery disease as well [78, 79].
The Clinical Utility of Polygenic Risk Scores
While the ability to stratify future disease risk on a population level based on CAD PRS has been well established, with several groups illustrating improved risk stratification and net reclassification in large datasets, there remains a healthy debate about whether it is time to integrate a CAD PRS into clinical practice and how best to do so [80–85]. Accordingly, current guidelines for the prevention of cardiovascular disease from professional societies in the US and Europe do not include CAD PRS [70]. This could be due to the absence of convincing outcome data from large-scale, multi-ethnic, prospective studies incorporating CAD PRS into risk optimization and therapeutic guidance strategies. When used prospectively in small, limited-scope studies, CAD PRS may lead to increased use of statin therapy, but improvements in diet and exercise have been inconclusive [86, 87]. However, as has been recognized in other studies delivering genomic information to patients, the knowledge of genetic risk did not increase personal health anxieties. Rather than instilling fear of disease in a patient, CAD PRS has the opportunity to provide a patient with an important orthogonal risk enhancer, similar to the role of coronary artery calcium (CAC), to motivate and empower healthy lifestyle and optimal medical therapies. We have seen that individuals with high CAD PRS achieve the largest risk reduction on statin therapy for both primary and secondary prevention compared to persons with low CAD PRS, significantly lowering the number needed to treat, and lifestyle remains an incredibly vital component of lowering event risk over time [66, 67]. The relative risk reduction with statin therapy in low versus high CAD PRS individuals was 34% vs 50% for primary prevention and 3% vs 47% for secondary prevention [67]. Individuals are not destined for disease just because they have a high CAD PRS.
Apart from somatic mutations that accumulate with age and environment and are known to drive certain cancer risks and sometimes risk for CAD, our genetic risk for CAD is largely set from birth. Unlike traditional clinical risk stratification tools based on age and chronic conditions such as hypertension and metabolic disease that may not manifest until mid or late life, a person’s CAD PRS can be known from the first day of life [75]. Even in the absence of traditional risk factors, a lifetime risk of CAD can be established early with the use of CAD PRS. Incorporating this concept, recent work suggested that CAD PRS could be used in concert with widely used CAC testing to help establish the optimal time of first CAC screening based on tiers of CAD PRS [88, 89]. Additionally, a CAD PRS appears to act as a potentiometer for monogenic CVD, either dialing up or dialing down the combined impact on disease [90].
Mirroring the greater relative risk reduction seen with statin therapy in individuals with high CAD PRS, recent analyses of the landmark secondary prevention trials of PCSK9 inhibitor therapy, ODYSSEY and FOURIER, illustrated that these therapies had the greatest benefit for individuals with elevated genetic risk [72, 76]. In these studies, the relative risk reduction for clinical events in low versus high CAD PRS groups was 13% vs 31% (ODYSSEY) and 13% vs 37% (FOURIER). This benefit is on top of high-intensity statin therapy, which 70% of patients in FOURIER and 90% of patients in ODYSSEY were already taking.
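The link between relative risk reduction and number needed to treat (NNT) mentioned above can be illustrated with simple arithmetic. The sketch below uses the primary prevention relative risk reductions quoted in the text (34% for low and 50% for high CAD PRS). The baseline event risks are hypothetical placeholders chosen only for illustration and do not come from the cited trials.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction = untreated event risk x relative risk reduction."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT is the reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(baseline_risk, relative_risk_reduction)
    if arr <= 0:
        raise ValueError("absolute risk reduction must be positive to compute an NNT")
    return 1.0 / arr


# Relative risk reductions with statin therapy for primary prevention, as quoted in the text
rrr_primary = {"low_prs": 0.34, "high_prs": 0.50}

# HYPOTHETICAL untreated event risks over the follow-up period (not from any trial)
hypothetical_baseline_risk = {"low_prs": 0.05, "high_prs": 0.10}

nnt = {
    group: number_needed_to_treat(hypothetical_baseline_risk[group], rrr_primary[group])
    for group in rrr_primary
}

for group, value in nnt.items():
    print(f"{group}: NNT of about {value:.0f}")
```

Under these assumed baseline risks, the high CAD PRS group has the lower NNT because a higher baseline risk and a larger relative risk reduction both increase the absolute risk reduction.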
Though lacking prospective validation, in this era of higher cost for novel therapies CAD PRS may prove key to matching the right patients with the most cost-effective therapies to achieve the greatest individualized benefit. While the promise of incorporating CAD PRS into clinical pathways is strong, the impact of CAD PRS is not fully understood as it does appear to have variation across sex, certain risk factors such as cigarette smoking, and most importantly, across ethnicities[91–94].
The Performance of Polygenic Risk Scores Across Ethnicities
Recent studies have shown a significant attenuation of cross-population prediction accuracy that hampers the utility of CAD PRSs [73, 93]. Most studies were conducted predominantly in populations of European descent and heavily underrepresent global demographic diversity and human evolutionary history [95]. The current Eurocentric sampling is inadequate to capture the disparate genetic architectures, differing LD patterns and non-genetic factors with gene-by-environment (GxE) interactions among populations [73, 94, 96]. Recent studies have reported the expected improvement with trans-ethnic meta-analysis [58]. Growing efforts have also been made to diversify the samples used in genomic research, with harmonized phenotypic definitions and case ascertainment [97, 98]. To leverage current large-scale datasets and the understanding gained from genetic studies, other efforts have attempted to bridge this gap by functional fine-mapping, the goal of which is to identify causal variants shared across populations [99, 100]. A recent study demonstrated that regulatory annotation partitioning can maintain the portability of PRS models from European to East Asian populations [101]. However, current polygenic prediction methods only allow for input from one or more homogeneous subpopulations. Future work is needed for admixed populations with higher genomic complexity. Promising directions include incorporating ancestry-specific effect size estimation and local ancestry adjustment [102, 103].
In addition, increased adoption and connectivity of personal electronic health records (EHRs) can expedite genomic discovery and the implementation of personalized medicine [97]. Wearable sensors and devices will also soon enable the collection of comprehensive exogenous factors to quantify individual envirotypes [104]. A recent perspective has outlined the potential for developing and integrating risk predictions with PRSs and biobank-linked EHR data [105]. A few cohort studies have demonstrated the potential improvement gained by delineating the multiplicative interaction of modifiable risk factors with CAD PRSs [91, 106–108]. High-level risk stratification that considers environmental factors and GxE associations could further disentangle the bias between different populations.
Rapid advances in genomic medicine have undoubtedly enabled a more complete assessment of cardiovascular risk that had previously remained unaccounted for. By using readily available, increasingly affordable and clinically meaningful genetic assessment tools, clinicians can now better inform patients about their own baseline genetic risks in order to create personalized risk optimization strategies. As our genetic knowledge and analytical techniques expand, these strategies will be further tailored to each individual. Our group has developed the MyGeneRank platform (https://mygenerank.scripps.edu/) specifically to assist individuals in determining their CAD PRS using a smartphone; the score can be determined in minutes and shared with their clinical care team. PRSs will continue to improve as genomic research expands to include a better representation of ethnicities and populations under-represented in current biomedical research. In addition, more standardized reporting and data sharing, especially of metadata, within the scientific community are essential to improve real-world accuracy and the ability to translate findings into the clinical setting [109]. Studies investigating the role of CAD in diverse populations must be conducted and shared openly to ensure that our tools to intervene early to prevent clinical disease are accessible to all.
