Author manuscript; available in PMC: 2025 Feb 1.
Published in final edited form as: Arterioscler Thromb Vasc Biol. 2024 Jan 24;44(2):323–327. doi: 10.1161/ATVBAHA.123.319480

Genome wide genetic associations prioritize evaluation of causal mechanisms of atherosclerotic disease risk

Thomas Quertermous 1, Dan Li 1, Chad Weldy 1, Markus Ramste 1, Disha Sharma 1, João P Monteiro 1, Wenduo Gu 1, Matthew Worssam 1, Brian Palmisano 1, Chong Park 1, Paul Cheng 1
PMCID: PMC10857784  NIHMSID: NIHMS1948362  PMID: 38266112

Abstract

Objective:

The goal of this review is to discuss the implementation of genome wide association studies (GWAS) to identify causal mechanisms of vascular disease risk.

Approach and Results:

The history of GWAS is described, including the use of imputation and the creation of consortia to conduct meta-analyses with sufficient power to arrive at consistent associated loci for vascular disease. Genomic methods are described that allow the identification of causal variants and causal genes, and of how they impact the disease process. The power of single cell analyses to promote study of GWAS causal gene function is described.

Conclusion:

GWAS represent a paradigm shift in the study of cardiovascular disease, providing identification of genes, cellular phenotypes, and disease pathways that empower the future of targeted drug development.

Fundamentals of GWAS methodology -

Genome wide association studies (GWAS) have mapped human traits and complex human disease to specific loci in the human genome, providing actionable information for risk assessment as well as leads on genetic pathways that may be targeted to mitigate disease risk.1 GWAS were made possible by two related contemporaneous breakthroughs. The first was characterization of the haplotype structure of the human genome and mapping of allelic variation that allowed investigation of haplotype associations with human traits and diseases.2 The second GWAS-enabling advance was the development of high-density genotyping arrays that allowed simultaneous genotyping of hundreds of thousands of variants across large numbers of human subjects, enough variants to map the genomic haplotype structure of an individual. The Affymetrix GeneChip System uses arrays fabricated by direct synthesis of oligonucleotides (probes) on the glass surface using the photolithographic technology employed in the semiconductor industry. The Illumina BeadArray technology uses silica microbeads that are coupled to oligonucleotides. Currently, over 100 million single nucleotide polymorphisms (SNPs) have been identified in the human genome, 5% of which are present at frequencies of 1% or higher across different populations worldwide. Not all of these SNPs are genotyped directly; rather, a basic set of SNPs queried with genotyping assays is used with imputation methods to expand the information obtained through GWAS. Imputation is a statistical method that characterizes variants that were not directly genotyped during the study by inferring them from known linkage disequilibrium (LD) structure, thus greatly expanding the utility of the genome scan conducted.3 Importantly, imputation has tremendously facilitated the sharing of genetic data by providing a common complement of variants from differing genotyping platforms with differing collections of variant assays.
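To illustrate why a genotyped SNP can stand in for a nearby variant that was not assayed, the following minimal sketch computes the standard pairwise linkage disequilibrium statistic r² from phased haplotypes. The haplotype data and the function name `ld_r_squared` are hypothetical; actual imputation software relies on reference panels and haplotype models rather than a single pairwise statistic.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) tuples,
    each allele coded 0 or 1, one tuple per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at site 1
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at site 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical toy data: the two sites are always inherited together.
tagged = [(0, 0), (1, 1), (0, 0), (1, 1)]
# Hypothetical toy data: the two sites vary independently.
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(ld_r_squared(tagged), ld_r_squared(unlinked))
```

An r² near 1 means the genotyped site carries nearly all of the information about the untyped site, which is the property that imputation exploits.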

Early GWAS took advantage of banked DNA and blood samples suitable for DNA isolation, primarily collected as longitudinal studies of cardiovascular and other complex diseases. Because of the large number of hypotheses being tested with whole genome genotyping, GWAS associations were required to reach a significance level of P < 5×10⁻⁸, and were required to be verified with similar genome-wide studies in a second population, i.e., replication. With a few notable exceptions, e.g., macular degeneration, it became clear that initial GWAS efforts with individual cohorts were underpowered to identify associated loci that contributed a sizable portion of the heritability estimated from epidemiology studies. Unprecedented global collaborative efforts were initiated with groups self-organized around different human diseases and phenotypes to conduct meta-analyses, e.g., CARDIoGRAM and CARDIoGRAM+C4D for coronary artery disease (CAD),4 DIAGRAM for type 2 diabetes,5 and GIANT6 for anthropometric traits. These collaborations boosted the number of study subjects, and meta-analyses promoted progress toward identifying disease associated loci. Today, there are over 280 replicated CAD loci with genome wide significance, and many more loci that are approaching this level of significance.4
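The genome-wide significance threshold is conventionally motivated as a Bonferroni correction for roughly one million independent common-variant tests, and consortium meta-analyses commonly pool per-cohort effect estimates by inverse-variance weighting. The sketch below illustrates both ideas under those assumptions. The cohort effect sizes and standard errors are made-up numbers, and the function names are hypothetical rather than taken from any consortium pipeline.

```python
import math


def bonferroni_threshold(alpha=0.05, n_independent_tests=1_000_000):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_independent_tests


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted (fixed-effect) meta-analysis of one variant.

    betas: per-cohort effect estimates (e.g., log odds ratios).
    standard_errors: per-cohort standard errors of those estimates.
    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p_value


# Hypothetical per-cohort results for a single variant.
cohort_betas = [0.10, 0.12, 0.08]
cohort_ses = [0.03, 0.04, 0.025]

beta, se, p = fixed_effect_meta(cohort_betas, cohort_ses)
print(beta, se, p, p < bonferroni_threshold())
```

The pooled standard error is smaller than that of any single cohort, which is the sense in which combining cohorts adds power.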

Despite this progress, GWAS has been considered by many to be a failure because most of the significant variants are localized in non-coding regions of the genome without clear functional implications. Also, many scientists were skeptical of GWAS because the heritability explained by associated loci was much lower than predicted by classical heritability studies,7 and the results initially suggested that the GWAS findings would not be sufficient to provide for risk assessment for individual patients. However, GWAS data are now being used to assess the genetic component of risk, with the goal of directing future risk lowering strategies.8 Polygenic risk scores (PRS) are an age-independent tool that can be used to estimate the risk of developing complex traits such as CAD based on an individual’s genotype at a number of loci. The individual risks conferred by GWAS-identified variants are combined in an equation that collectively appraises risk of the complex trait. Intense efforts are underway to determine the best utilization of this approach for the implementation of preventive strategies to mitigate future CAD risk. Further, it seemed that GWAS findings would not lead to drug targets, since disease associated variation almost exclusively resided in intergenic non-coding regions, where large numbers of SNPs in LD obscured the identity of the true causal gene. However, pharmaceutical firms are now investigating the therapeutic effect of agents that target validated causal genes for treatment of atherosclerosis. For instance, this lab has collaborated with Amgen to investigate the possible therapeutic effect of blocking CAD associated gene PDGFD in a murine model.9
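In its simplest additive form, a polygenic risk score is a weighted sum over variants, PRS = Σ βᵢ·xᵢ, where xᵢ is the number of risk alleles an individual carries at variant i and βᵢ is the per-allele effect size (typically a log odds ratio) from GWAS. The sketch below assumes that simple form. The variant identifiers and weights are hypothetical placeholders, and published scores use more elaborate weighting schemes than shown here.

```python
def polygenic_risk_score(dosages, weights):
    """Additive polygenic risk score.

    dosages: dict mapping variant ID -> risk-allele dosage
             (0, 1, 2, or a fractional imputed value).
    weights: dict mapping variant ID -> per-allele effect size
             (e.g., log odds ratio from GWAS summary statistics).
    Variants missing from `dosages` are skipped, which is a simplifying
    assumption made for this sketch.
    """
    score = 0.0
    for variant, beta in weights.items():
        if variant in dosages:
            score += beta * dosages[variant]
    return score


# Hypothetical weights and one individual's genotype.
example_weights = {"variant_A": 0.20, "variant_B": 0.10, "variant_C": -0.05}
example_dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

print(polygenic_risk_score(example_dosages, example_weights))
```

In practice the raw score is usually interpreted relative to its distribution in a reference population rather than as an absolute risk.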

Importance of causality and utility of GWAS for causal inference -

The lack of progress in developing drugs that target the primary disease process in the vessel wall has not been due to a dearth of data regarding vascular wall and bone-marrow derived cell genes that are differentially expressed in these cells during the disease process. The problem has been the lack of knowledge regarding which cells, genes, and molecular pathways are causal. The value of GWAS results, and genetic studies in general, lies in the concept of causality and the distinction between association and causality. Causality is commonly defined as a relationship between two events, or variables, in which one event or process causes an effect on the other event or process. Randomized clinical trials are frequently cited as an example of how cause can be linked to effect, as the process of randomization removes incidental associations that confound causality, elevating correlative association to proven causality. Further, it is useful to consider the difference between association, which represents a correlation between events but does not provide evidence of causality, as is common for epidemiology studies, and the method of Mendelian randomization (MR), which incorporates genetic information to establish causality.10 MR implements a genetic signature that is closely linked to an exposure or other measurement, and investigates whether the assortment of the genetic variable is random or biased in groups of individuals that are distinguished on the basis of the disease or phenotype under study, e.g., cardiovascular disease. It is important to note that this method depends on a number of assumptions that must be closely observed to prevent confounding and erroneous results. As an example, MR employing genetic variation in the endothelial lipase (LIPG) gene that regulates HDL levels failed to show any association with a reduced risk of myocardial infarction, arguing against a simple causal protective role for HDL cholesterol.11
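One common way to turn the MR idea into a number is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure, with several variants pooled by inverse-variance weighting. The sketch below assumes that standard estimator and a first-order approximation for its standard error. All effect sizes are hypothetical, and this is not presented as the analysis used in the cited studies.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal effect estimate from a single genetic instrument.

    beta_exposure: variant effect on the exposure (e.g., HDL level).
    beta_outcome:  variant effect on the outcome (e.g., log odds of MI).
    se_outcome:    standard error of beta_outcome.
    The standard error uses the first-order approximation that ignores
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(instruments):
    """Pool Wald ratios from several instruments by inverse-variance weighting.

    instruments: list of (beta_exposure, beta_outcome, se_outcome) tuples.
    """
    ratios = [wald_ratio(*inst) for inst in instruments]
    weights = [1.0 / se ** 2 for _, se in ratios]
    total = sum(weights)
    estimate = sum(w * r for w, (r, _) in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical instruments: (effect on exposure, effect on outcome, SE of outcome effect).
toy_instruments = [(0.50, 0.01, 0.05), (0.30, -0.02, 0.04)]
print(ivw_estimate(toy_instruments))
```

A pooled estimate whose confidence interval spans zero would be read as no evidence for a causal effect of the exposure, provided the instrument assumptions hold.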

So how do GWAS help with the critical issue of identifying genes and cellular pathways that are causal and modulate disease risk? GWAS provide what is essentially a randomized genetic trial: alleles segregate randomly, each individual carries a complement of risk alleles, and by examining allele frequencies in well matched case versus control groups of humans it is possible to identify those loci and alleles that are associated with, and likely causal for, disease risk in that specific location of the genome. Everyone would agree that a segregating highly penetrant mutation in a highly conserved coding region of a myocardial cell specific contractile gene can be causal for a Mendelian myocardial disease. Thus, changes in the sequence of the human genome can determine risk for a human phenotype. With Mendelian disease, risk is carried in one locus, is highly penetrant, and can easily be mapped in families, and the molecular basis of the disease can be studied with mechanistic laboratory investigation. But common human disease is more complicated: the responsible common polymorphisms are frequent in the human genome, carry a small component of risk, and must be mapped in large case-control association studies relating phenotype to allele frequencies by GWAS.1
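The comparison of allele frequencies between cases and controls can be sketched, for a single variant, as a chi-square test on a 2×2 table of allele counts together with an odds ratio. The counts below are hypothetical, and the function name is an assumption of this sketch; real analyses typically use regression models that adjust for ancestry and other covariates.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Allelic chi-square test (1 degree of freedom) for one variant.

    Arguments are allele counts: risk and non-risk alleles observed in
    cases and in controls. Returns (odds ratio, chi-square, p-value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, chi2 = z^2, so the p-value follows from erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts for one SNP.
print(allelic_association(case_risk=150, case_other=50,
                          control_risk=100, control_other=100))
```

Repeating such a test across the genome is what creates the multiple-testing burden that genome-wide significance thresholds are designed to control.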

However, GWAS data alone are not sufficient to establish causality; additional genomic and genetic assays are commonly required to establish the mechanism by which a single basepair change, and perturbed regulation of the identified causal gene, can be linked to disease related cell state changes.12 While a cardiovascular disease association may point to a causal gene, the association may instead reflect confounding, due for example to inclusion of population samples of different racial or ethnic backgrounds not accounted for in the study design and analysis. Molecular genomic assays have been developed to provide additional support for the causal relationship of individual variants to genes and disease.

Methods for mapping causal variants and genes in GWAS associated loci -

There is no single simple approach that provides a means for establishing a mechanism of causality in a GWAS locus that has been associated and replicated at genome-wide significance. This is due in large part to our lack of understanding regarding mechanisms of disease risk. Ongoing studies of complex diseases in a number of labs have identified one prominent mechanism of disease risk. A single DNA nucleotide polymorphism may mediate an alteration in regulatory protein-DNA interactions that modulate a critical enhancer function, by inhibiting or promoting transcription factor (TF) binding that results in a change in expression of a gene in a disease relevant cell type.13 That modulation of expression has a critical effect that leads to loss of disease inhibiting functions or promotion of disease initiating functions. In this scenario, it would thus be desirable to show that an associated variant resides in a region of open chromatin that identifies an enhancer element, disrupts a TF binding motif or other protein interaction motif, and that this binding modulates the temporal or cell-specific expression of a gene in a disease related pathway that can regulate cellular and molecular disease features.
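Whether a variant disrupts a TF binding motif is often assessed computationally by scoring the reference and alternate alleles against a position weight matrix (PWM) for that factor. The sketch below assumes a log-odds PWM score with a uniform background. The four-base motif, the sequences, and the function names are hypothetical and do not correspond to any real transcription factor.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency

# Hypothetical 4-bp motif: per-position base probabilities.
TOY_PWM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def motif_score(sequence, pwm):
    """Log-odds score (base 2) of a sequence against a PWM."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence and PWM must have the same length")
    return sum(math.log2(pwm[i][base] / BACKGROUND)
               for i, base in enumerate(sequence))


def allele_effect(ref_seq, alt_seq, pwm):
    """Score difference (alternate minus reference); negative suggests weaker binding."""
    return motif_score(alt_seq, pwm) - motif_score(ref_seq, pwm)


# Hypothetical SNP changing the third base of the motif from G to T.
print(allele_effect("ACGT", "ACTT", TOY_PWM))
```

A predicted change in motif score is only a hypothesis about binding; it still needs support from chromatin accessibility, allele-specific binding, or editing experiments of the kind described in this section.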

It is desirable to validate each aspect of this causal relationship from variant to enhancer to gene identification and function. There have been numerous reviews that discuss the types of data that can be gathered to establish variant, enhancer and gene causality. Identification of causal variation is arguably the most difficult aspect of the process.12 The majority of causal variants affect gene function through regulation of causal gene expression through transcriptional or epigenetic effects, usually in a cell- and context-dependent fashion. Identifying them is best accomplished through synergistic multi-modality functional genomics approaches. One highly important type of genomic data links expression of the target causal gene to allelic differences for one or more of the disease-associated variants. These functional variants, termed expression quantitative trait loci (eQTLs), are identified by correlating whole genome genotype data with whole transcriptome gene expression in disease relevant cells or tissues. Extensive human eQTL data for many tissues are now available through the GTEx effort. We and others have used primary human cells and tissues to map genomic features such as histone modifications and chromatin accessibility measures (assay of transposase accessible chromatin sequencing, ATACseq) that identify regions of open chromatin that are accessible for TF interaction, are thus functional, and are highly likely to harbor causal variation. Further, methods that assess physical association of disease-associated chromatin elements with promoter regions, localization of binding sites for TFs that are themselves GWAS gene products, and the binding of disease TFs in other loci that mediate disease risk are highly useful.14
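At its core, an eQTL test for one variant-gene pair is a regression of expression on allele dosage. The sketch below assumes a simple least-squares fit without covariates. The dosages and expression values are made-up numbers and the function name is hypothetical; large eQTL resources use normalized expression and adjust for technical and ancestry covariates.

```python
def eqtl_regression(dosages, expression):
    """Least-squares regression of gene expression on allele dosage.

    dosages:    per-individual alternate-allele counts (0, 1, or 2).
    expression: per-individual expression values for one gene.
    Returns (slope, intercept, r_squared); the slope is the estimated
    change in expression per additional copy of the allele.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    r_squared = 0.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: expression rises with each copy of the allele.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.1, 0.9, 2.0, 2.1, 2.9, 3.1]
print(eqtl_regression(toy_dosages, toy_expression))
```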

In recent years, these methods have been supported by the implementation of RNA-targetable genome and epigenetic editing based on the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system,15 and by classical transcriptional assays that further implicate allelic variation that mediates transcription in human cells.16 In cases where one could identify the causal variant, studies have led to understanding of the upstream epigenetic pathways and identified families of transcriptional regulators. For instance, smooth muscle cell (SMC) CAD causal gene PDGFD was found to be regulated in an allele-specific fashion by a causal variant that modulated FOXC factor binding.9 Interestingly, FOXC1 and FOXC2 both reside in GWAS CAD loci, and are known to be critical for arterial development. In some cases, it is difficult or impossible to target all candidate variants in linkage disequilibrium in an associated locus, and alteration of an entire enhancer region can be employed.17 This approach indicates that it is possible to identify the causal gene without identifying the specific causal variant in a haplotype block where multiple variants are in LD, and may be equally revealing of disease biology.

Mechanistic studies of CAD GWAS genes -

Two focused early post-GWAS studies are worth consideration. In the early days of GWAS, a collaborative group undertook detailed investigation of the 1p13 locus that had been associated by GWAS with plasma LDL cholesterol and also with myocardial infarction (MI).18 An associated non-coding variant, rs12740374, was identified that regulated nearby SORT1 expression and lipoprotein particle production by the liver. This variant was linked to a C/EBP TF binding site that was the mechanism of SORT1 expression and disease risk. This work represented one of the first studies to show that common noncoding DNA variants identified by GWAS can directly contribute to clinical phenotypes. A second exemplary study investigated the GWAS CAD association at 6p24, which had been associated with five different vascular diseases. Bioinformatic fine-mapping identified the variant rs9349379 in a chromatin accessible region of an intron in the PHACTR1 gene, and CRISPR-mediated genome editing of this variant was found to regulate expression of the endothelin-1 (EDN1) gene located over 500 kilobases upstream of PHACTR1.19 EDN1 had been well studied as a regulator of vascular tone and vascular disease processes, and provided a highly informative link to the five associated vascular diseases. These striking data illustrate the integration of genetic and phenotypic data, along with epigenetic analyses, to identify a prominent disease causal mechanism for the pathogenesis of multiple vascular diseases.

The application of single cell transcriptomic profiling (scRNAseq) to the study of CAD risk genes in mouse atherosclerosis models has revolutionized our thinking about disease pathophysiology.16,17,20,21 These studies have employed cell-specific gene targeting, and because of the dramatic cell state changes in endothelial cells and SMC, lineage tracing has been critical for tracking these cells as they undergo phenotypic transitions. The ability to map expression of thousands of genes has allowed the characterization of these novel cell phenotypes. Studies in this lab and others have focused on vascular smooth muscle cells (SMC), because algorithms identifying gene expression patterns in CAD loci, including linkage disequilibrium score regression and Multi-marker Analysis of GenoMic Annotation (MAGMA), have shown that CAD heritability is greater in SMC than in other vascular wall or bone marrow derived cells.22–24 Recent studies have begun to confirm that disease risk associated with this cell type resides in those cells undergoing phenotypic transitions. Study of cell state changes in the disease setting for SMC lineage cells has been particularly informative. In this regard, single cell transcriptomics has led to the characterization of two SMC transition phenotypes, one representing cells that adopt a fibroblast-like (fibromyocyte) phenotype, and a second for cells that adopt an osteochondrogenic (chondromyocyte) phenotype.17,21 Published and ongoing studies are correlating the transcriptomic and epigenetic changes, as well as cellular anatomy, between the disease promoting versus disease inhibiting CAD gene functions in these cells.

Importantly, findings from scRNAseq studies of CAD gene knockouts in mice have also raised questions regarding the link between atherosclerotic disease burden in mice and human disease risk. Although historically changes in disease burden in mouse knockout models have been the hallmark of disease causality and direction of effect for candidate genes, mouse disease burden has not been a phenotype commonly found to track with CAD gene expression. The addition of single cell RNA sequencing to lesion cellular anatomy and in situ lesion feature characterization has provided high resolution phenotyping of cells that could not be distinguished or characterized at a genetic level previously. A striking finding for a number of the CAD genes studied in this and other laboratories is that extensive cell state changes can occur in the plaque cells without producing an obvious change in plaque volume.9,17,20,21 A failure to find changes in lesion size for a CAD gene knockout does not mean that the gene is not causal, or that it is not important for the initiation or expansion of disease, and thus disease risk. For genes in CAD loci, characterization of the causal variant and linking of that variant to the causal gene through transcriptional and epigenetic studies firmly establish causality. At that point, mouse model studies are performed to identify the mechanism and direction of effect, not to establish causality. At least for SMC, the function of CAD causal genes appears to be through regulation of cell state and phenotype, and the related transcriptomic and epigenomic changes in the diseased vessel wall. This is exemplified by recent studies investigating the mechanism of disease causality for the CAD gene PDGFD.9 These studies showed this CAD gene promotes the SMC chondromyocyte phenotype, vascular calcification, macrophage content, and general inflammatory profile in the plaque and adventitia. These traits have all been identified as high probability mechanisms by which plaque is destabilized to promote myocardial infarction. But these findings did not correlate with disease burden.

The long-term goal of studies of GWAS identified causal genes in a number of labs is to map the causal gene regulatory networks that mediate disease risk for CAD, to facilitate the discovery of novel cellular and molecular mechanisms that mediate disease pathology and can be targeted for therapeutic development. Further, by intersecting information from a number of causal genes, an integrated picture can be obtained regarding which of the individual gene functions are critical to the disease process. Comparing the cellular functions of genes that promote versus genes that inhibit disease risk will be most beneficial in this regard. Ongoing efforts to comprehensively map genome-wide the gene targets of all GWAS identified enhancer regions using CRISPR targeted epigenomic inhibition (CRISPRi) screening strategies will dramatically expand our ability to build regulatory networks and infer pathways involved in disease risk,25 finally allowing effective drug development targeted to the vascular wall.

The authors apologize to the numerous investigators working in this field whose work could not be discussed in this limited editorial.

Funding sources

This work was funded by NIH NHLBI grants HL139478, HL145708, HL134817, HL151535, HL156846, HL158525, NHGRI grants UM1 HG011972 and U01HG011762, and a grant from the Chan Zuckerberg Foundation ZF2019-002437.

References

  • 1. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008;9:356–369.
  • 2. International HapMap Consortium. The International HapMap Project. Nature 2003;426:789–796.
  • 3. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007;39:906–913.
  • 4. Aragam KG, Jiang T, Goel A, et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet 2022;54:1803–1815.
  • 5. Dupuis J, Langenberg C, Prokopenko I, et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 2010;42:105–116.
  • 6. Liu F, Hendriks AE, Ralf A, et al. Common DNA variants predict tall stature in Europeans. Hum Genet 2014;133:587–597.
  • 7. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747–753.
  • 8. Patel AP, Wang M, Ruan Y, et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat Med 2023;29:1793–1803.
  • 9. Kim HJ, Cheng P, Travisano S, et al. Molecular mechanisms of coronary artery disease risk at the PDGFD locus. Nat Commun 2023;14:847.
  • 10. Emdin CA, Khera AV, Kathiresan S. Mendelian Randomization. JAMA 2017;318:1925–1926.
  • 11. Voight BF, Peloso GM, Orho-Melander M, et al. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet 2012;380:572–580.
  • 12. Tak YG, Farnham PJ. Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin 2015;8:57–67.
  • 13. Mathelier A, Shi W, Wasserman WW. Identification of altered cis-regulatory elements in human disease. Trends Genet 2015;31:67–76.
  • 14. Miller CL, Pjanic M, Wang T, et al. Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci. Nat Commun 2016:12092–12108.
  • 15. La Russa MF, Qi LS. The New State of the Art: Cas9 for Gene Activation and Repression. Mol Cell Biol 2015;35:3800–3809.
  • 16. Cheng P, Wirka RC, Kim JB, et al. Smad3 regulates smooth muscle cell fate and mediates adverse remodeling and calcification of the atherosclerotic plaque. Nat Cardiovasc Res 2022;1:322–333.
  • 17. Cheng P, Wirka RC, Shoa Clarke L, et al. ZEB2 Shapes the Epigenetic Landscape of Atherosclerosis. Circulation 2022;145:469–485.
  • 18. Musunuru K, Strong A, Frank-Kamenetsky M, et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 2010;466:714–719.
  • 19. Gupta RM, Hadaya J, Trehan A, et al. A Genetic Variant Associated with Five Vascular Diseases Is a Distal Regulator of Endothelin-1 Gene Expression. Cell 2017;170:522–533 e515.
  • 20. Newman AAC, Serbulea V, Baylis RA, et al. Multiple cell types contribute to the atherosclerotic lesion fibrous cap by PDGFRbeta and bioenergetic mechanisms. Nat Metab 2021;3:166–181.
  • 21. Wirka RC, Wagh D, Paik DT, et al. Single cell analysis of smooth muscle cell phenotypic modulation in vivo reveals a critical role for coronary disease gene TCF21 in mice and humans. Nat Med 2019;25:1280–1289.
  • 22. Tcheandjieu C, Zhu X, Hilliard A, Clarke S, Sun TV, Tsao PS, O’Donnell CJ, Assimes T. A large-scale multi-ethnic genome-wide association study of coronary artery disease. Nat Med 2022;28:1679–1692.
  • 23. Zhang K, Hocker JD, Miller M, et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 2021;184:5985–6001 e5919.
  • 24. Turner AW, Hu SS, Mosquera JV, et al. Single-nucleus chromatin accessibility profiling highlights regulatory mechanisms of coronary artery disease risk. Nat Genet 2022;54:804–816.
  • 25. Califano A, Butte AJ, Friend S, Ideker T, Schadt E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat Genet 2012;44:841–847.
