Abstract
The discovery of genetic loci associated with complex diseases has outpaced the elucidation of mechanisms of disease pathogenesis. Here we conducted a genome-wide association study (GWAS) for coronary artery disease (CAD) comprising 181,522 cases among 1,165,690 participants of predominantly European ancestry. We detected 241 associations, including 30 new loci. Cross-ancestry meta-analysis with a Japanese GWAS yielded 38 additional new loci. We prioritized likely causal variants using functionally informed fine-mapping, yielding 42 associations with fewer than five variants in the 95% credible set. Similarity-based clustering suggested roles for early developmental processes, cell cycle signaling and vascular cell migration and proliferation in the pathogenesis of CAD. We prioritized 220 candidate causal genes by combining eight complementary approaches; 123 of these genes were supported by three or more approaches. Using CRISPR–Cas9, we experimentally validated the effect of an enhancer in MYO9B, which appears to mediate CAD risk by regulating vascular cell motility. Our analysis identifies and systematically characterizes >250 risk loci for CAD to inform experimental interrogation of putative causal mechanisms for CAD.
Subject terms: Acute coronary syndromes, Genome-wide association studies
Genome-wide association analyses, functionally informed fine-mapping and complementary gene prioritization approaches identify new risk loci and candidate effector genes for CAD.
Main
Coronary artery disease (CAD) remains the leading global cause of mortality, reflecting both risk behaviors and genetic susceptibility1. Genetic association studies have identified >200 susceptibility loci for CAD. Consistent with other complex diseases, genetic analyses have identified the polygenic architecture of CAD, enabled insights into disease etiology and facilitated the development of new tools for risk prediction2–10. However, with the rapid increase in the availability of genetic data linked to health outcomes, the identification of disease-associated loci has outpaced their functional characterization.
Several in silico tools have emerged to elucidate the mechanisms connecting genomic regions to disease risk11,12. Nonetheless, it remains challenging to identify causal genes as these tools frequently lack consensus13. Recent analyses have suggested the value of integrating ‘locus-based’ approaches with more global (similarity-based) assessments of shared pathways and functions to enhance the prediction of causal genes13–15. The use of orthogonal and disease-specific resources to aid variant and gene classifications may expedite the transition from gene maps to disease mechanisms.
To extend these approaches to CAD, we analyzed imputed data from nine studies not previously included in genome-wide association study (GWAS) meta-analyses (86,847 cases and 417,789 controls) and combined results with data from UK Biobank, the CARDIoGRAMplusC4D Consortium and Biobank Japan, achieving a total sample of 210,842 CAD cases among 1,378,170 participants2,3,7,10,16. Our objectives were to (1) discover new associations with CAD; (2) determine the impact of expanded genetic discovery for identifying biologically relevant loci and improving risk prediction; (3) implement a systematic, integrative approach to prioritize likely causal variants, genes and biological pathways, thereby providing a catalog of testable hypotheses for experimental follow-up and (4) experimentally validate a new locus as proof of principle for our prioritization framework.
Results
Discovery of known and new CAD loci
Participants were largely (>95%) of European ancestry and 46% were female (Supplementary Table 1). In total, 20,073,070 variants were included in the discovery meta-analysis (Online Methods). We replicated 150 (69.4%) of 216 previously reported CAD loci at conventional genome-wide significance (P ≤ 5.0 × 10−8) and 38 (17.6%) at nominal significance (P ≤ 1.0 × 10−5; Supplementary Table 2). Approximate conditional analysis using Genome-wide Complex Trait Analysis (GCTA) identified 241 conditionally independent associations exceeding genome-wide significance at 198 loci (Supplementary Table 3, Extended Data Fig. 1 and Supplementary Data 1). In total, 54 sentinel variants were new, including 30 outside genomic regions previously reported for CAD (Table 1).
Table 1.
Nearest gene | Lead variant rsID | Chr | Position | Effect allele | Non-effect allele | Odds ratio | 95% CI | P value |
---|---|---|---|---|---|---|---|---|
KDF1 | rs79598313 | 1 | 27,284,913 | T | C | 1.10 | 1.06–1.14 | 3.6 × 10−8 |
LOC100131060 | rs71646019 | 1 | 59,433,354 | T | C | 1.04 | 1.03–1.05 | 6.1 × 10−10 |
OTUD7B | rs67807996 | 1 | 149,995,265 | A | G | 1.04 | 1.03–1.05 | 1.1 × 10−12 |
MIR4432 | rs243071 | 2 | 60,619,028 | A | G | 1.03 | 1.02–1.04 | 2.7 × 10−8 |
SAP130 | rs114192718 | 2 | 128,785,663 | T | C | 1.06 | 1.04–1.08 | 2.6 × 10−8 |
ACVR2A | rs35611688 | 2 | 148,377,860 | T | C | 0.97 | 0.96–0.98 | 1.5 × 10−8 |
LNX1 | rs17083333 | 4 | 54,572,066 | T | G | 0.97 | 0.96–0.98 | 1.2 × 10−8 |
ITGA1 | rs4074793 | 5 | 52,193,125 | A | G | 0.95 | 0.93–0.97 | 1.6 × 10−8 |
FER | rs112949822 | 5 | 108,085,190 | A | G | 0.95 | 0.93–0.96 | 1.1 × 10−9 |
DMXL1 | rs13169691 | 5 | 118,448,279 | T | C | 1.04 | 1.03–1.06 | 2.6 × 10−8 |
FBN2 | rs6883598 | 5 | 127,926,190 | A | C | 0.97 | 0.96–0.98 | 9.7 × 10−10 |
PTK7 | rs1034246 | 6 | 43,068,370 | T | G | 0.97 | 0.96–0.98 | 6.4 × 10−10 |
MACC1 | rs10486389 | 7 | 20,300,416 | A | G | 0.97 | 0.96–0.98 | 6.5 × 10−10 |
C9orf146 | rs10961206 | 9 | 13,724,051 | A | T | 1.05 | 1.04–1.07 | 8.1 × 10−10 |
ACER2 | rs10811183 | 9 | 19,436,055 | A | G | 1.04 | 1.02–1.05 | 1.6 × 10−8 |
C5 | rs41312891 | 9 | 123,726,749 | G | GCAAA | 0.94 | 0.92–0.96 | 5.9 × 10−9 |
PLCE1 | rs55753709 | 10 | 96,029,170 | T | C | 0.96 | 0.95–0.97 | 2.2 × 10−13 |
R3HCC1L | rs884811 | 10 | 99,923,763 | C | G | 1.03 | 1.02–1.04 | 3.1 × 10−9 |
MMP13 | rs1892971 | 11 | 102,795,606 | A | G | 0.96 | 0.95–0.97 | 5.1 × 10−10 |
ST3GAL4 | rs10790800 | 11 | 126,262,638 | A | G | 1.03 | 1.02–1.04 | 9.1 × 10−9 |
TBX3 | rs34606058 | 12 | 115,353,368 | T | C | 0.97 | 0.96–0.98 | 7.7 × 10−9 |
DOCK9 | rs8000794 | 13 | 99,434,810 | C | G | 1.03 | 1.02–1.04 | 4.3 × 10−8 |
LIPC | rs588136 | 15 | 58,730,498 | T | C | 0.96 | 0.95–0.98 | 7.0 × 10−10 |
UNC13D | rs2410859 | 17 | 73,841,285 | T | C | 1.03 | 1.02–1.04 | 4.3 × 10−9 |
CPLX4 | rs11663411 | 18 | 56,960,510 | T | C | 0.97 | 0.96–0.98 | 2.6 × 10−8 |
MYO9B | rs7246865 | 19 | 17,219,105 | A | G | 1.03 | 1.02–1.05 | 1.9 × 10−8 |
RRBP1 | rs1132274 | 20 | 17,596,155 | A | C | 1.04 | 1.03–1.05 | 1.8 × 10−8 |
MAFB | rs2207132 | 20 | 39,142,516 | A | G | 1.10 | 1.07–1.13 | 6.7 × 10−10 |
ARVCF | rs71313931 | 22 | 19,960,184 | C | G | 0.97 | 0.96–0.98 | 2.3 × 10−9 |
SCUBE1 | rs139012 | 22 | 43,623,972 | A | G | 0.97 | 0.96–0.98 | 2.1 × 10−8 |
Positions are according to GRCh37. Odds ratios (and 95% confidence intervals (CIs)) are for per-allele effect estimates according to the effect allele. Two-sided P values are from Z scores from a fixed-effect inverse-variance weighted meta-analysis.
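The table footnote names a fixed-effect inverse-variance weighted meta-analysis as the source of the odds ratios, confidence intervals and two-sided P values. A minimal sketch of that calculation follows, assuming per-study log odds ratios and standard errors are available; the function name and the input values are hypothetical and are not taken from the study.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of log odds ratios.

    betas: per-study log odds ratios for the effect allele
    ses:   per-study standard errors of those log odds ratios
    """
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided P from a standard normal Z
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - 1.96 * se),
        "ci_high": math.exp(beta + 1.96 * se),
        "z": z,
        "p": p,
    }


# Hypothetical per-study estimates, for illustration only.
pooled = ivw_fixed_effect([0.03, 0.05, 0.04], [0.010, 0.020, 0.015])
print(pooled)
```

Studies with smaller standard errors dominate the pooled estimate, which is why the large biobank cohorts carry most of the weight in an analysis of this kind.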
As in previous CAD GWAS9, we found genetic correlations with several CAD risk factors and other cardiovascular diseases (Supplementary Table 4). To identify potential etiological mechanisms for specific loci, we conducted a phenome-wide association scan (PheWAS) in UK Biobank (Supplementary Table 5). In total, 128 (53%) of the CAD-associated variants had directionally consistent associations with conventional CAD risk factors, such as blood lipids, blood pressure, hyperglycemia or adiposity.
Several new associations (Table 1) were near genes that have not been robustly implicated in CAD via genetic association studies but have strong biological plausibility. These included rs6883598 near FBN2, encoding fibrillin-2, which mediates the early stages of elastic fiber assembly and is associated with aortic aneurysms and Beals Syndrome, a Marfan-like disorder17–19, and rs1892971 near MMP13, which encodes matrix metalloproteinase (MMP)-13, an interstitial collagenase that influences the structural integrity of atherosclerotic plaques through regulation and organization of intraplaque collagen20,21. While the sentinel variant near FBN2 was associated with blood pressure in the PheWAS, the lead variant near MMP13 was not associated with conventional CAD risk factors, suggesting it is likely to act through alternative pathways.
Allelic architecture
Of the 54 new associations, 46 sentinel variants were common (minor allele frequency (MAF) > 0.05) with relatively weak effects on CAD (odds ratio (OR) per CAD risk allele: 1.03–1.07; Fig. 1). The others were low frequency (MAF = 0.009–0.036), of which four had comparatively strong effects (OR = 1.30–1.44) and four had more modest effects (OR = 1.10–1.14; Extended Data Fig. 2). We then conducted gene-based tests of missense and predicted loss-of-function variants in UK Biobank (n = 33,941 CAD cases, 438,394 controls; Supplementary Table 6) and found a strong signal for PCSK9. We did not find evidence for further association with a burden of low-frequency or rare variants (Extended Data Fig. 3 and Supplementary Table 7).
Differential effects by sex
To identify associations that differ by sex, we conducted sex-stratified GWAS in a subset of studies comprising 77,080 CAD cases (Supplementary Table 8). We found ten associations that reached genome-wide significance (P ≤ 5.0 × 10−8) and had evidence (P ≤ 0.01) for between-sex heterogeneity (Supplementary Table 9). Lead variant rs7696877 was the only signal with a stronger effect in females (per-allele OR = 0.94) than in males (per-allele OR = 0.98, heterogeneity P = 0.007).
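The heterogeneity P value above compares the female and male effect estimates. The exact test is described in the Online Methods rather than in this section, so the following is only a sketch of one common choice: a two-sample Z test on the log odds ratios, which is equivalent to Cochran's Q with two strata. The standard errors in the example are hypothetical; only the odds ratios 0.94 and 0.98 come from the text.

```python
import math


def sex_heterogeneity(beta_female, se_female, beta_male, se_male):
    """Two-sided test for a difference between two independent log odds ratios."""
    z = (beta_female - beta_male) / math.sqrt(se_female ** 2 + se_male ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Odds ratios from the text; the standard errors are assumed for illustration.
z, p = sex_heterogeneity(math.log(0.94), 0.012, math.log(0.98), 0.008)
print(z, p)
```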
Subthreshold associations
At a significance level (P < 2.52 × 10−5) approximating a 1% false discovery rate (FDR), we identified a further 656 conditionally independent associations with CAD (Supplementary Table 10). Most (486, 74.1%) were common variants, and almost all had modest effects (per-allele OR < 1.07). Several associations had strong biological priors, including rs41279633 (P = 1.24 × 10−6) in NPC1L1, encoding Niemann-Pick C1-like 1, an important mediator of intestinal cholesterol absorption and the target of ezetimibe, a cholesterol-lowering drug. Other examples included PNPLA3 (rs738408; P = 1.04 × 10−5), the strongest locus for nonalcoholic fatty liver disease22, and TCF7L2 (rs7903146; P = 6.39 × 10−8), the strongest locus for type 2 diabetes23. The proportion of heritability for CAD (on the liability scale) explained by the 241 conditionally independent associations reaching genome-wide significance was 15.5%, increasing to 36.1% for the 897 associations (241 genome-wide significant plus 656 subthreshold) with P < 2.52 × 10−5.
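This section does not state how the P-value cutoff approximating a 1% FDR was derived. As an illustration only, the Benjamini-Hochberg step-up procedure is one standard way to turn a target FDR into a P-value threshold; the sketch below assumes that procedure and uses made-up P values.

```python
def bh_threshold(p_values, fdr=0.01):
    """Largest P value declared significant by Benjamini-Hochberg, or None.

    This is an assumed procedure for illustration; the study's own FDR
    method is not described in this section.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= fdr * rank / m.
        if p <= fdr * rank / m:
            threshold = p
    return threshold


# Hypothetical P values.
print(bh_threshold([0.9, 0.002, 0.5, 0.001], fdr=0.01))
```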
Polygenic score associations with incident and recurrent CAD
We evaluated 362 polygenic risk scores (PRS) using combinations of derivation methods (Pruning and Thresholding24 or LDpred algorithm25) and summary statistics (from the current meta-analysis or an earlier 1000 genomes-imputed GWAS involving around 60,000 CAD cases7). We selected the optimal PRS for each combination of the derivation method and GWAS summary statistics based on prediction of incident CAD in a training dataset from the Malmö Diet and Cancer study (MDC; n = 22,872; nincident_cases = 3,307; Supplementary Table 11). The two top-performing scores were those derived with LDpred and comprised 2,324,653 variants (2022 PRS) and 1,532,758 variants (2015 PRS; Supplementary Tables 12–15). In bootstrapping analyses, the 2022 PRS outperformed the 2015 PRS (age- and sex-adjusted mean hazard ratio (HR) per 1 s.d. higher PRS = 1.56 versus 1.49; P = 3.2 × 10−31; age- and sex-adjusted mean area under the receiver operator characteristic curve (AUC) = 0.742 versus 0.736; P = 6.5 × 10−16; Supplementary Table 16).
We validated both scores in a held-out subset of the MDC (n = 5,685; nincident_cases = 815; Supplementary Table 11). The 2022 PRS was more strongly associated with incident CAD (HR = 1.61; 95% CI = 1.50–1.72) than the 2015 PRS (HR = 1.49; 95% CI = 1.39–1.59), providing improved stratification of participants at higher and lower risk for incident CAD (Fig. 2a). After adjustment for established risk factors (Online Methods), the 2022 PRS remained strongly associated with incident events (HR = 1.54; 95% CI = 1.42–1.66). The 2022 PRS yielded a 5.7-fold higher risk of CAD between the top and bottom deciles of the PRS, compared to a 3.8-fold higher risk with the 2015 PRS.
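For readers unfamiliar with how a PRS is applied once variant weights have been derived, the sketch below shows the scoring step only: a weighted sum of effect-allele dosages, standardization to s.d. units (the scale on which the hazard ratios above are reported) and assignment to deciles. Weight derivation with LDpred or Pruning and Thresholding is not shown, and all identifiers and values are hypothetical.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one participant."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def standardize(scores):
    """Express scores in s.d. units relative to the cohort (assumes non-zero spread)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def decile(scores):
    """Rank-based decile for each score (1 = lowest, 10 = highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0] * len(scores)
    for rank, i in enumerate(order):
        out[i] = rank * 10 // len(scores) + 1
    return out


# Hypothetical weights (log odds ratios) and genotype dosages.
weights = {"variant_1": 0.04, "variant_2": -0.03, "variant_3": 0.10}
cohort = [
    {"variant_1": 2, "variant_2": 0, "variant_3": 1},
    {"variant_1": 0, "variant_2": 2, "variant_3": 0},
    {"variant_1": 1, "variant_2": 1, "variant_3": 2},
]
raw = [polygenic_score(person, weights) for person in cohort]
print(standardize(raw))
```

Comparing event rates between participants in decile 10 and decile 1 gives the kind of top-versus-bottom contrast reported above.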
We then evaluated prediction of recurrent coronary events in the placebo arm of the Further Cardiovascular Outcomes Research with PCSK9 Inhibition in Subjects with Elevated Risk (FOURIER; n = 7,135; nincident_cases = 673) clinical trial, a cohort of patients with established atherosclerotic cardiovascular disease26. The 2022 PRS demonstrated better recurrent event prediction (HR = 1.20; 95% CI = 1.11–1.29) than the 2015 PRS (HR = 1.13; 95% CI = 1.04–1.22) and enhanced stratification of participants at higher and lower risk (Fig. 2b). The 2022 PRS yielded a 1.7-fold higher risk of recurrent coronary events between the top and bottom deciles of the PRS versus a 1.4-fold higher risk with the 2015 PRS.
Cross-ancestry comparison and meta-analysis
We used a large CAD GWAS from Biobank Japan to evaluate the genome-wide significant associations in East Asian ancestry participants3. Effect estimates for the 199 sentinel variants in both datasets were strongly positively correlated (r = 0.59) between the predominantly European ancestry meta-analysis and the Biobank Japan GWAS (Extended Data Fig. 4a), as were the effect allele frequencies (r = 0.76; Extended Data Fig. 4b). To assess the potential for enhanced, cross-ancestry discovery, we meta-analyzed the Biobank Japan summary statistics with the current analysis, yielding 38 additional new loci at genome-wide significance (Table 2, Fig. 1, and Supplementary Table 17). The sentinel variants were common (MAF > 5%) with weak effects (per-allele ORs: 1.026–1.059; Fig. 1), with the exception of rs75655731 near LINC00599, which was low-frequency (MAF = 1.4%) with a stronger effect (per-allele OR = 1.090); 36 of these associations were included in the 1% FDR set, including the aforementioned associations at TCF7L2 and PNPLA3.
Table 2.
Nearest gene | Lead variant rsID | Chr | Position | Effect allele | Non-effect allele | Odds ratio | 95% CI | P value |
---|---|---|---|---|---|---|---|---|
CCDC30 | rs6656344 | 1 | 42,948,585 | A | C | 0.97 | 0.97–0.98 | 5.0 × 10−9 |
KIAA0040 | rs2285219 | 1 | 175,130,983 | A | T | 1.03 | 1.02–1.04 | 4.0 × 10−9 |
AAK1 | rs12468870 | 2 | 69,679,537 | C | G | 0.97 | 0.96–0.98 | 5.1 × 10−10 |
CXCR4 | rs4954580 | 2 | 136,986,303 | T | C | 0.96 | 0.95–0.98 | 3.0 × 10−9 |
PDE1A | rs1430158 | 2 | 183,262,128 | T | C | 0.97 | 0.97–0.98 | 2.4 × 10−8 |
ATP1B3 | rs7622417 | 3 | 141,625,999 | C | G | 1.03 | 1.02–1.03 | 1.2 × 10−8 |
MECOM | rs11721038 | 3 | 168,849,576 | T | C | 1.05 | 1.03–1.06 | 3.5 × 10−9 |
GNPDA2 | rs12641981 | 4 | 45,179,883 | T | C | 1.03 | 1.02–1.04 | 2.0 × 10−9 |
LOC285696 | rs2652682 | 5 | 17,113,657 | A | T | 0.97 | 0.96–0.98 | 1.0 × 10−9 |
SKP2 | rs5867305ᵃ | 5 | 36,157,262 | CA | C | 0.97 | 0.96–0.98 | 3.9 × 10−8 |
SGCD | rs157333 | 5 | 156,117,200 | C | G | 1.04 | 1.03–1.05 | 1.3 × 10−11 |
TFAP2B | rs62405422 | 6 | 50,796,905 | T | C | 0.97 | 0.96–0.98 | 3.1 × 10−10 |
TRAF3IP2-AS1 | rs9400480 | 6 | 111,850,597 | C | G | 1.03 | 1.02–1.04 | 4.1 × 10−8 |
HBS1L | rs9399136 | 6 | 135,402,339 | T | C | 1.03 | 1.02–1.04 | 2.0 × 10−8 |
PDE1C | rs215634 | 7 | 32,369,148 | A | G | 1.03 | 1.02–1.04 | 1.2 × 10−8 |
SEMA3C | rs1019016 | 7 | 80,570,562 | T | G | 1.03 | 1.02–1.04 | 3.8 × 10−10 |
ZKSCAN1 | rs6953441 | 7 | 99,617,067 | A | G | 1.03 | 1.02–1.04 | 6.8 × 10−9 |
LINC00599 | rs75655731ᵃ | 8 | 9,721,394 | C | G | 0.92 | 0.89–0.95 | 4.6 × 10−8 |
DOCK8 | rs1536608 | 9 | 223,613 | T | G | 1.03 | 1.02–1.03 | 3.0 × 10−8 |
GPSM1 | rs3935875 | 9 | 139,238,824 | A | G | 0.97 | 0.96–0.98 | 2.2 × 10−8 |
ARHGAP21 | rs7077962 | 10 | 25,054,674 | T | C | 1.03 | 1.02–1.04 | 1.2 × 10−8 |
NRP1 | rs75082222 | 10 | 33,516,373 | T | TA | 0.97 | 0.96–0.98 | 3.5 × 10−8 |
TCF7L2 | rs7903146 | 10 | 114,758,349 | T | C | 1.03 | 1.02–1.04 | 6.2 × 10−9 |
AFAP1L2 | rs646668 | 10 | 116,138,034 | A | G | 1.03 | 1.02–1.04 | 2.8 × 10−9 |
PPP2R1B | rs11410951 | 11 | 111,621,399 | CA | C | 0.97 | 0.96–0.98 | 2.6 × 10−8 |
ACVRL1 | rs2277383 | 12 | 52,314,388 | T | G | 0.97 | 0.95–0.98 | 3.0 × 10−8 |
CNPY2 | rs62956461 | 12 | 56,706,178 | A | AT | 1.06 | 1.04–1.08 | 2.7 × 10−9 |
PAWR | rs8176893 | 12 | 79,999,309 | A | T | 1.03 | 1.02–1.04 | 7.9 × 10−10 |
CDK8 | rs12864131 | 13 | 27,045,939 | A | G | 0.97 | 0.97–0.98 | 1.1 × 10−8 |
GP2 | rs10852238 | 16 | 20,253,374 | A | T | 1.04 | 1.03–1.05 | 4.1 × 10−9 |
XPO6 | rs111806192 | 16 | 28,252,382 | T | G | 0.97 | 0.96–0.98 | 1.8 × 10−8 |
DYNLRB2 | rs16952537 | 16 | 80,185,366 | A | G | 0.97 | 0.97–0.98 | 3.9 × 10−8 |
PIP4K2B | rs16968377 | 17 | 36,942,396 | T | C | 0.95 | 0.93–0.97 | 3.0 × 10−8 |
TIMP2 | rs8075861 | 17 | 76,915,710 | A | C | 0.97 | 0.97–0.98 | 5.0 × 10−9 |
WDR87 | rs73025613 | 19 | 38,334,361 | T | C | 1.03 | 1.02–1.03 | 4.4 × 10−8 |
RRP1B | rs35219138 | 21 | 45,117,913 | A | AT | 0.97 | 0.96–0.98 | 1.8 × 10−8 |
SYN3 | rs4452 | 22 | 33,283,257 | T | C | 0.96 | 0.95–0.97 | 1.2 × 10−8 |
PNPLA3 | rs738408 | 22 | 44,324,730 | T | C | 0.97 | 0.96–0.98 | 3.8 × 10−9 |
Positions are according to GRCh37. Odds ratios (and 95% CIs) are for per-allele effect estimates according to the effect allele. Two-sided P values are from Z scores from a fixed-effect inverse-variance weighted meta-analysis. ᵃ Variants (rs5867305 and rs75655731) that did not reach the 1% FDR threshold in the primary meta-analysis.
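Readers who wish to reuse the tabulated estimates in downstream analyses usually need them on the log odds scale. The sketch below recovers an approximate log odds ratio and standard error from an odds ratio and its 95% CI, assuming a symmetric Wald interval on the log scale. Because the table values are rounded to two decimal places, quantities recovered this way are approximate and should not be expected to reproduce the reported P values.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate log odds ratio and its standard error from an OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return beta, se


# Rounded values from the CNPY2 row of Table 2 (OR 1.06, 95% CI 1.04 to 1.08).
beta, se = log_or_and_se(1.06, 1.04, 1.08)
print(beta, se)
```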
Prioritizing causal variants, genes and biological pathways
Using several independent approaches, we prioritized causal variants, effector genes, relevant tissues and intermediate causal pathways for all 279 significant associations. The presence of a protein-altering (that is, missense or predicted loss of function) variant has been shown to be a strong, causal gene predictor, particularly if the variant is uncommon14. At 52 associations, the sentinel variant, or a strong proxy (r2 ≥ 0.8), was a protein-altering variant (Supplementary Table 18). These included well-known low-frequency missense variants in PCSK9 (p.R46L) and ANGPTL4 (p.E40K)16. Nineteen of the 52 missense variants were new, including a missense variant (rs129415; p.G398R) in SCUBE1 that is strongly correlated with the CAD sentinel variant (r2 = 0.99). SCUBE1 encodes signal peptide-CUB-EGF domain-containing protein 1, a glycoprotein secreted by activated platelets whose inhibition protects against thrombosis in mice27.
Functionally informed fine-mapping
Incorporating functional annotations into fine-mapping approaches has been shown to improve identification of causal variants28–30. Using ChromHMM-derived chromatin states from the NIH Roadmap Epigenomics Consortium to functionally annotate the genome, we found more than twofold enrichment for these states in the ten CAD-relevant cell/tissue types we tested, consistent with previous findings (Supplementary Table 19)7. Of 235 distance-based regions containing genome-wide significant associations, we found 127 (54.0%) with significant enrichment (Supplementary Table 20). The majority (78; 61.4%) of these enriched regions were relatively tissue specific, showing enrichment in fewer than three tissues, but eight regions showed widespread enrichment in seven or more tissues (Fig. 3a). Adipose (n = 33), liver (n = 26) and aorta (n = 21) were the most strongly enriched tissues for the largest numbers of regions (Supplementary Table 20).
We applied a functionally informed fine-mapping method (functional genome-wide association analysis (FGWAS))29, which uses chromatin state enrichment information to reweight GWAS summary statistics and compute variant-specific posterior probabilities of association (PPA). Among the 127 enriched regions, we identified 42 that contained fewer than five variants in the 95% credible set (Fig. 3b and Supplementary Table 21), while 53 regions contained a variant with PPA ≥ 0.5 (Fig. 3c and Supplementary Table 22), showing that the combination of functional annotation and high statistical power can pinpoint likely causal variants. Indeed, 14 regions were fine-mapped to a single variant, including missense variants in PCSK9, ANGPTL4 and APOE, plus other well-studied noncoding variants, such as rs9349379 (PHACTR1/EDN1)31 and rs2107595 (HDAC9/TWIST1)32.
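The two quantities used in this paragraph, the PPA and the 95% credible set, can be illustrated with a simplified calculation. The sketch below assumes that each variant in a region has a Bayes factor from the association data and a prior weight reflecting its annotation; the PPA is then the normalized product, and the credible set is the smallest set of top-ranked variants whose PPAs sum to at least 0.95. FGWAS itself estimates annotation enrichment within a hierarchical model, which is not reproduced here, and all values are hypothetical.

```python
def posterior_probabilities(bayes_factors, priors):
    """PPA per variant, assuming exactly one causal variant in the region."""
    unnormalized = {v: bayes_factors[v] * priors[v] for v in bayes_factors}
    total = sum(unnormalized.values())
    return {v: x / total for v, x in unnormalized.items()}


def credible_set(ppa, coverage=0.95):
    """Smallest set of top-ranked variants whose cumulative PPA reaches coverage."""
    chosen, cumulative = [], 0.0
    for variant, p in sorted(ppa.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(variant)
        cumulative += p
        if cumulative >= coverage:
            break
    return chosen


# Two variants with equal statistical support; variant_2 lies in an
# enriched annotation and therefore receives a larger (assumed) prior weight.
bayes_factors = {"variant_1": 10.0, "variant_2": 10.0, "variant_3": 1.0}
priors = {"variant_1": 1.0, "variant_2": 4.0, "variant_3": 1.0}
ppa = posterior_probabilities(bayes_factors, priors)
print(ppa, credible_set(ppa))
```

With flat priors the first two variants would be indistinguishable; the annotation-informed prior is what separates them, which mirrors how a correlated variant in an enhancer can be ranked above the sentinel.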
At 12 loci, fine-mapping prioritized (PPA ≥ 0.5) variants that were not the sentinel. For example, at the low-density lipoprotein (LDL) cholesterol and adiposity-associated MAFB locus33, the sentinel variant was rs2207132 (Supplementary Table 3 and Extended Data Fig. 5a). However, a strongly correlated variant (rs1883711; r2 = 0.92) lies in a region annotated as a likely enhancer in liver and adipose tissue, the two enriched tissues at this locus (Extended Data Fig. 5b). Therefore, rs1883711 was upweighted by FGWAS (PPA = 0.77) over rs2207132 (PPA = 0.13). We queried CAD-associated variants for cis-expression quantitative trait loci (cis-eQTLs) in CAD-relevant tissues from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) and Genotype-Tissue Expression (GTEx) studies (Online Methods)34,35. The eQTL for MAFB observed in liver samples from CAD patients in STARNET suggests that the CAD association is mediated by changes in MAFB expression (encoding MAF bZIP transcription factor B; Supplementary Table 22). MafB expression in macrophages is upregulated by oxidized LDL stimulation36, while MafB deficiency in mice appears to increase atherosclerosis by inhibiting foam cell apoptosis37.
Polygenic prioritization of candidate causal genes
Combining locus- and similarity-based approaches has been shown to enhance the prioritization of causal genes14,38. However, established similarity-based methods have not leveraged the full polygenic signal to inform gene prioritization. We therefore incorporated a new similarity-based method for gene prioritization, the Polygenic Priority Score (PoPS), which uses the full genome-wide association data15. We applied PoPS to summary-level data from the GWAS meta-analysis. Initially, 57,543 features (including gene expression, protein–protein interaction networks and biological pathways) were considered, of which 19,091 features (33.2%) passed a marginal feature selection step and were input into the final PoPS model (Online Methods and Supplementary Table 23). We computed a PoPS score for all protein-coding genes within 500 kb of all 279 genome-wide associations and prioritized the gene with the highest PoPS score in each locus, resulting in 235 prioritized genes. PoPS prioritized many well-established genes implicated in CAD pathogenesis, including LDLR, APOB, PCSK9, SORT1, NOS3, VEGFA and IL6R (Supplementary Tables 24 and 25).
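The final selection step described above, taking the highest-scoring gene within 500 kb of each association, can be sketched as follows. The scores are treated as given; fitting the PoPS model is not shown. Measuring distance to the nearest edge of the gene body is an assumption made for this illustration, and the gene names, coordinates and scores are hypothetical.

```python
def prioritize_gene(variant_pos, genes, window=500_000):
    """Return the highest-scoring gene within `window` bp of the variant.

    genes: list of (name, start, end, score) tuples on the variant's chromosome.
    Distance is measured to the nearest edge of the gene body (assumption).
    """
    def distance(start, end):
        if start <= variant_pos <= end:
            return 0
        return min(abs(variant_pos - start), abs(variant_pos - end))

    nearby = [g for g in genes if distance(g[1], g[2]) <= window]
    if not nearby:
        return None
    return max(nearby, key=lambda g: g[3])[0]


# Hypothetical locus: the nearest gene is not the top-scoring one.
genes = [
    ("GENE_A", 1_000_000, 1_050_000, 0.2),   # nearest gene, low score
    ("GENE_B", 1_300_000, 1_350_000, 1.5),   # further away, highest score in window
    ("GENE_C", 2_000_000, 2_100_000, 3.0),   # outside the 500-kb window
]
print(prioritize_gene(1_100_000, genes))
```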
Next, we identified the features from the PoPS model that were most informative in prioritizing CAD-relevant genes. Hierarchical clustering yielded 2,852 clusters, which we ranked by relative contribution to the PoPS scores of prioritized genes (Fig. 4a). The highest-ranking cluster contained features indicating homeostatic regulation of blood lipids (Supplementary Table 26). Other top clusters were related to vascular cell function, migration and proliferation; the structure and function of the extracellular matrix; and metabolic pathways, including those in adipose tissue controlling thermoregulation. All of these are well-established mechanisms in CAD pathogenesis39–41. Additional high-ranking clusters highlighted early developmental processes and cell cycle signaling pathways as less recognized, but important, mediators of CAD risk.
We then examined a locus where the PoPS method facilitated the prioritization of a putative causal gene. Lead variant rs1807214 lies in an intergenic region of chromosome 15 at which no causal gene has been established7,8. Data from GTEx and STARNET identified cis-eQTLs for ABHD2, MFGE8 and HAPLN3 (Supplementary Tables 27 and 28). Prior locus-based algorithms have prioritized the nearest gene, ABHD2, located 65 kb downstream of the sentinel variant5,38. However, PoPS prioritized MFGE8, located 108 kb upstream of the sentinel (Fig. 4b). MFGE8 encodes lactadherin, an integrin-binding glycoprotein implicated in vascular smooth muscle cell (VSMC) proliferation and invasion, and the secretion of proinflammatory molecules42,43. In vitro deletion of this intergenic region by CRISPR–Cas9 increases MFGE8 expression—with no change to ABHD2 expression—and MFGE8 knock-down reduces coronary artery (CA)-VSMC and monocyte (THP-1) proliferation, lending functional support to MFGE8 as a likely causal mediator of the CAD association in this region44.
Systematic prioritization of putative causal genes
We developed and applied a consensus-based prioritization framework involving eight similarity-based or locus-based predictors to systematically prioritize likely causal genes for all 279 genome-wide associations (Online Methods and Fig. 5a). Most likely causal genes were selected based on the highest (unweighted) number of the eight predictors. To test this framework, we generated an a priori set of 30 ‘positive control’ genes with well-established causal roles in CAD and assessed the accuracy of each predictor (Supplementary Table 29). Twenty-eight of the 30 positive control genes were correctly prioritized as the most likely causal gene based on the highest number of concordant predictors with a median of four concordant predictors per gene (Supplementary Table 30). All predictors demonstrated high accuracy, including nearest gene (90%), PoPS (90%), eQTL (85%) and mouse knock-outs (100%; Supplementary Table 30).
We were able to prioritize a likely causal gene at 239 (85.7%) of the genome-wide associations based on having two or more concordant predictors, resulting in the prioritization of 220 genes (Supplementary Table 31). We considered 123 of these genes strongly prioritized (three or more concordant predictors; Fig. 5b and Supplementary Fig. 1). For 21 genes, the prioritized gene was not the nearest gene to the sentinel variant, including APOC3, PLTP and LOX. Agreement (the proportion of times that a predictor prioritized the same gene as the most likely causal gene) was high across predictors, including nearest gene (84%), PoPS (83%) and eQTLs (86%; Fig. 5a). Concordance (the proportion of times a pair of predictors both provided evidence for the consensus-based causal gene) was more variable (Extended Data Fig. 6); nearest gene and the presence of a protein-altering variant were typically concordant (71%), whereas monogenic genes and eQTLs were much less concordant (35%).
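The consensus rule described in this subsection (an unweighted count of predictors, with two or more concordant predictors required and three or more considered strong) can be sketched as below. For simplicity each predictor is assumed to nominate at most one gene per association, and ties are left unresolved; both are assumptions of this illustration rather than details taken from the study, and the predictor and gene names are hypothetical.

```python
from collections import Counter


def consensus_gene(predictions, min_support=2, strong_support=3):
    """Pick the gene nominated by the most predictors at one association.

    predictions: dict mapping predictor name -> nominated gene (or None).
    Returns (gene, n_predictors, label); gene is None if support is
    insufficient or the top count is tied.
    """
    counts = Counter(g for g in predictions.values() if g is not None)
    if not counts:
        return None, 0, "unresolved"
    ranked = counts.most_common()
    gene, n = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == n
    if n < min_support or tied:
        return None, n, "unresolved"
    label = "strong" if n >= strong_support else "prioritized"
    return gene, n, label


# Hypothetical predictor outputs for a single association.
example = {
    "nearest_gene": "GENE_A",
    "pops": "GENE_B",
    "eqtl": "GENE_B",
    "protein_altering": None,
    "mouse_knockout": "GENE_B",
}
print(consensus_gene(example))
```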
Candidate loci with converging lines of evidence
Several newly identified CAD risk loci had strong variant- and gene-level evidence supporting their candidacy for functional interrogation. For example, we identified a CAD-associated region that was most strongly enriched in the aorta (Supplementary Table 3), with an intronic variant (rs4074793) in ITGA1 having a PPA of 0.95 (Extended Data Fig. 7a,b). Lead variant rs4074793 lies in a region annotated as a likely enhancer in several tissues and is the lead variant for a strong cis-eQTL for ITGA1 in liver among CAD patients from STARNET (P = 1.8 × 10−73; Extended Data Fig. 7c). This eQTL was also seen in aorta, subcutaneous fat and mammary artery (Extended Data Fig. 7d). No other gene expression signals were seen at this locus, while PoPS also strongly prioritized ITGA1 as the likely causal gene (Supplementary Table 31). ITGA1 encodes integrin subunit alpha-1, a widely expressed protein that forms a heterodimer with integrin beta-1 and acts as a cell surface receptor for extracellular matrix components, such as collagens and laminins. The CAD risk allele (rs4074793-G), or strong proxies, were associated with elevated liver enzymes45, C-reactive protein and LDL cholesterol46, highlighting the influence of altered ITGA1 expression in the liver on lipid pathways as a likely causal pathway to CAD.
We also identified a new association with CAD at a gene-dense region enriched for epigenetic annotations in adipose, liver, monocytes and skeletal muscle myoblasts (Fig. 6a and Supplementary Table 20). FGWAS prioritized rs7246865 as the putative causal variant (PPA = 0.71). Among 30 genes within 500 kb of rs7246865, PoPS prioritized MYO9B (Supplementary Table 24), which encodes unconventional myosin-IXb, a myosin protein with Rho-GTPase signaling activity involved in cell migration47. Evidence for the involvement of MYO9B was also provided by a cis-eQTL in tibial artery in GTEx (P = 5.3 × 10−8), with the CAD risk allele exhibiting lower MYO9B expression (Supplementary Table 27).
Experimental interrogation of a new CAD locus
We proceeded to investigate the functional significance of the MYO9B locus with respect to CAD risk. This genomic region is contained within a vascular tissue enhancer, as identified by a strong H3K27ac ChIP-seq signal in coronary artery, aorta and tibial artery (Fig. 6b). Using ATAC-seq of primary vascular cells, we identified open chromatin at rs7246865 in the following three cell types of relevance to CAD: immortalized human aortic endothelial cells (ECs), CA-VSMCs and monocytes (Fig. 6b).
We used CRISPR–Cas9 to delete the enhancer sequence in these cell types (Fig. 6b), achieving 53–72% effective deletion of a 131-bp segment within the enhancer (Fig. 6c). We measured the transcriptional effect of enhancer deletion on all genes expressed in these cell types within a 250-kb window surrounding rs7246865. The enhancer deletion resulted in reduced MYO9B and HAUS8 expression in ECs (Fig. 6d) and reduced MYO9B expression in CA-VSMCs (Fig. 6e), compatible with vascular GTEx eQTLs. There was no change in the expression of any other genes in the region in either cell type or of any gene in monocytes.
Finally, we sought to understand whether the enhancer is associated with a cellular phenotype of relevance to CAD. Given the cytoskeletal functions of MYO9B and HAUS8 in other cell types47,48, we assessed the effects of these genes in a monolayer wound-healing assay, a composite of cell migration and proliferation49. We observed that ECs with the enhancer deletion exhibited impaired wound healing, as did ECs with knock-outs of either MYO9B or HAUS8, suggesting that the regulatory effect of the enhancer contributes to CAD risk through impaired wound healing in ECs (Fig. 6f). We did not observe any effect on migration with deletion of the noncoding enhancer or MYO9B in CA-VSMCs.
Discussion
In a discovery analysis involving >200,000 cases of CAD and >1 million controls, we identified 279 genome-wide significant associations, including 82 reported here for the first time. We objectively prioritized likely causal variants and effector genes across all associations using functionally informed fine-mapping, a recently developed genome-wide gene prioritization method (PoPS), and systematic integration of locus-based and similarity-based predictors, with several tailored specifically to cardiovascular disease. Finally, informed by our prioritization framework, we experimentally interrogated a new CAD signal to establish a putative, mechanistic link between this genomic region and risk of CAD.
The large sample size enabled detection of more than 80 new genetic associations with CAD, predominantly common weak-effect variants. Our findings suggest that future, larger GWAS—at least those in European ancestry populations—are unlikely to discover many more large-effect common variants (that is, those with ORs greater than 1.05) associated with CAD. In fact, additional associations contributing to the long polygenic tail of CAD risk are likely to arise from the ~650 predominantly weak-effect signals among associations that reached the 1% FDR threshold, which in aggregate explained ~36% of the heritability of CAD. Notably, we identified 38 new loci when we incorporated recently published GWAS results based on only 29,000 CAD cases from Biobank Japan, demonstrating that future multi-ancestry analyses should enhance the yield of genetic discovery for CAD.
Consistent with previous studies, we demonstrated that a genome-wide PRS derived from this GWAS strongly predicts both incident and recurrent CAD50–53. Notably, our new PRS demonstrated improved ability to discern those at higher and lower risk of CAD as compared to a widely used PRS derived from an earlier GWAS of ~61,000 CAD cases52. While the new PRS provides an improved tool for genetic risk prediction of CAD in the setting of primary and secondary prevention, our findings suggest that further increases in European-ancestry GWAS sample size may only modestly improve the predictive ability of the CAD PRS. More substantive improvements in polygenic risk prediction may arise from methodological developments, such as approaches that model interactions between variants or incorporate functional information54,55. Moreover, further investigations are required to understand the extent to which genetic discovery analyses that include more non-European ancestry participants will improve the portability of PRS across ancestries, and whether this will result in improved prediction across all ancestry groups56.
The weak effects of most CAD-associated variants do not preclude their contribution to important etiological insights with therapeutic implications, as the effects of pharmacologically perturbing identified targets are typically much stronger than those of naturally occurring genetic variants that are common in the population. For example, we uncovered common variant associations of weak effect at HMGCR and NPC1L1, which encode the targets of HMG-CoA reductase inhibitors (statins) and ezetimibe, respectively, two of the most effective and commonly prescribed medications for the prevention and management of CAD through lowering blood lipid levels. However, the translation of statistical associations into actionable biology and potential therapeutic targets requires elucidation of causal genes and mechanisms, which has lagged behind the rapid growth in genetic association discoveries.
Here we implemented strategies to enhance the identification of putative causal variants, genes and biological pathways. By incorporating epigenomic enrichment in disease-relevant tissues, an approach previously shown to improve fine-mapping over broader, disease-agnostic approaches29, we prioritized likely causal variants that were not always those with the strongest statistical associations. Using a recently developed similarity-based tool (PoPS) that exploits the full genome-wide data to identify disease-enriched features, we prioritized >200 likely causal genes. Support for the validity of the genes prioritized by PoPS comes from the high ranking of features of known relevance to atherosclerosis (for example, lipid metabolism, extracellular matrix processes) from more than 50,000 tested features; the correct assignment of the most likely causal gene at several well-established lipid and nonlipid CAD loci; selection of the likely-correct causal gene over several other candidates in a region, including those in closer proximity to the sentinel (for example, MFGE8); and corroborating evidence at many loci from orthogonal gene prioritization methods, such as eQTLs in disease-relevant tissues.
As support from multiple, orthogonal lines of evidence increases the likelihood of prioritizing the correct causal gene, we propose an integrative, consensus-based prioritization framework that incorporates eight complementary predictors. By applying this framework to all 279 genome-wide associations, we systematically enhance the level of evidence around both known and new risk loci for CAD to arrive at 123 genes strongly prioritized on the basis of having three or more concordant predictors. Although distance from the sentinel variant has been shown to be a reasonable predictor of causal genes across many phenotypes14,38, our integrative approach prioritized a gene that was not the nearest gene for 15% of associations. Also, at several newly identified associations, such as those nearest ITGA1 and MYO9B, we provide complementary lines of in silico evidence to nominate potential causal variants, genes and mechanistic pathways. Finally, we leveraged genome-editing and cell-based assays to interrogate the new association signal at chromosome 19, validating the involvement of MYO9B, but also implicating another putative causal gene, HAUS8. Importantly, these experimental findings substantiate our in silico prioritization of a region with apparent regulatory influence, and our similarity-based prioritization of cell migration pathways, as both MYO9B and HAUS8 may exert their influence on CAD risk through the control of vascular cell cytoskeleton. Furthermore, the findings raise the possibility that two genes at a locus may regulate a common, cellular pathway in coordinated fashion, such as seen for COL4A1 and COL4A2 at a well-established CAD risk locus57. While experimental evidence is ultimately required to confirm causal mechanisms at all unresolved CAD risk loci, we provide a prioritization framework yielding evidence-based candidates that may be amenable to analogous functional follow-up.
Methods
Genetic discovery meta-analysis
Details of the ten de novo studies, including the source of participants, case and control definitions, basic participant characteristics, and ethics approval, are provided in Supplementary Note, Supplementary Table 1 and Extended Data Fig. 1. Study-specific sample and variant filters were applied before additive logistic (or logistic mixed) models were run, with CAD status as the outcome and adjusting for study-specific covariates, including those accounting for potential ancestry effects.
We performed an inverse-variance weighted meta-analysis on the betas and standard errors using METAL58, combining the results from the ten de novo studies with previously published summary statistics. Variant-specific sample sizes were maximized by using a combination of summary statistics from prior CAD meta-analyses of the CARDIoGRAMplusC4D consortium, and additional variant filtering was performed, as detailed in Supplementary Note2,7,10,16. The final dataset included 20,073,070 variants.
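For readers unfamiliar with the approach, the per-variant calculation behind an inverse-variance weighted meta-analysis can be sketched as follows. This is a minimal illustration of the standard fixed-effect formula, not the METAL implementation; the function name and the example values are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted estimate for a single variant.

    betas: per-study effect estimates (log odds ratios)
    ses:   per-study standard errors
    Returns the pooled beta, its standard error and the Z score.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se

# Hypothetical values for two studies with equal precision.
beta, se, z = ivw_meta([0.10, 0.20], [0.10, 0.10])
```

With equal standard errors the pooled estimate is the simple mean of the two betas, and the pooled standard error shrinks by a factor of √2.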
Joint association analysis
We performed joint association analysis using GCTA software59. This approach fits an approximate multiple regression model using summary-level meta-analysis statistics and LD corrections estimated from a reference panel (here the UKBB sample using European ancestry participants only). We adopted a chromosome-wide stepwise selection procedure to select variants and estimate their joint effects at (i) a genome-wide significance level (PJoint ≤ 5.0 × 10−8) in the meta-analyzed variants that reached genome-wide significance (n = 18,348) and (ii) an FDR 1% P value cut-off (PJoint ≤ 2.52 × 10−5) in the 1% FDR variant list (n = 47,622). We identified 241 independent variants at the genome-wide significance threshold and 897 independent variants within the 1% FDR list.
Identifying previously reported regions and associations
To identify regions of the genome previously reported as having associations with CAD, we first collapsed variants reaching genome-wide significance by clumping variants within 500 kb of each other into a single locus. We compared these regions with all variants previously found to be associated with CAD at a genome-wide level of significance (P ≤ 5.0 × 10−8) from previous large-scale genetic association studies of CAD. Regions were annotated as ‘known’ if they included a previously reported CAD-associated variant. To assess which of our associations were previously reported or new, we examined the pairwise correlation between each of our 279 genome-wide significant sentinel variants and any nearby previously reported variants, defining ‘new’ as having r2 < 0.2 in UK Biobank European ancestry participants.
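The distance-based clumping and the ‘known’ annotation can be illustrated with a short sketch. It assumes that a variant joins the current locus when it lies within 500 kb of the previous significant variant on the same chromosome, and that a locus spans its first to last variant; both are simplifying assumptions, and the function names and coordinates are hypothetical.

```python
def clump_loci(variants, window=500_000):
    """Group (chromosome, position) pairs into loci by chaining neighbours
    that lie within `window` base pairs on the same chromosome."""
    loci = []
    for chrom, pos in sorted(variants):
        last = loci[-1][-1] if loci else None
        if last is not None and last[0] == chrom and pos - last[1] <= window:
            loci[-1].append((chrom, pos))
        else:
            loci.append([(chrom, pos)])
    return loci

def is_known(locus, reported):
    """A locus is 'known' if a previously reported variant falls within its span
    (assumption: the span runs from the first to the last clumped variant)."""
    chrom = locus[0][0]
    start, end = locus[0][1], locus[-1][1]
    return any(c == chrom and start <= p <= end for c, p in reported)

# Hypothetical genome-wide significant variants and one reported variant.
significant = [(1, 100_000), (1, 400_000), (1, 1_200_000), (2, 150_000)]
loci = clump_loci(significant)
known_flags = [is_known(locus, [(1, 250_000)]) for locus in loci]
```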
Genetic correlation analysis
Genetic correlation between CAD and conventional risk factors (total cholesterol, LDL cholesterol, HDL cholesterol, triglycerides, body mass index, systolic blood pressure and diastolic blood pressure) and cardiometabolic diseases (type 2 diabetes, ischemic stroke and heart failure) was assessed using LD Score Regression (LDSC)60. We used the 1000 Genomes European ancestry LD file comprising ~1.2 million variants available at https://alkesgroup.broadinstitute.org/LDSCORE/.
PheWAS in UK Biobank
To understand the spectrum of phenotypic consequences of our 279 independent associations with CAD, we conducted a PheWAS in the UK Biobank (see Supplementary Note for complete analysis details). Briefly, we tested for associations with 53 cardiovascular and noncardiovascular diseases and 32 continuous traits, as listed in Supplementary Tables 32 and 33. A genetic variant was considered to be associated with a ‘conventional CAD risk factor’ if the CAD risk-increasing allele exhibited a directionally consistent/positive association with blood lipids (total cholesterol, LDL cholesterol, triglycerides or a diagnosis of hypercholesterolemia); blood pressure (systolic blood pressure, diastolic blood pressure or a diagnosis of hypertension); hyperglycemia (serum glucose, hemoglobin A1c or a diagnosis of type 2 diabetes) or adiposity (body mass index).
Rare variant analyses
Variant annotation was performed using Variant Effect Predictor (VEP) v96.0 with the LOFTEE plugin on version three imputed data and variants with an information score ≥0.8 (refs. 61,62). Various gene-based groupings were tested (Supplementary Table 6) and allele frequencies from the entire UK Biobank cohort were used for groupings. Variants (n = 64,102) were considered to be in a gene if they fell within the gene coordinates as defined by GENCODE v19. Gene-based association tests were performed in SAIGE-GENE v0.35.8.5 using a white British subset of UK Biobank (28,683 CAD cases and 367,783 controls)63. Software defaults were used, except that in step 0 the number of markers for the sparse matrix was 2,000, and in step 1 the tolerance for the preconditioned conjugate gradient to converge was 0.01 and variance ratios were estimated across MAC categories. Two variants were required in each gene for testing. Covariates in the model included the genotyping array, the first five principal components calculated in the white British subset of samples, birth year, and sex. Burden, SKAT, and SKAT-O tests were performed for each gene. As no strong signals were observed except for the PCSK9 gene, we did not extend our rare variant testing to other studies.
Sex-specific analysis
We performed a sex-stratified GWAS analysis in UK Biobank using the same phenotype definition and sample exclusions as the main analysis. We used the SAIGE software and adjusted our single-variant association analysis for the first five genetic principal components and the genotyping array, separately for men and women64. Based on promising initial results in UK Biobank, we collated sex-stratified GWAS summary statistics, as available, from other participating studies (Supplementary Table 6). Additional details of sex-specific analyses are provided in Supplementary Note.
FDR estimation
The FDR following the meta-analysis was assessed using the ‘qvalue’ R package. We generated q values for all 20.1 million variants. The P value cut-off for a q value of 1% was 2.52 × 10−5 and there were 47,622 variants reaching that threshold. Joint conditional analysis was performed using GCTA (as described earlier) to identify approximately independent association signals.
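To illustrate how a P value cut-off follows from a q value threshold, the sketch below uses the Benjamini-Hochberg adjustment. This is a deliberately simplified stand-in: it corresponds to a q value calculation with the proportion of true nulls fixed at 1, and it does not reproduce the estimation of that proportion performed by the R package. The function names and P values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q values with pi0 fixed at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

def p_cutoff(pvalues, fdr=0.01):
    """Largest P value whose q value does not exceed the FDR threshold."""
    q = bh_qvalues(pvalues)
    passing = [p for p, qv in zip(pvalues, q) if qv <= fdr]
    return max(passing) if passing else None

# Hypothetical P values for five variants.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = bh_qvalues(pvals)
```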
Estimation of heritability explained
Heritability calculations were based on a multifactorial liability-threshold model, implemented in the INDI-V calculator (http://cnsgenomics.com/shiny/INDI-V/), under the assumption of a baseline population risk (K) of 0.0719 and a twin heritability (HL2) of 0.4 (refs. 65,66). Single-variant regression estimates from the meta-analysis summary statistics were used to estimate heritability for the sentinel variants at the 241 conditionally independent genome-wide significant associations and the 897 conditionally independent associations reaching the 1% FDR threshold in the primary meta-analysis. To account for correlation between variants, multiple regression estimates from the GCTA joint association analysis were also used to estimate heritability for both sets of variants.
Cross-ancestry comparison
For cross-ancestry comparison, we used summary statistics from a recent GWAS of 29,319 CAD cases and 183,134 controls from Biobank Japan3. In total, 199 of the 241 sentinel variants from our primary meta-analysis were also found in the Biobank Japan study; after aligning effect alleles, we compared the beta estimates and minor allele frequencies using Pearson’s correlation coefficient. To investigate the effect of outliers on the between-ancestry correlation of beta estimates, we re-estimated the correlation coefficient after excluding three strong outliers (at ATXN2, FER and SLC22A1). We then performed an inverse-variance weighted meta-analysis on the beta estimates and standard errors, incorporating summary results from Biobank Japan and those from all other studies in our primary meta-analysis. After cross-ancestry meta-analysis, we again dropped variants that were only present in one study or had fewer than 30,000 cases in total from all contributing studies, leaving 23,333,163 variants after filtering. We then collapsed variants reaching genome-wide significance (P ≤ 5.0 × 10−8) by clumping variants within 500 kb into a single locus, resulting in 38 additional loci that did not contain a previously reported CAD variant.
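The allele alignment and correlation steps can be sketched as follows. The sketch assumes that alleles are reported on the same strand in both datasets, so that only swapped effect and other alleles need handling; strand flips and ambiguous variants are not addressed. Function names and values are hypothetical.

```python
import math

def align_beta(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Orient a beta to the reference effect allele.

    Returns None when the allele pair does not match the reference
    (assumption: same strand in both datasets).
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta  # effect allele swapped: flip the sign
    return None

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example: the second dataset reports the opposite effect allele.
flipped = align_beta(0.10, "A", "G", "G", "A")
r = pearson([0.05, 0.10, 0.15], [0.04, 0.08, 0.12])
```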
Derivation and training of PRSs
PRS were derived using the pruning and thresholding method or the LDpred computational algorithm (LDpred v.1.0), with 503 European ancestry individuals derived from the 1000 Genomes Project study serving as the linkage disequilibrium reference panel67. To evaluate the added utility of our GWAS for the prognostication of CAD risk, we compared two sets of scores using effect estimates from either the current meta-analysis or from our previous 1000 Genomes-imputed GWAS of CAD involving ~60,000 cases7. For each derivation method and summary statistic, we constructed a range of scores of varying sizes drawing from common genetic variants that overlapped between the current meta-analysis, the earlier 1000 Genomes-imputed CAD GWAS and our training/validation datasets from the MDC Study68. Additional details on PRS derivation and training are contained in Supplementary Note.
Incident event prediction analyses
Cox proportional hazard models were used to assess the time-to-event relationship between each PRS and incident CAD events in the MDC study (see Supplementary Note for study details). Baseline models were adjusted for age and sex only, and then subsequently, for established risk factors for CAD (total cholesterol, HDL cholesterol, systolic blood pressure, body mass index, type 2 diabetes, current smoking status and family history of CAD). Harrell C-statistics were estimated using Cox proportional hazard analysis over a 21-year follow-up period to assess the discrimination of the PRS.
Recurrent event prediction analyses
The two optimal PRS (2022 PRS and 2015 PRS) were calculated in participants of the genetic substudy of the FOURIER trial (see Supplementary Note for trial details) using the genotype dosage for each allele, multiplied by its weight and then summed across all variants. Patients received a raw score standardized per 1 s.d. (continuous), as well as a percentile score relative to the total cohort. All scoring was performed using PLINK v2.0 (www.cog-genomics.org/plink/2.0/)69. Model goodness-of-fit was evaluated using the concordance statistic and Akaike’s Information Criterion. R version 3.6.1 was used for statistical analyses.
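The scoring arithmetic described above (dosage multiplied by weight, summed across variants, then standardized and ranked within the cohort) can be illustrated with a minimal sketch. It is not the PLINK command used in the study; the function names, dosages and weights are hypothetical, and the percentile definition (share of the cohort with an equal or lower score) is an assumption.

```python
import statistics

def raw_score(dosages, weights):
    """Sum of allele dosage multiplied by per-variant weight."""
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Express each score in units of 1 s.d. relative to the cohort."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

def percentile_ranks(scores):
    """Percentage of the cohort with a score at or below each score."""
    n = len(scores)
    return [100.0 * sum(other <= s for other in scores) / n for s in scores]

# Hypothetical participant with three variants.
score = raw_score([0, 1, 2], [0.1, 0.2, 0.3])
z_scores = standardize([1.0, 2.0, 3.0])
ranks = percentile_ranks([1.0, 2.0, 3.0, 4.0])
```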
The clinical outcome of interest was recurrent major coronary events, defined as myocardial infarction, coronary revascularization or death from CAD (n = 673 incident cases). Participants in the genetic cohort were followed for a median of 2.3 years. All endpoints were formally adjudicated by a blinded clinical events committee during the trial. A Cox model was used to determine the HR per 1 s.d. higher level of the PRS and for the extreme deciles compared to the middle 80%. Analyses were adjusted for age, sex and ancestry (using principal components 1–5).
Identifying protein-altering variants
To identify protein-altering variants among our genome-wide significant associations, we took the 279 sentinel variants and their LD proxies at r2 ≥ 0.8 as estimated in the European ancestry subset of UK Biobank and annotated them using the Ensembl VEP62. We selected for each sentinel variant any proxies identified as having a ‘high’ (that is, stop-gain and stop-loss, frameshift indel, donor and acceptor splice-site and initiator codon variants) or ‘moderate’ (that is, missense, in-frame indel, splice region) consequence and recorded the gene that the variant disrupts.
Functional GWAS analysis
To fine-map loci and identify credible functional variants, we applied FGWAS software29. The software integrates GWAS summary statistics with epigenetic data and we used the ChromHMM-derived states from the NIH Roadmap Epigenomics Consortium on a selection of ten CAD-relevant cell/tissue types (adipose nuclei, aorta, human skeletal muscle myoblasts (HSMM), liver, human umbilical vein endothelial cells (HUVEC), kidney, adrenal gland, pancreatic islets, primary monocytes and T-cells from peripheral blood)70,71. To maximize our search space to find functional elements, we prepared a custom state by merging likely functional ChromHMM states (enhancers, transcription start sites, repressed polycomb, transcription at 5′ and 3′ of gene) for each genomic position. We reweighted the GWAS by running a null model and then a model containing the custom annotation for each of the ten tissues. Regions of the genome that showed strong enrichment (>3 s.d. increment in Bayes factor (BF)) and had a genome-wide significant CAD-associated variant (P < 5.0 × 10−8) were selected. For each region, we identified the tissue that showed the maximum increment in BF and then constructed a 95% credible functional set of variants based on the ranked posterior probability of association (PPA) for each variant within a region.
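The final step, building a 95% credible set from ranked PPAs, can be sketched as below. The sketch assumes that the PPAs within a region sum to approximately 1; the function name, variant identifiers and PPA values are hypothetical.

```python
def credible_set(ppa_by_variant, coverage=0.95):
    """Smallest set of top-ranked variants whose cumulative PPA reaches `coverage`.

    ppa_by_variant: dict mapping variant identifier to its PPA within one region
    (assumption: the values sum to approximately 1).
    """
    ranked = sorted(ppa_by_variant.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, ppa in ranked:
        selected.append(variant)
        cumulative += ppa
        if cumulative >= coverage:
            break
    return selected

# Hypothetical region with four variants.
region = {"var_a": 0.70, "var_b": 0.20, "var_c": 0.06, "var_d": 0.04}
cs = credible_set(region)
```

In this example three variants are needed to reach 95% coverage, so the region would count among those with fewer than five variants in the credible set.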
eQTL analysis in CAD-relevant tissues
To examine whether the CAD associations were driven by changes in gene expression in CAD-relevant tissues and cell types, we interrogated cis-eQTLs from CAD-relevant tissues in the STARNET eQTL study and the GTEx study34,35. Analysis-specific details are provided in the Supplementary Note.
Polygenic prioritization of candidate causal genes
We implemented PoPS, a similarity-based gene prioritization method designed to leverage the full genome-wide signal to nominate causal genes independent of methods utilizing GWAS data proximal to the gene15. Broadly, PoPS leverages polygenic enrichments of gene features including cell-type-specific gene expression, curated biological pathways and protein–protein interaction networks (Supplementary Table 23) to train a linear model to compute a PoPS for each gene (see Supplementary Note for further details).
Variants responsible for cardiovascular-relevant monogenic disorders
To identify genes harboring pathogenic variants responsible for cardiovascular-relevant monogenic disorders, we searched the NCBI’s ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) on 26 June 2020. Variants were pruned to those within ±500 kb of our CAD sentinel variants; categorized as ‘pathogenic’ or ‘likely pathogenic’; with a listed phenotype; and with either (i) details of the evidence for pathogenicity, (ii) expert review of the gene or (iii) a gene that appears in practice guidelines. We then filtered variants that were annotated with a manually curated set of cardiovascular-relevant phenotype terms, including those related to CAD, CAD risk factors (lipids, metabolism, blood pressure, obesity and platelets), bleeding disorders and relevant cardiac, vasculature or neurological abnormalities (Supplementary Table 34). Where a variant was annotated with multiple genes, all annotated genes were considered as potentially pathogenic.
Phenotyping knock-out mice
Human gene symbols were mapped to gene identifiers (HGNC) and mouse ortholog genes were obtained using Ensembl (www.ensembl.org). Phenotype data for single-gene knock-out models were obtained from the International Mouse Phenotyping Consortium, data release 10.1 (www.mousephenotype.org), and from the Mouse Genome Informatics database, data from July 2019 (www.informatics.jax.org). For each mouse model, reported phenotypes were grouped using the mammalian phenotype ontology hierarchy into broad categories relevant to CAD as follows: cardiovascular physiology (MP:0001544), cardiovascular morphology (MP:0002127), growth and body weight (MP:0001259), lipid homeostasis (MP:0002118), cholesterol homeostasis (MP:0005278) and lung morphology (MP:0001175). This resulted in mapping from genes to phenotypes in animals (Supplementary Table 35).
Rare variant associations, MR and drug evidence
To inform prioritization of causal genes within 1-Mb regions around our genome-wide associations, we reviewed the literature for three sources of evidence as follows: (1) rare coding variants previously associated with CAD, either individually or in aggregate gene-based tests, through whole-exome sequencing (WES) or exome array studies; (2) Mendelian randomization (MR) studies of gene expression, protein levels or proximal phenotypes that implicate specific genes as causal effector genes for CAD and (3) drugs proven to be effective for cardiovascular-relevant indications and that target specific proteins encoded by genes.
Systematic integration of gene prioritization evidence
To systematically prioritize likely causal genes for all 279 genome-wide associations, we integrated the following eight of the aforementioned similarity-based or locus-based predictors of causal genes: (1) the top two prioritized genes from PoPS; (2) genes with eQTLs in CAD-relevant tissues from STARNET or GTEx; (3) genes containing protein-altering variants that are in strong LD (r2 ≥ 0.8) with the CAD sentinel variant; (4) genes harboring variants responsible for monogenic disorders of cardiovascular relevance according to ClinVar; (5) genes containing rare coding variants that have been associated with CAD risk in previous WES or array-based studies; (6) genes encoding proteins of causal relevance to CAD per MR studies or that are targets for established cardiovascular drugs; (7) genes that display cardiovascular-relevant phenotypes in knock-out mice from the International Mouse Phenotyping Consortium or Mouse Genome Informatics database; and (8) the nearest gene to the CAD sentinel variant (Fig. 5a). We prioritized the most likely ‘causal gene’ for each association using a consensus-based approach, selecting the gene with the highest, unweighted sum of evidence across all eight predictors.
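The consensus rule (highest unweighted sum of evidence across predictors) can be illustrated for a single association as follows. The tie-breaking rule used here (alphabetical order) is an assumption made only so that the sketch is deterministic; the function name, predictor labels and gene names are hypothetical placeholders.

```python
from collections import Counter

def consensus_gene(predictions):
    """Pick the gene supported by the most predictors at one association.

    predictions: dict mapping predictor name to the set of genes it supports
    (a set allows a predictor such as PoPS to contribute its top two genes).
    Returns (gene, number of supporting predictors), or (None, 0) if no evidence.
    Ties are broken alphabetically (assumption, for determinism only).
    """
    votes = Counter(gene for genes in predictions.values() for gene in genes)
    if not votes:
        return None, 0
    return max(sorted(votes.items()), key=lambda kv: kv[1])

# Hypothetical evidence at one locus.
locus = {
    "PoPS": {"GENE_A", "GENE_B"},
    "eQTL": {"GENE_A"},
    "nearest_gene": {"GENE_A"},
    "ClinVar": set(),
}
top_gene, support = consensus_gene(locus)
```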
We tested our approach by evaluating whether 30 (positive control) genes with established relevance to CAD were prioritized as the most likely causal genes within their respective genomic regions. Positive control genes were selected by a literature search that sought evidence from engineered mouse models of reduced gene expression (‘knock-out’ or ‘knock-down’ models), MR studies or successful drug targets. In addition, we defined two measures to summarize the relative contributions of individual predictors and pairs of predictors to the consensus-based approach. Specifically, we defined ‘agreement’ as the proportion of times that an individual predictor prioritized the same gene that was nominated as the most likely causal gene by the consensus-based framework. ‘Concordance’ was defined as the proportion of times a pair of predictors both converged on the gene that was nominated as the most likely causal gene by the consensus of the eight predictors.
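The two summary measures can be sketched as simple proportions. The sketch assumes that the denominator is the full set of associations with a consensus gene, which is one possible reading of the definitions above; the function names, locus labels and gene names are hypothetical.

```python
def agreement(predictor_calls, consensus_calls):
    """Proportion of loci at which one predictor supports the consensus gene."""
    hits = sum(
        gene in predictor_calls.get(locus, set())
        for locus, gene in consensus_calls.items()
    )
    return hits / len(consensus_calls)

def concordance(calls_a, calls_b, consensus_calls):
    """Proportion of loci at which both predictors support the consensus gene."""
    hits = sum(
        gene in calls_a.get(locus, set()) and gene in calls_b.get(locus, set())
        for locus, gene in consensus_calls.items()
    )
    return hits / len(consensus_calls)

# Hypothetical calls at four loci.
consensus = {"locus1": "GENE_A", "locus2": "GENE_C", "locus3": "GENE_E", "locus4": "GENE_G"}
nearest = {"locus1": {"GENE_A"}, "locus2": {"GENE_D"}, "locus3": {"GENE_E"}, "locus4": {"GENE_G"}}
eqtl = {"locus1": {"GENE_A"}, "locus2": {"GENE_C"}, "locus3": {"GENE_F"}}
```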
CRISPR–Cas9 genome editing in vascular cells
Human coronary artery VSMCs (Lonza CC-2583; culture media CC-31182) were used at passage five or earlier. Endothelial cell experiments were conducted with immortalized human aortic endothelial cells (ATCC CRL-4052; culture media Lifeline Cell Technology LL-0003). Monocyte experiments were conducted with THP-1 monocyte cells (ATCC TIB-202; culture media RPMI ATCC 30-2001, 10% FBS Sigma 12306C-500ML). Genome editing was performed as previously described (Supplementary Note)72.
Gene expression by qPCR
For assessment of gene expression, mRNA was extracted (Qiagen RNeasy kit; Qiagen, 74106) and DNase I digestion was performed (DNase I, Thermo Fisher 18068015) before cDNA synthesis (Applied Biosystems, 43-688-14) and qPCR (Applied Biosystems, 4444965). Gene expression was assessed by quantitative PCR with Taqman probes (Invitrogen) for genes of interest (MYO9B: Hs00994622_m1; HAUS8: Hs00928622_m1; OCEL1: Hs00928613_m1; USE1: Hs00218426_m1; NR2F6: Hs00172870_m1; GAPDH: Hs03929097_g1). Data are shown relative to expression of GAPDH. Statistical analyses were conducted with unpaired two-tailed Student’s t test.
Noncoding enhancer characterization
Assay for transposase-accessible chromatin using sequencing (ATAC-seq) data for THP-1 monocytes and CA-VSMCs were previously available. We performed ATAC-seq in human immortalized aortic endothelial cells as previously described73. H3K27ac ChIP-seq data were publicly available via ENCODE (coronary artery, ENCFF970RKM; aorta, ENCFF118EKX; tibial artery, ENCFF972ZHA).
Wound-healing assay
Wound-healing assays were performed as previously described (Platypus Technologies, CMAUFL4)49. After genome editing, 15,000 cells per well were plated with well inserts in place in culture media. Inserts were then removed the day after plating. Prior to complete wound healing (48–72 h), cells were stained with Calcein AM dye (Invitrogen, C3099) and wound healing was quantified with a fluorescence plate reader (excitation 488 nm/emission 522 nm). Statistical analyses were conducted with one-way ANOVA between groups. Where specific software tools are not named, we used Stata or R for analyses.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-022-01233-6.
Acknowledgements
T. Kessler is supported by the Corona-Foundation (Junior Research Group Translational Cardiovascular Genomics) and the German Research Foundation (DFG) as part of the Sonderforschungsbereich SFB 1123 (B02). T.J. was supported by a Medical Research Council DTP studentship (MR/S502443/1). J.D. is a British Heart Foundation Professor, European Research Council Senior Investigator, and National Institute for Health and Care Research (NIHR) Senior Investigator. J.C.H. acknowledges personal funding from the British Heart Foundation (FS/14/55/30806) and is a member of the Oxford BHF Centre of Research Excellence (RE/13/1/30181). R.C. has received funding from the British Heart Foundation and British Heart Foundation Centre of Research Excellence. O.G. has received funding from the British Heart Foundation (BHF) (FS/14/66/3129). P.S.d.V. was supported by American Heart Association grant number 18CDA34110116 and National Heart, Lung, and Blood Institute grant R01HL146860. The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung and Blood Institute, National Institutes of Health, Department of Health and Human Services (contract HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I and HHSN268201700005I), R01HL087641, R01HL059367 and R01HL086694; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. We thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by grant UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. 
The Trøndelag Health Study (The HUNT Study) is a collaboration between HUNT Research Centre (Faculty of Medicine and Health Sciences, NTNU, Norwegian University of Science and Technology), Trøndelag County Council, Central Norway Regional Health Authority and the Norwegian Institute of Public Health. The K.G. Jebsen Center for Genetic Epidemiology is financed by Stiftelsen Kristian Gerhard Jebsen; Faculty of Medicine and Health Sciences, NTNU, Norwegian University of Science and Technology; and Central Norway Regional Health Authority. Whole genome sequencing for the HUNT study was funded by HL109946. The GerMIFs gratefully acknowledge the support of the Bavarian State Ministry of Health and Care, which furthermore funded this work within its framework of DigiMed Bayern (grant DMB-1805-0001), the German Federal Ministry of Education and Research (BMBF) within the framework of ERA-NET on Cardiovascular Disease (Druggable-MI-genes, 01KL1802), within the scheme of target validation (BlockCAD, 16GW0198K), within the framework of the e:Med research and funding concept (AbCD-Net, 01ZX1706C), the British Heart Foundation (BHF)/German Centre of Cardiovascular Research (DZHK)-collaboration (VIAgenomics) and the German Research Foundation (DFG) as part of the Sonderforschungsbereich SFB 1123 (B02), the Sonderforschungsbereich SFB TRR 267 (B05), and EXC2167 (PMI). This work was supported by the British Heart Foundation (BHF) under grant RG/14/5/30893 (P.D.) and forms part of the research themes contributing to the translational research portfolios of the Barts Biomedical Research Centre funded by the UK National Institute for Health Research (NIHR). I.S. is supported by a Precision Health Scholars Award from the University of Michigan Medical School.
This work was supported by the European Commission (HEALTH-F2–2013-601456) and the TriPartite Immunometabolism Consortium (TrIC)-NovoNordisk Foundation (NNF15CC0018486), VIAgenomics (SP/19/2/344612), the British Heart Foundation, a Wellcome Trust core award (203141/Z/16/Z to M.F. and H.W.) and the NIHR Oxford Biomedical Research Centre. M.F. and H.W. are members of the Oxford BHF Centre of Research Excellence (RE/13/1/30181). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. C.P.N. and T.R.W. received funding from the British Heart Foundation (SP/16/4/32697). C.J.W. is funded by NIH grant R35-HL135824. B.N.W. is supported by the National Science Foundation Graduate Research Program (DGE, 1256260). This research was supported by BHF (SP/13/2/30111) and conducted using the UK Biobank Resource (application 9922). O.M. was funded by the Swedish Heart and Lung Foundation, the Swedish Research Council, the European Research Council ERC-AdG-2019-885003 and Lund University Infrastructure grant ‘Malmö population-based cohorts’ (STYR 2019/2046). T.R.W. is funded by the British Heart Foundation. I.K., S. Koyama, and K. Ito are funded by the Japan Agency for Medical Research and Development, AMED, under grants JP16ek0109070h0003, JP18kk0205008h0003, JP18kk0205001s0703, JP20km0405209 and JP20ek0109487. The Biobank Japan is supported by AMED under grant JP20km0605001. J.L.M.B. acknowledges research support from NIH R01HL125863, American Heart Association (A14SFRN20840000), the Swedish Research Council (2018-02529) and Heart Lung Foundation (20170265) and the Foundation Leducq (PlaqueOmics: New Roles of Smooth Muscle and Other Matrix Producing Cells in Atherosclerotic Plaque Stability and Rupture, 18CVD02). A.V.K. has been funded by grant 1K08HG010155 from the National Human Genome Research Institute.
K.G.A. has received support from the American Heart Association Institute for Precision Cardiovascular Medicine (17IFUNP3384001), a KL2/Catalyst Medical Research Investigator Training (CMeRIT) award from the Harvard Catalyst (KL2 TR002542) and the NIH (1K08HL153937). A.S.B. has been supported by funding from the National Health and Medical Research Council (NHMRC) of Australia (APP2002375). D.S.A. has received support from a training grant from the NIH (T32HL007604). N.P.B., M.C.C., J.F. and D.-K.J. have been funded by the National Institute of Diabetes and Digestive and Kidney Diseases (2UM1DK105554). EPIC-CVD was funded by the European Research Council (268834) and the European Commission Framework Programme 7 (HEALTH-F2-2012-279233). The coordinating center was supported by core funding from the UK Medical Research Council (G0800270; MR/L003120/1), British Heart Foundation (SP/09/002, RG/13/13/30194, RG/18/13/33946) and NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. Support for title page creation and format was provided by AuthorArranger, a tool developed at the National Cancer Institute.
Author contributions
GWAS was conducted by K.G.A., T.J., B.N.W., W.Z., C.R., I.S., L.M.V., B.O.Å., D.O.A., A.B., J.D., G.D., P.D., P.T.E., J.E., O.G., P.G., D.F.G., U.G., S.M.I.H., A.H., G.H., H.H., K.H., A.V.K., I.J.K., S. Kathiresan, T. Kessler, T. Kyriakou, A.K., L.L., N.A.M., T.M., S.M., M.M., C.P.N., J.B.N., M.M.N., S.P., L.S.R., T.R., C.T.R., M.S.S., H.S., K.S., G. Thorgeirsson, G. Thorleifsson, U.T., M.v.S., C.J.W. and T.R.W. Discovery meta-analysis of data was conducted by K.G.A., T.J. and A.S.B. Conditional analysis, FDR analysis, and heritability estimation were performed by S. Kanoni. Rare variant analysis was performed by B.N.W., W.Z., I.S. and C.J.W. Sex analysis was performed by S. Kanoni, P.D., K.G.A., B.O.Å., E.B., M.R.B., B.B., A.S.B., R.C., J.D., P.S.d.V., J.E., M.F., A.G., C.G., U.G., S.M.I.H., G.H., J.C.U., K.H., M.I., T.J., A.V.K., S. Kathiresan, T. Kessler, T. Kyriakou, A.K., L.L., T.M., A.C.M., S.M., L.M.V., M.M., F.M., J.B.N., M.M.N., S.P., T.R., H.S., I.S., M.v.S., H.W., C.J.W., B.N.W., P.Z. and W.Z. PheWAS was conducted by K.G.A. Cross-ancestry analysis was performed by K.G.A., T.J., K. Ishigaki, K. Ito, Y.K., I.K., S. Koyama and A.S.B. Association of polygenic risk scores was calculated by K.G.A., M.W., G.H., C.R., N.A.M., F.K.K., L.A.L., A.B., O.M., M.S.S., M.O.-M., A.V.K., P.T.E., C.T.R. and S. Kathiresan. Functionally informed fine-mapping was performed by A.G., C.G., M.F., J.C.U., R.C. and H.W. PoPS analysis was performed by E.M.W., H.K.F., K.G.A. and S. Kathiresan. Mouse knock-out analysis was performed by D.S., T. Konopka and P.D. eQTL analyses were performed by A.S.B., T.J., H.S., L.M., J.L.M.B., A.R. and P.D. Causal gene prioritization was performed by K.G.A., A.S.B., P.S., R.C., A.G., C.G., M.F., J.C.U. and H.W. Experimental work was performed by D.S.A., G.S. and R.M.G. Data visualization was performed by C.G., A.G., N.P.B., M.C.C., J.F., D.-K.J., A.S.B. and K.G.A. The CARDIoGRAMplusC4D Executive Committee comprised J.E., N.J.S., H.S., H.W., P.D., R.R., M.F., S. Kathiresan and J.D.
Conceptualization, initiation and oversight came from A.S.B., S. Kathiresan, J.D., C.R., N.J.S., H.S., J.E., H.W., P.D. and C.J.W. K.G.A., T.J., A.G., S. Kanoni, B.N.W., J.D., C.T.R., H.K.F., J.C.H., R.C., J.E., N.J.S., H.S., H.W., C.J.W., P.D., S. Kathiresan and A.S.B. drafted and edited the manuscript. All authors reviewed the manuscript.
Peer review
Peer review information
Nature Genetics thanks Xueling Sim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Data availability
Summary statistics are available upon publication through the CARDIoGRAMplusC4D website (http://www.cardiogramplusc4d.org/) and the NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/, accession codes: GCST90132314 (https://www.ebi.ac.uk/gwas/studies/GCST90132314) and GCST90132315 (https://www.ebi.ac.uk/gwas/studies/GCST90132315)). Interactive searchable Manhattan plots and a locus-specific epigenome annotation browser for functionally enriched loci are available at https://procardis.shinyapps.io/cadgen/. An interactive searchable browser detailing the locus-specific evidence prioritizing causal variants, genes and pathways is available at the Common Metabolic Diseases Knowledge Portal (https://hugeamp.org/method.html?trait=cad&dataset=cardiogram).
Other datasets used in this study include the NCBI’s ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/, accessed on 26 June 2020), a 1000 Genomes European ancestry LD file comprising ~1.2 million variants (https://alkesgroup.broadinstitute.org/LDSCORE/), the GTEx Consortium v7 data release (https://www.gtexportal.org/home/datasets), the Ensembl database (www.ensembl.org), the International Mouse Phenotyping Consortium data release 10.1 (www.mousephenotype.org) and the Mouse Genome Informatics database (www.informatics.jax.org, accessed in July 2019).
Code availability
Custom code for preparing the study-specific GWAS summary statistics files for meta-analysis can be found at https://github.com/cambridge-ceu/cardiogramplusC4D_GWAS. Custom code for PRS analysis using a modified version of LDpred 1.0 can be found at https://github.com/wavefancy/LDpredChrByChr.
Competing interests
All deCODE affiliated authors are employees of deCODE/Amgen. The TIMI Study Group has received institutional research grant support through Brigham and Women’s from Abbott, Amgen, Aralez, AstraZeneca, Bayer HealthCare Pharmaceuticals, BRAHMS, Daiichi-Sankyo, Eisai, GlaxoSmithKline, Intarcia, Janssen, MedImmune, Merck, Novartis, Pfizer, Poxel, Quark Pharmaceuticals, Roche, Takeda, The Medicines Company, and Zora Biosciences. R.C., J.C.H., M.I. and F.M. work at the Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, which receives research grants from industry that are governed by the University of Oxford contracts that protect its independence and has a staff policy of not taking personal payments from industry; further details can be found at https://www.ndph.ox.ac.uk/files/about/ndph-independence-of-research-policy-jun-20.pdf. A.S.B. reports grants outside of this work from AstraZeneca, Bayer, Biogen, BioMarin, Bioverativ, Merck, Novartis and Sanofi. A.B. and L.A.L. are employees of Regeneron Pharmaceuticals and the spouse of C.J.W. works at Regeneron Pharmaceuticals. J.L.M.B. and A.R. are members of the board of directors, founders and shareholders of Clinical Gene Networks AB that has an invested interest in STARNET. J.D. serves on scientific advisory boards for AstraZeneca, Novartis, and UK Biobank and has received multiple grants from academic, charitable and industry sources outside of the submitted work. J.C.U. has received compensation for consulting from Goldfinch Bio and is an employee of Patch Biosciences. O.G. became a full-time employee of UCB while this manuscript was being drafted. The other authors declare no conflicts of interest.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Krishna G. Aragam, Tao Jiang, Anuj Goel, Stavroula Kanoni, Brooke N. Wolford, Deepak S. Atri.
These authors jointly supervised this work: Rajat M. Gupta, Jeanette Erdmann, Nilesh J. Samani, Heribert Schunkert, Hugh Watkins, Cristen J. Willer, Panos Deloukas, Sekar Kathiresan, Adam S. Butterworth.
A list of authors and their affiliations appears at the end of the paper.
A list of members and their affiliations appears in the Supplementary Information.
Contributor Information
Krishna G. Aragam, Email: karagam@broadinstitute.org
Adam S. Butterworth, Email: asb38@medschl.cam.ac.uk
Extended data
Extended data is available for this paper at 10.1038/s41588-022-01233-6.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-022-01233-6.