Abstract
Left ventricular mass is a risk marker for cardiovascular events, and may indicate an underlying cardiomyopathy. Cardiac magnetic resonance is the gold-standard for left ventricular mass estimation, but is challenging to obtain at scale. Here, we use deep learning to enable genome-wide association study of cardiac magnetic resonance-derived left ventricular mass indexed to body surface area within 43,230 UK Biobank participants. We identify 12 genome-wide associations (1 known at TTN and 11 novel for left ventricular mass), implicating genes previously associated with cardiac contractility and cardiomyopathy. Cardiac magnetic resonance-derived indexed left ventricular mass is associated with incident dilated and hypertrophic cardiomyopathies, and implantable cardioverter-defibrillator implant. An indexed left ventricular mass polygenic risk score ≥90th percentile is also associated with incident implantable cardioverter-defibrillator implant in separate UK Biobank (hazard ratio 1.22, 95% CI 1.05-1.44) and Mass General Brigham (hazard ratio 1.75, 95% CI 1.12-2.74) samples. Here, we perform a genome-wide association study of cardiac magnetic resonance-derived indexed left ventricular mass to identify 11 novel variants and demonstrate that cardiac magnetic resonance-derived and genetically predicted indexed left ventricular mass are associated with incident cardiomyopathy.
Subject terms: Cardiovascular genetics, Cardiac hypertrophy, Machine learning
A genome-wide association study of cardiac magnetic resonance-derived left ventricular mass index including 43,000 UK Biobank participants reveals 12 associations (11 novel), implicating genes involved in cardiac contractility and cardiomyopathy.
Introduction
Left ventricular hypertrophy (LVH) is defined as pathologically increased left ventricular mass (LVM)1 and is associated with increased risk of cardiovascular events including heart failure (HF)1–3, stroke1, atrial fibrillation (AF)4, and sudden cardiac death5. Increased LVM is also a hallmark of certain primary cardiomyopathies such as hypertrophic cardiomyopathy (HCM) and some dilated cardiomyopathies (DCM). Although LVM can be estimated using 12-lead electrocardiograms or echocardiography, cardiac magnetic resonance (CMR) offers more accurate and reproducible quantification, and has therefore emerged as the gold standard for diagnosing LVH6.
Imaging-based estimation of LVM typically requires LV segmentation, which is usually performed manually and requires substantial time and expertise. As a result, genetic analyses of imaging-based LVM have been limited by modest sample sizes. Genome-wide association studies (GWAS) of echocardiography-based LVM identified a single susceptibility locus downstream of SPCS37–9. More recently, a genome-wide association study within 19,000 individuals10 identified significant variants in the gene TTN associated with CMR-based LVM.
Here, we apply a validated deep learning approach to automate estimation of LVM using CMR images (Machine Learning for Health – Segmentation [ML4Hseg]), to maximize power to detect genetic associations underlying CMR-derived LVM11. Specifically, we implement ML4Hseg to estimate LVM using CMRs from nearly 50,000 participants in the UK Biobank. Given body size is a major determinant of LV size and mass12, we analyze LVMI (i.e., LVM indexed by body surface area) in our primary analyses, and assess unindexed LVM in secondary analyses. Our GWAS of LVMI identifies 12 independent variants meeting genome-wide significance, including 11 novel associations. Using expression quantitative trait loci (eQTLs), transcriptome-wide association testing (TWAS), and tissue-specific expression levels, we propose several candidate genes, many of which have been previously associated with cardiac contractility and cardiomyopathy. We additionally develop a polygenic risk score (PRS) for LVMI, and demonstrate that both phenotypic and genetic LVMI are associated with incident cardiovascular diseases including cardiomyopathy.
Results
Genome-wide association study of CMR-derived LVM
We conducted a multi-ancestry GWAS including 43,230 individuals (91% European ancestry) (Fig. 1, Supplementary Table 1). The analysis included 9.9 million common variants imputed at an INFO score ≥0.30 and having minor allele frequency (MAF) ≥1%. The genomic control factor was 1.15 with a linkage disequilibrium score regression intercept of 1.00, consistent with polygenicity of the LVMI trait as opposed to inflation (Supplementary Fig. 1). Observed scale h2 for LVMI was 0.26 (standard error [SE] 0.02).
The GWAS initially revealed 12 candidate SNPs associated with CMR-derived LVMI at genome-wide significance (Table 1 and Fig. 2). Conditional analyses identified an additional variant on chromosome 2 and showed that the two variants on chromosome 17 located 914 kb apart (r2 = 0.37) were not independent, ultimately resulting in 12 lead SNPs for LVMI. The SNP most strongly associated with LVMI (rs2255167, p = 3.2 × 10−26) was located at the TTN locus on chromosome 2 and has been previously associated with LVM. TTN is highly expressed in LV tissue (Supplementary Table 2)10. The remaining loci (n = 11) were novel, with many located at or proximate to genes implicated in arrhythmias, cardiomyopathy and cardiomyocyte function, including FLNC, MYOZ1, MAPT, WNT3, CLCN6, MYBPC3 and SYNPO2L. Regional association plots for each genome-wide significant SNP are shown in Supplementary Fig. 2. Results for 18 additional variants having suggestive but not genome-wide significant associations are shown in Supplementary Table 3. A secondary GWAS of unindexed LVM revealed 12 genome-wide significant SNPs, of which 6 overlapped with the primary LVMI GWAS, and a 7th was a strong proxy (r2 = 0.87). Loci unique to analyses of unindexed LVM appeared primarily enriched for genes associated with body size (e.g., FTO, HMGA2, GDF5), although FTO has also been implicated in HF13 and CDKN1A has been associated with DCM in a recent multi-trait analysis14 (Supplementary Table 4 and Supplementary Fig. 3).
Table 1.
rsID | Chr | Position (hg38) | Closest gene(s) | Function | Risk/alt allele | RAF | Beta | SE | P value* |
---|---|---|---|---|---|---|---|---|---|
rs143800963 | 1 | 11835418 | CLCN6 | Intronic | C/A | 0.95 | 0.95 | 0.16 | 4.2 × 10−9 |
rs2255167† | 2 | 178693555 | TTN | Intronic | T/A | 0.81 | 0.97 | 0.09 | 3.2 × 10−26 |
rs10497529‡ | 2 | 178975161 | CCDC141 | Missense | G/A | 0.96 | 1.28 | 0.20 | 2.2 × 10−9 |
- | 5 | 133066736 | HSPA4 | Indel | CTT/C | 0.72 | 0.50 | 0.08 | 1.6 × 10−9 |
rs9388498 | 6 | 126552277 | CENPW | - | G/T | 0.81 | −0.55 | 0.10 | 4.1 × 10−9 |
rs34163229 | 10 | 73647154 | SYNPO2L | Missense | G/T | 0.86 | −0.60 | 0.10 | 1.0 × 10−8 |
rs3729989 | 11 | 47348490 | MYBPC3 | Missense | T/C | 0.87 | −0.61 | 0.11 | 1.8 × 10−8 |
rs28552516 | 12 | 121592356 | KDM2B | Intronic | C/T | 0.85 | −0.58 | 0.10 | 1.5 × 10−8 |
rs6598541 | 15 | 98727906 | IGF1R | Intronic | A/G | 0.36 | −0.42 | 0.08 | 4.6 × 10−8 |
rs56252725 | 16 | 14995819 | PDXDC1 | Intronic | G/A | 0.75 | 0.54 | 0.09 | 3.7 × 10−9 |
rs6503451 | 17 | 45870981 | MAPT | Intronic | T/C | 0.67 | −0.52 | 0.08 | 1.1 × 10−10 |
rs199501§ | 17 | 46785247 | WNT3 | Intronic | A/G | 0.24 | 0.55 | 0.09 | 1.1 × 10−9 |
rs62621197 | 19 | 8605262 | ADAMTS10 | Missense | C/T | 0.96 | 1.11 | 0.20 | 2.9 × 10−8 |
Chr chromosome, RAF risk allele frequency, OR odds ratio.
*Denotes two-sided p value corresponding to BOLT-LMM χ2 statistic.
†Locus previously reported for LVM10.
‡Variant identified in conditional analysis conditioned on lead SNPs (beta, standard error, and p value are adjusted).
§Association no longer observed in analysis conditioned on rs6503451.
In GWAS restricted to individuals of European ancestry, 14 loci met genome-wide significance, of which 12 were either a lead variant or a strong proxy (r2 > 0.8) for a lead variant in the primary GWAS (Supplementary Table 5 and Supplementary Figs. 4 and 5). The two loci unique to the European ancestry analysis were rs143973349, an insertion-deletion variant located near FLNC, a gene highly expressed in LV tissue and previously associated with familial hypertrophic, restrictive, and arrhythmogenic cardiomyopathies, and rs142032045, located in a gene-rich region closest to DOC2A and near several variants previously associated with body size15–18. The variant near FLNC had a suggestive association with LVMI in the primary multi-ancestry GWAS, while the variant near DOC2A did not (p = 3.2 × 10−7 and p = 1.1 × 10−5, respectively). The only variant meeting genome-wide significance in the primary mixed-ancestry GWAS that was not a lead variant in the European-only GWAS did have a suggestive association (rs6598541 near IGF1R p = 7.7 × 10−8).
Results of secondary GWAS analyses, including rank-based inverse normal transformed LVMI, LVMI indexed using the 2.7th power of height, LVMI indexed using lean body mass, LVMI with exclusions for prevalent myocardial infarction and heart failure, and unindexed LVM adjusted for height and weight, are shown in Supplementary Tables 6-10. Results obtained using alternative indexing methods were broadly consistent with the primary analysis in terms of variants identified and effect directions. A summary of association results for the lead variants identified in the primary GWAS tested across varying indexing methods is shown in Supplementary Table 11.
Bioinformatics and in silico functional analyses to determine candidate genes
In total, of the 12 independent lead SNPs, eight (or their proxies at r2 ≥ 0.8) were significant eQTLs in LV and/or AA tissue samples (Fig. 3). The locus including variant rs143973349 unique to the European ancestry analysis also included eQTLs for LV and AA tissue. For a significant proportion of candidate genes, expression was identified in both LV and AA tissue samples. We then performed TWAS and identified 6 genes across 5 loci where predicted expression was associated with LVMI. Each of the genes implicated by TWAS was also an eQTL for either LV or AA (Fig. 3). Using Hi-C analysis, we observed several potentially relevant chromatin interactions, including between lead variant rs56252725 on chromosome 16 and gene MYH11, which encodes an isoform of the myosin heavy chain which is highly expressed in LV tissue and has been associated with electrocardiogram amplitude, and between lead variant rs143973349 (European-only analysis) and gene CCDC136, which encodes a membrane protein and in which variants have been previously associated with dilated and hypertrophic cardiomyopathies. Detailed results of eQTL, TWAS, and Hi-C analyses are shown in Supplementary Table 2.
Probable candidate genes at each locus of interest are summarized in Fig. 3. In several cases, the closest gene was additionally supported by either eQTL or TWAS prioritization, including SYNPO2L near rs34163229, IGF1R near rs6598541, PDXDC1 near rs56252725, MAPT near rs6503451, and WNT3 near rs199501. In selected instances, downstream analyses prioritized alternative genes, including NPPA near rs143800963 and ORAI1 near rs28552516, with both genes having substantial expression in LV tissue. Selected genes prioritized based on strong biologic plausibility or previous associations with LVM included TTN near rs2255167, MYBPC3 near rs3729989, and FLNC near rs143973349 (EUR only subset). TTN, MYBPC3, and FLNC are also substantially expressed in LV tissue (Supplementary Table 2).
Comparison to prior associations with LV measurements and cardiovascular traits
We assessed whether the significant loci we identified have been previously associated with LV measurements10, 19 and cardiovascular traits. Including the European-only analysis, a total of 4 loci have been previously associated with LV measurements. Variant rs2255167 is located on a region of TTN previously associated with LV mass, LV end diastolic volume, LV end-systolic volume, and LV ejection fraction. Variants rs6503451 near MAPT and rs199501 near WNT3 are located at regions previously associated with LV end-systolic volume. In the European-only analysis, variant rs143973349 near FLNC is at a locus previously associated with LV end-systolic volume and LV ejection fraction. Several additional loci have been implicated in other cardiovascular diseases such as heart failure (e.g., rs34163229 near SYNPO2L), cardiomyopathy (e.g., rs2255167 near TTN, rs3729989 near MYBPC3, rs143973349 near FLNC), and atrial fibrillation (e.g., rs6598541 near IGF1R), while others have been associated with cardiovascular risk factors such as blood pressure or diabetes. Several variants are located at regions previously associated with electrocardiographic traits such as PR interval (e.g., rs56252725 near PDXDC1), QRS duration (rs6598541 near IGF1R), and QRS amplitude (rs6503451 near MAPT). Variants rs28552516 near KDM2B, rs62621197 near ADAMTS10, and rs142032045 near DOC2A in the European-only analysis have not been previously associated with either LV or other cardiovascular traits. A summary of lead variants and their prior associations is shown in Supplementary Table 12.
Associations between LVMI and cardiomyopathy
We assessed for associations between CMR-derived LVMI and incident cardiovascular disease. At a median follow-up of 2.7 years (Q1:1.9, Q3:4.1), greater LVMI was consistently associated with greater risk of multiple conditions, including AF, MI, HF, DCM, HCM, and ICD implant (Supplementary Table 13). CMR-derived LVH was strongly associated with incident DCM (HR 10.9, 95% CI 4.67–20.2), HCM (HR 9.26, 95% CI 3.20–26.8), and ICD implant (HR 8.42, 95% CI 3.82–18.6). Cumulative risk of events stratified by presence versus absence of CMR-derived LVH is depicted in Fig. 4.
We next evaluated associations between LVMI genetic risk and incident outcomes. In a set of UK Biobank participants separate from the GWAS sample (n = 443,326), a greater LVMI PRS was associated with higher risk of multiple incident conditions including AF, HF, ventricular arrhythmias, DCM, and ICD implant (Table 2). In the independent MGB sample (n = 29,354), the LVMI PRS was again associated with incident ICD implant, along with suggestive associations with HCM and DCM (Table 2). In models of incident ICD risk, the relative hazard of ICD was consistently greatest at the highest levels of CMR-derived LVMI as well as LVMI PRS, with similar effect sizes in both the UK Biobank and MGB (Fig. 5). Disease association results were generally similar in analyses restricted to individuals of European ancestry (Supplementary Table 14), and when utilizing a PRS derived from GWAS performed after exclusion of individuals with prevalent myocardial infarction and heart failure (Supplementary Table 15).
Table 2.
Outcome | N events/N total† | Follow-up, yrs (Q1,Q3) | Hazard ratio (95% CI)*: PRS (per 1 SD) | Hazard ratio (95% CI)*: PRS (90th percentile) | Hazard ratio (95% CI)*: PRS (95th percentile) |
---|---|---|---|---|---|
UK Biobank | |||||
Atrial fibrillation | 25050/435917 | 11.8 (11.0,12.6) | 1.01 (1.00–1.03) | 1.03 (0.98–1.07) | 1.04 (0.98–1.10) |
Myocardial infarction | 13405/432044 | 11.8 (11.0,12.6) | 1.03 (1.01–1.05) | 1.05 (0.99–1.11) | 1.10 (1.02–1.18) |
Heart failure | 13540/440590 | 11.9 (11.0,12.6) | 1.04 (1.02–1.05) | 1.06 (1.00–1.12) | 1.08 (1.00–1.16) |
Ventricular arrhythmias | 4882/442295 | 11.9 (11.1,12.6) | 1.06 (1.03–1.09) | 1.13 (1.04–1.24) | 1.17 (1.04–1.32) |
Dilated cardiomyopathy‡ | 1023/443013 | 11.9 (11.1,12.6) | 1.10 (1.04–1.17) | 1.15 (0.95–1.40) | 1.29 (1.00–1.66) |
Hypertrophic cardiomyopathy‡ | 420/443150 | 11.9 (11.1,12.6) | 1.08 (0.98–1.19) | 0.95 (0.68–1.33) | 1.23 (0.82–1.86) |
Implantable defibrillator | 1444/443216 | 11.9 (11.1,12.6) | 1.07 (1.02–1.13) | 1.22 (1.05–1.44) | 1.22 (0.98–1.51) |
Mass General Brigham | |||||
Atrial fibrillation | 1332/25316 | 2.9 (2.0,4.1) | 1.01 (0.95–1.06) | 1.02 (0.85–1.22) | 1.03 (0.80–1.31) |
Myocardial infarction | 695/25592 | 2.9 (2.0,4.1) | 0.99 (0.92–1.06) | 0.97 (0.74–1.25) | 0.71 (0.47–1.07) |
Heart failure | 1074/25063 | 2.9 (2.0,4.1) | 0.97 (0.91–1.03) | 1.18 (0.97–1.42) | 1.00 (0.76–1.33) |
Ventricular arrhythmias | 944/26990 | 3.0 (2.0,4.2) | 0.99 (0.93–1.05) | 1.00 (0.81–1.24) | 1.03 (0.76–1.38) |
Dilated cardiomyopathy | 492/28821 | 3.0 (2.1,4.2) | 1.06 (0.97–1.16) | 1.27 (0.97–1.67) | 1.06 (0.70–1.59) |
Hypertrophic cardiomyopathy | 183/28731 | 3.0 (2.1,4.2) | 1.14 (0.98–1.32) | 1.04 (0.64–1.69) | 0.82 (0.38–1.75) |
Implantable defibrillator | 152/28454 | 3.0 (2.1,4.2) | 1.05 (0.89–1.24) | 1.75 (1.12–2.74) | 1.69 (0.91–3.12) |
CI confidence interval, PRS polygenic risk score, Q1 quartile 1, Q3 quartile 3, SD standard deviation.
*Hazard ratios obtained using Cox proportional hazards models adjusted for age, sex, and principal components 1–5.
†N includes all individuals without the prevalent condition at baseline.
‡Includes n = 20 events with high confidence loss-of-function, deleterious missense, known pathogenic or likely pathogenic variant for HCM, and n = 50 events with high confidence loss-of-function, deleterious missense, known pathogenic or likely pathogenic rare variant for DCM (see text and Supplementary Table 18).
Mendelian-randomization analyses of blood pressure and diabetes
To assess for potential causal associations between blood pressure and CMR-derived LVMI, we performed Mendelian-randomization (MR) analyses using genetic instruments for systolic blood pressure (SBP) and diastolic blood pressure (DBP) among individuals of European ancestry. We performed analogous analyses for diabetes. In an inverse-variance weighted two-sample MR, a 1-SD increase in genetically mediated SBP was associated with a 0.27 g/m2 increase in CMR-derived LVMI (95% CI 0.23–0.31, p = 1.75 × 10−41), and a 1-SD increase in genetically mediated DBP was associated with a 0.32 g/m2 increase in CMR-derived LVMI (95% CI 0.25–0.39, p = 1.64 × 10−20). A 1-SD increase in genetically mediated risk of diabetes was associated with a 0.31 g/m2 increase in CMR-derived LVMI (95% CI 0.05–0.56, p = 0.018). Weighted median and MR-Egger analyses demonstrated similar results for SBP and DBP, but associations with diabetes were no longer significant (weighted median: 0.19 g/m2, 95% CI −0.15 to 0.53, p = 0.26; MR-Egger: 0.15 g/m2, 95% CI −0.36 to 0.66, p = 0.56). MR-Egger analyses suggested no substantive directional pleiotropy in the SBP, DBP, and diabetes instruments (intercept 0.01, p = 0.38 for SBP; intercept −0.02, p = 0.04 for DBP; intercept 0.01, p = 0.50 for diabetes). MR results were similar using unindexed LVM (Supplementary Table 16). MR plots are shown in Supplementary Fig. 6.
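As an illustration of the inverse-variance weighted estimator named above, the following Python sketch combines per-variant effects on an exposure and on the outcome into a single estimate. It is a minimal fixed-effect version under stated assumptions, not the pipeline used in this study: the function name `ivw_estimate` and all input values are hypothetical, and the variant-exposure effects are treated as measured without error.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) estimate.

    Equivalent to a weighted regression of variant-outcome effects on
    variant-exposure effects through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error

# Hypothetical per-variant summary statistics (NOT taken from the study).
bx = [0.10, 0.20]   # effect of each variant on the exposure (e.g., SD of SBP)
by = [0.03, 0.06]   # effect of each variant on the outcome (e.g., g/m2 of LVMI)
se = [0.01, 0.01]   # standard error of each variant-outcome effect

estimate, standard_error = ivw_estimate(bx, by, se)
ci_low = estimate - 1.96 * standard_error   # approximate 95% confidence limits
ci_high = estimate + 1.96 * standard_error
```

Weighted median and MR-Egger estimators relax the assumption that every instrument is valid, which is why they serve as sensitivity analyses alongside the IVW estimate.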
Discussion
In the current study, we utilized a deep learning segmentation algorithm to perform GWAS of CMR-derived LVMI in nearly 50,000 individuals. Leveraging favorable statistical power and a rich imaging-based phenotype, we identified 12 independent loci associated with LVMI at genome-wide significance. Of the loci identified, 11 are novel for LV mass, 9 have not been previously associated with any LV measurement, and 2 have not been associated with any cardiovascular trait or risk factor. A European-only analysis revealed 2 additional loci which are novel for LV mass. Downstream analyses prioritize several candidate genes, including multiple genes previously associated with cardiac structure and function, as well as cardiomyopathy. Importantly, CMR-derived and genetically determined LVMI were each associated with greater risk of incident cardiovascular events, including incident DCM and ICD implant.
Our analyses suggest that common variants in cardiac structural and functional genes appear to be important determinants of LVM. CMR-derived LVMI was strongly associated with variation at rs2255167, located within the gene encoding the large sarcomeric protein titin and previously associated with LV mass10, as well as LV volumes and ejection fraction19. MYOZ1, which encodes a sarcomeric protein involved in calcineurin signaling and was prioritized by both eQTL and TWAS analysis, has been previously associated with HF13 and AF20. A mouse knockout of MYOZ1 resulted in increased exercise capacity through activation of the nuclear factor of activated T-cells21. Another gene prioritized by both eQTL and TWAS, TNNT3, encodes a troponin T isoform which is highly expressed in LV tissue. The TNNT3 R63H variant has been shown to result in increased contractility in mouse skeletal muscle and is a cause of the human disease Arthrogryposis (Type 2B2)22, characterized by limb contractures (i.e., excessive muscular contraction). SYNPO2L, an actin-related protein expressed in LV myocardium, has been previously associated with AF23, HF24, HCM14, and voltage-duration product (a clinical indicator of LVH)25.
Several of the candidate genes we identified implicate neurohormonal regulation and response to physiologic stress as potential genetic determinants of LVMI. Specifically, lead variant rs143800963 is located on chromosome 1 within 20 kb of NPPA and NPPB, genes that encode the natriuretic peptides Nppa and Nppb, respectively, with both proteins playing important roles in blood pressure regulation and salt homeostasis26. Both Nppa and Nppb are constitutively expressed in ventricular myocardium and upregulated in response to stress27. NPPB knockout in mice results in augmentation of the cardiac fibrosis response to pressure overload28. Conversely, cardiomyocyte-specific deletion of ORAI1, which encodes a regulator of calcium-induced calcium release, results in improved response to pressure overload and protection against angiotensin II-induced cardiac remodeling in adult myocardium29. IGF1R, an eQTL for LV tissue in which predicted expression in LV was associated with LVMI, encodes the insulin-like growth factor 1 receptor, which has been implicated in organ growth and insulin resistance30.
Several LVMI candidate genes have previous links to cardiomyopathy and HF. The strongest association we observed was at rs2255167, a variant located in TTN, in which mutations have been previously associated with familial cardiomyopathy31 and early-onset AF32. One of the loci detected in the European ancestry analysis (and suggestive in the primary analysis), FLNC, encodes filamin C, an actin-related protein associated with familial HCM16, restrictive cardiomyopathy17, arrhythmogenic cardiomyopathy15, and LV contractile function19. A mouse knock-in of filamin C results in myofibrillar degeneration33. PPP3CB, which encodes the signaling protein calcineurin, has been implicated in pathologic cardiac hypertrophy34. Lead variant rs3729989 is located near MYBPC3, a gene encoding cardiac myosin-binding protein C. Mutations in MYBPC3 are a known cause of DCM and HCM35, 36. FTO, an obesity gene previously associated with HF13, was associated with unindexed LV mass, but not LVMI. Interestingly, we identified several loci which are novel for LVM but have prior associations with electrocardiographic traits37, 38. Future work is warranted to assess whether such associations may reflect electrical manifestations of LV mass or the presence of a cardiomyopathy.
Importantly, we observed that both phenotypic and genetically predicted LVMI were associated with increased risks of incident cardiovascular events. Increased LVMI and LVH are consistently associated with HF2. Here, we observed associations not only with HF, but also incident DCM, HCM, and insertion of an ICD (a surrogate for cardiomyopathy or ventricular arrhythmias). Consistent with the notion that LVMI may be an endophenotype for certain cardiomyopathies, we observed that genetically predicted LVMI (using a 465-variant PRS) was associated with greater risk of incident ICD implant in a separate set of UK Biobank participants as well as an external sample from the MGB healthcare system. Of note, we did not exclude individuals with DCM or HCM from our incident disease analyses since we hypothesized that polygenic risk may nevertheless contribute to the development of clinical outcomes39. In the context of low event rates, however, the LVMI PRS was associated with incident DCM only in the UK Biobank, and associations with incident HCM were not significant in either sample. Consistent with expectations40, 41, using Mendelian-randomization analyses, we observed associations between genetically predicted blood pressure and diabetes risk with greater LVM. Overall, our findings provide evidence that the genetic variation underlying increased LVM may be clinically relevant, and highlight the need for future research to evaluate the potential utility of a polygenic predictor of LVM to improve identification of individuals at risk of incident cardiomyopathy.
Our study has limitations. First, our analysis was a mixed-ancestry GWAS, but the sample is predominantly of European descent. Therefore, our results may not generalize to individuals of other ancestries. Second, we used a previously published deep learning model (ML4Hseg) to facilitate well-powered GWAS of CMR-derived LVM. ML4Hseg was trained using an imperfect segmentation method as ground truth11, 42, which may have led to lower agreement with true LVM as compared to some alternative approaches (e.g., 95% limits of agreement −27 to 27 g with ML4Hseg versus −18 to 18 g by Bai et al. using a proprietary deep learning model43 and −5 to 8 g by Peterson et al. in a small set of hand-labeled measurements44). Nevertheless, estimates from ML4Hseg correlate strongly (r = 0.86) with hand-labeled CMR-derived LVM in the UK Biobank11, and MR analyses recapitulated a known causal relationship between elevated blood pressure and increased indexed LVM40. Third, our ability to assess for associations between CMR-derived LVMI and incident outcomes was limited by event rates and follow-up currently available after imaging. Fourth, generalizability may be affected by bias introduced by methods of enrollment, as UK Biobank participants are enriched for health and socioeconomic status compared to the general population45. Fifth, we analyzed LVM indexed to body surface area since this measure is in common clinical use, even though alternative methods of body mass correction exist. We therefore performed multiple analyses using alternative indexing methods (e.g., 2.7th power of height).
In summary, we performed GWAS of deep-learned CMR-derived LVM including nearly 50,000 individuals. We discovered 12 independent loci meeting genome-wide significance, including 11 that are novel. Using complementary downstream analyses, we identified multiple candidate genes, many of which are involved in cardiac structure and function, and several that have been previously implicated in cardiomyopathy. Both CMR-derived and genetically determined LVM were associated with incident ICD implant in independent datasets. Our findings add to our understanding of common genetic variation underlying LVM and demonstrate the potential to use deep learning to define rich phenotypes at scale to empower clinically relevant biological discovery.
Methods
Study populations
The discovery sample comprised the UK Biobank, a population-based prospective cohort of 502,629 participants recruited between 2006–2010 in the United Kingdom to investigate the genetic and lifestyle determinants of disease. The design of the cohort has been described previously46, 47. Briefly, approximately 9.2 million individuals aged 40-69 years living within 25 miles of the 22 assessment centers in England, Wales, and Scotland were invited, and 5.4% participated in the baseline assessment. Extensive questionnaire data, physical measures, and biological data were collected at recruitment, with ongoing data collection in large subsets of the cohort, including repeated assessments and multimodal imaging. At the time of the current analysis, over 450,000 individuals have genome-wide genotyping data available. All participants are followed up for health outcomes through linkage to national health-related datasets.
We utilized the MGB Biobank to replicate an LVMI PRS that we derived in the UK Biobank. The MGB Biobank is a biorepository comprising patients from a multi-institutional healthcare network spanning seven hospitals in the New England region of the United States. MGB Biobank participants are followed for health outcomes through linkage to electronic health record (EHR) data.
UK Biobank and MGB Biobank participants provided written informed consent. The UK Biobank was approved by the UK Biobank Research Ethics Committee (reference number 11/NW/0382) and the MGB Biobank by the MGB Institutional Review Board. Use of UK Biobank (application #17488) and MGB Biobank data was approved by the local MGB Institutional Review Board.
Cardiac magnetic resonance acquisition
For all analyses, we included individuals who underwent CMR during a UK Biobank imaging assessment and whose bulk CMR data were available for download as of 04-01-2020 (Fig. 1). The full CMR protocol of the UK Biobank has been described in detail previously48. Briefly, all CMR examinations were performed in the United Kingdom on a clinical wide-bore 1.5 Tesla scanner (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthineers, Erlangen, Germany). All acquisitions used balanced steady-state free precession with typical parameters.
Left ventricular mass estimation
We obtained CMR-derived LVM from all individuals with available CMR imaging using ML4Hseg11. ML4Hseg is a convolutional neural network which identifies pixels corresponding to LV myocardium, which are then summed to estimate LV area and multiplied by slice thickness to estimate LV myocardial volume. LV myocardial volume is then multiplied by myocardial density (1.05 g/cm3) to yield LVM. LVM estimates were calibrated to the sex-specific sample means using manually labeled LVM measurements which were available within a subset of the UK Biobank sample (n = 4910), where sex was classified using self-reported data. LVM estimates obtained using the described method have been shown to have very good correlation (Pearson r 0.86) and agreement (mean absolute error 10 g) against manually labeled LVM in the UK Biobank11. LVM estimates were indexed for body surface area using the DuBois formula to yield LVMI49. A total of 59 (0.1%) individuals with outlying estimated LVM values (defined as falling outside 5 interquartile ranges from the median, or any value ≤0 g/m2 following calibration) were removed prior to analyses (Fig. 1). The distribution of CMR-derived LVM is shown in Supplementary Fig. 7.
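The arithmetic described above (pixel counts to area, area to volume, volume to mass, then indexing and outlier screening) can be sketched in Python as follows. This is an illustrative sketch and not the ML4Hseg implementation: the function names, the assumption of contiguous slices with a uniform pixel area, and the example inputs are all hypothetical, and the sex-specific calibration step is omitted. The DuBois coefficients are the standard published ones.

```python
MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # g/cm3, as stated in the text

def lv_mass_g(myocardium_pixels_per_slice, pixel_area_mm2, slice_thickness_mm):
    """Sum segmented myocardial area over slices, convert to volume, then to mass.

    Assumes contiguous slices of equal thickness (an assumption of this sketch).
    """
    area_mm2 = sum(myocardium_pixels_per_slice) * pixel_area_mm2
    volume_ml = area_mm2 * slice_thickness_mm / 1000.0  # 1 mL = 1000 mm3
    return volume_ml * MYOCARDIAL_DENSITY_G_PER_ML

def bsa_dubois_m2(height_cm, weight_kg):
    """Body surface area (m2) by the DuBois formula."""
    return 0.007184 * (height_cm ** 0.725) * (weight_kg ** 0.425)

def lvmi_g_per_m2(lvm_g, height_cm, weight_kg):
    """LVM indexed to body surface area."""
    return lvm_g / bsa_dubois_m2(height_cm, weight_kg)

def is_outlier(value, sample_median, sample_iqr, n_iqr=5.0):
    """Flag non-positive values or values more than n_iqr IQRs from the median."""
    return value <= 0 or abs(value - sample_median) > n_iqr * sample_iqr

# Hypothetical example: two slices of 1000 myocardial pixels each,
# 1 mm2 per pixel, 10 mm slice thickness.
example_lvm = lv_mass_g([1000, 1000], pixel_area_mm2=1.0, slice_thickness_mm=10.0)
example_lvmi = lvmi_g_per_m2(example_lvm, height_cm=170.0, weight_kg=70.0)
```

In practice the median and interquartile range passed to `is_outlier` would be computed on the full sample after calibration.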
Genome-wide association study
To identify common genetic variation associated with CMR-derived LVM, we performed a GWAS of indexed LVM using BOLT-LMM v2.3.450, which accounts for ancestral heterogeneity, cryptic population structure, and sample relatedness by fitting a linear mixed model with a Bayesian mixture prior as a random effect19, 51, 52. Previous evidence supports the use of LMM approaches to perform GWAS of admixed populations, which may provide favorable statistical power51, 53, 54, and similar approaches have been taken previously19, 51, 52. The GWAS was performed among 43,230 individuals having undergone CMR imaging, after exclusion of individuals without genetic data meeting standard quality control metrics (e.g., no evidence of sex chromosome aneuploidy, outliers in heterozygosity and missing rates). Imputed variants were retained if the imputation information metric was ≥0.3. All variants with minor allele frequency <1% were excluded from the final analyses. Our model was adjusted for age at CMR acquisition, sex, array platform, and first five principal components of genetic ancestry, where sex was classified on the basis of genetic sex. Associations were considered statistically significant at the standard genome-wide significance level (p < 5 × 10−8). Lead single nucleotide polymorphisms (SNPs) were grouped into independent loci based on distance (±500 kb), with conditional analyses performed to assess for independent signals within windows. Variants having suggestive (i.e., p < 1 × 10−6) but not genome-wide significant associations were similarly tabulated. Genetic inflation was assessed by calculating the genomic control factor λ, inspecting quantile-quantile plots, and calculating the linkage disequilibrium score (LDSC) regression intercept using LDSC v1.0.155. Observed scale heritability (h2) was estimated using the slope of LDSC regression.
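To make the thresholds and summary quantities in this paragraph concrete, the Python sketch below computes a genomic control factor from p values, labels an association as genome-wide significant or suggestive, and groups hits into loci with a ±500 kb window. It is a simplified illustration rather than the analysis code used here: the function names are hypothetical, the greedy grouping rule (keep the most significant hit, absorb others within the window on the same chromosome) is an assumption, and the standard library's `NormalDist` stands in for a dedicated chi-square routine.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df
GENOME_WIDE_P = 5e-8         # genome-wide significance threshold from the text
SUGGESTIVE_P = 1e-6          # suggestive threshold from the text

def chi2_from_p(p):
    """Convert a two-sided p value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; sign is lost on squaring
    return z * z

def genomic_control_lambda(p_values):
    """Median observed chi-square divided by its expected median under the null."""
    return median(chi2_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN

def classify(p):
    """Label an association using the two thresholds above."""
    if p < GENOME_WIDE_P:
        return "genome-wide significant"
    if p < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"

def group_into_loci(hits, window_bp=500_000):
    """Greedy distance-based grouping of (chromosome, position, p) tuples.

    Hits are visited from most to least significant; a hit becomes a new lead
    only if no existing lead on the same chromosome lies within window_bp.
    """
    leads = []
    for chrom, pos, p in sorted(hits, key=lambda h: h[2]):
        if all(c != chrom or abs(pos - lead_pos) > window_bp
               for c, lead_pos, _ in leads):
            leads.append((chrom, pos, p))
    return leads
```

Under a purely distance-based rule, two signals 914 kb apart on the same chromosome are kept as separate loci, while a second signal within 500 kb of a stronger one is absorbed; this is why conditional analyses are needed to judge independence in both situations.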
We assessed for independent signals within genome-wide significant loci by a) performing GWAS while conditioning on the imputed allele dosage of each lead SNP found in the primary GWAS (excluding insertion-deletion variants), and b) performing GWAS while conditioning on the top variant on chromosome 17 alone (rs6503451), to assess whether the additional variant located 914 kb away on chromosome 17 (rs199502, r2 = 0.37) was independent. The primary GWAS was performed among individuals of all genetic ancestries.
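The distance-based grouping of lead SNPs into loci (±500 kb) can be illustrated with a simple greedy procedure: variants are visited in order of increasing p-value, and a variant becomes a new lead only if no previously selected lead lies within the window on the same chromosome. The greedy ordering, tuple layout, and function name are assumptions made for illustration; the text specifies only the window size.

```python
WINDOW_BP = 500_000  # +/-500 kb window stated in the text


def group_into_loci(variants, window_bp=WINDOW_BP):
    """Return one lead variant per distance-defined locus.

    variants: iterable of (chrom, pos, p_value, variant_id) tuples
    (a hypothetical layout chosen for this sketch).
    """
    leads = []
    for chrom, pos, p, vid in sorted(variants, key=lambda v: v[2]):
        near_existing_lead = any(
            chrom == lead_chrom and abs(pos - lead_pos) <= window_bp
            for lead_chrom, lead_pos, _, _ in leads
        )
        if not near_existing_lead:
            leads.append((chrom, pos, p, vid))
    return leads
```

Under this rule, two variants 914 kb apart on the same chromosome are reported as separate loci, which is why their independence is then checked by conditional analysis.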
We performed several secondary GWAS analyses. First, we performed analogous GWAS restricted to individuals of European genetic ancestry (n = 39,187). Second, we performed GWAS of unindexed LV mass (with and without adjustment for height and weight), as well as LV mass alternatively indexed using the 2.7th power of height56. Third, we performed a GWAS of LVMI after rank-based inverse normal transformation. Fourth, we performed GWAS of LVMI excluding individuals with prevalent myocardial infarction and heart failure.
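As an illustration of the rank-based inverse normal transformation used in the third secondary analysis, the sketch below maps each value to a standard normal quantile based on its rank. The Blom offset (c = 3/8) and average ranking of ties are assumptions, since the text does not state which variant of the transformation was applied.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles via (rank - c) / (n - 2c + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

The transformed phenotype is approximately standard normal regardless of the original distribution, which limits the influence of extreme LVMI values on association statistics.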
Bioinformatics and in silico functional analyses
We assessed whether genes within 500 kb of lead SNPs were related to cardiac gene expression using GTEx57 version 8 cis-eQTL tissue data (dbGaP Study Accession phs000424.v8.p2). To maximize power to detect potential candidate genes, we considered eQTLs for both atrial appendage (AA) and LV tissue data19, 58. We included lead variants as well as strong proxy variants (r2 ≥ 0.8). We also quantified tissue-specific expression levels from bulk RNA sequencing data from GTEx57 version 8 (dbGaP Study Accession phs000424.v8.p2). We evaluated the effects of predicted gene expression levels on LVMI by performing a transcriptome-wide association study (TWAS) using S-PrediXcan59. GTEx genotypes and normalized expression data in AA and LV tissues provided in the software were used as training sets to develop the prediction models. Prediction models between each gene-tissue pair were developed using elastic net regression. In total, we tested 6636 and 6008 associations in AA and LV, respectively. The significance threshold for S-PrediXcan was therefore set at p = 0.05/(6636 + 6008), or 3.95 × 10−6. We assessed for potential long-range chromatin interactions using Hi-C analysis in adult heart tissues obtained from the Myocardial Applied Genomics Network (MAGNet, www.med.upenn.edu/magnet) at the University of Pennsylvania60.
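The S-PrediXcan significance threshold above is a Bonferroni correction across all gene-tissue tests. The arithmetic can be checked with a short sketch (the function name is hypothetical):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 6636 atrial appendage tests plus 6008 left ventricle tests, as stated in the text.
threshold = bonferroni_threshold(0.05, 6636 + 6008)
print(f"{threshold:.2e}")
```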
We prioritized candidate genes on the basis of closest proximity to the lead variant, eQTLs, TWAS, tissue-specific expression levels, Hi-C analysis, and biologic plausibility based on previously reported data. All prioritized genes were supported by at least two lines of evidence.
Comparison to prior associations with LV measurements and cardiovascular traits
To assess whether the variants we identified in association with LVMI have been previously associated with other LV measurements, we compared our loci to those reported to have genome-wide associations with other LV measurements in prior analyses by Pirruccello et al.19 and Aung et al.10. We performed an analogous search for associations with any cardiovascular disease or risk factor using the National Human Genome Research Institute GWAS Catalog61. For these analyses, we tabulated all associations including the same variant, a variant serving as a strong proxy (r2 ≥ 0.80), or a variant mapping to the same candidate gene.
Polygenic risk score development
To develop a PRS as a genetic instrument for CMR-derived LVMI, we applied a pruning and thresholding approach to our LVMI GWAS results. After removing insertion-deletion variants and strand ambiguous (i.e., A/T and C/G) variants to facilitate replication, we developed and tested four separate candidate PRS utilizing each combination of two thresholds used to define index SNPs (p = 1 × 10−6 and p = 1 × 10−4) and two thresholds used to prune proxy SNPs (r2 = 0.3 and r2 = 0.5). We then selected the PRS explaining the greatest variance in LVMI within the derivation set, which ultimately comprised a set of 465 variants (r2 = 0.3, p = 1 × 10−4, variance of LVMI explained = 0.084; +3.56 g/m2 increase in LVMI per 1-standard deviation [1-SD] increase in PRS, p < 0.01).
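A pruning and thresholding score of this kind can be sketched as below. The four candidate settings come from the text; the greedy pruning order, the `ld_r2` callable (a stand-in for a lookup against a reference panel), the row layout, and all function names are assumptions for illustration and do not reproduce the study's scripts.

```python
from itertools import product

P_THRESHOLDS = (1e-6, 1e-4)   # index SNP thresholds stated in the text
R2_THRESHOLDS = (0.3, 0.5)    # proxy pruning thresholds stated in the text
CANDIDATE_SETTINGS = list(product(P_THRESHOLDS, R2_THRESHOLDS))  # four candidate PRS


def prune_and_threshold(gwas_rows, p_threshold, r2_threshold, ld_r2):
    """Greedy selection of index SNPs.

    gwas_rows: iterable of (variant_id, beta, p_value) tuples (assumed layout).
    ld_r2: hypothetical callable returning r2 between two variant IDs.
    Returns a dict mapping each retained variant ID to its weight (beta).
    """
    weights = {}
    for vid, beta, p in sorted(gwas_rows, key=lambda row: row[2]):
        if p >= p_threshold:
            break  # rows are sorted, so all remaining p-values are larger
        if all(ld_r2(vid, kept) < r2_threshold for kept in weights):
            weights[vid] = beta
    return weights


def prs_score(dosages, weights):
    """Weighted sum of effect allele dosages over variants in the score."""
    return sum(weights[vid] * dosages.get(vid, 0.0) for vid in weights)
```

Each of the four settings yields a different variant set, and the set explaining the most variance in LVMI would then be carried forward.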
Outcomes association testing
We assessed for associations between CMR-derived LVMI and incident AF, myocardial infarction, HF, ventricular arrhythmias, DCM, HCM, and implantable cardioverter-defibrillator (ICD) within participants with follow-up clinical data available after the imaging visit. We assessed for analogous associations using LVH, which was defined as LVMI > 72 g/m2 in men and >55 g/m2 in women44, and alternatively as the sex-specific 90th percentile of LVM1. Diseases were defined using combinations of self-report and inpatient International Classification of Diseases, 9th and 10th revision codes (Supplementary Data 1). Start of follow-up was defined at the time of CMR acquisition and spanned until the earliest of an incident event, death, or last follow-up. The date of last follow-up was dependent upon the availability of linked hospital data, and was therefore defined as March 31, 2021 for participants enrolled in England (93.6%) and Scotland (6.1%), and February 28, 2018 for participants enrolled in Wales (0.3%).
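The LVH thresholds and the person-time definition above can be expressed compactly. In the sketch below, the thresholds and administrative censoring dates are taken from the text, while the function names, the sex and country labels, and the convention of expressing follow-up in years are assumptions.

```python
from datetime import date

LVH_THRESHOLD_G_PER_M2 = {"male": 72.0, "female": 55.0}  # thresholds from the text
LAST_FOLLOW_UP = {
    "England": date(2021, 3, 31),
    "Scotland": date(2021, 3, 31),
    "Wales": date(2018, 2, 28),
}


def has_lvh(lvmi_g_per_m2, sex):
    """Classify LVH using the sex-specific LVMI thresholds."""
    return lvmi_g_per_m2 > LVH_THRESHOLD_G_PER_M2[sex]


def follow_up(cmr_date, country, event_date=None, death_date=None):
    """Return (years of follow-up, whether an incident event was observed).

    Follow-up starts at CMR acquisition and ends at the earliest of an
    incident event, death, or the country-specific last follow-up date.
    """
    censor_date = LAST_FOLLOW_UP[country]
    candidates = [d for d in (event_date, death_date, censor_date) if d is not None]
    end = min(candidates)
    had_event = event_date is not None and event_date == end
    return (end - cmr_date).days / 365.25, had_event
```

An event recorded after the administrative censoring date is therefore not counted, and the participant contributes person-time only up to that date.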
We performed analogous association testing between the LVMI PRS and the same set of incident cardiovascular events among individuals in the UK Biobank that did not undergo CMR (n = 443,326). Outcome and person-time definitions were similar, although start of follow-up was defined as the date of UK Biobank enrollment and blood sample collection. We also repeated association testing between the LVMI PRS and incident events in the independent MGB Biobank sample, using analogous models with person-time beginning at the date of blood sample collection and ending at an event, death, or last encounter in the electronic health record.
Mendelian-randomization analyses of blood pressure and diabetes
As a form of validation of our LVM estimation, we sought to identify evidence of known causal associations between elevated blood pressure and increased LVM40. We therefore conducted two-sample Mendelian-randomization (MR) within individuals of genetic European ancestry in the UK Biobank sample. Given strong epidemiologic associations between diabetes and LVM62, we performed analogous MR analyses for diabetes. Genetic instruments for systolic blood pressure (SBP) and diastolic blood pressure (DBP) were derived from a recent GWAS63. The same set of SNPs was used for both systolic and diastolic blood pressure, but weights specific to systolic versus diastolic blood pressure were used for the systolic and diastolic Mendelian-randomization analysis, respectively63. Utilizing an 865 SNP instrument for SBP and DBP, we prioritized inverse-variance weighted (IVW) meta-analyses of the effect of each SNP on CMR-derived LVMI (and LVM) divided by the effect of the same SNP on SBP and DBP, respectively. We performed an analogous procedure using a 337 SNP instrument for diabetes64. Linear regression models were adjusted for age, sex, genotyping array, and the first ten principal components of genetic ancestry, to determine the beta coefficients and standard errors for the association of each SNP with the outcome (CMR-derived LVMI). These SNP-specific estimates were combined to conduct two-sample Mendelian randomization using the ‘MendelianRandomization’ package in R. Weighted median and MR-Egger analyses were performed secondarily to address potential invalid instruments and directional pleiotropy.
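The inverse-variance weighted estimator combines the per-SNP ratio estimates (SNP effect on the outcome divided by SNP effect on the exposure), weighting each by its precision. The sketch below shows the simple fixed-effect form in plain Python. It is an illustration only and is not the ‘MendelianRandomization’ R package API, which offers additional options (such as random-effects models) that are not reproduced here.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and standard error.

    Equivalent to a weighted mean of per-SNP ratios beta_outcome / beta_exposure
    with weights beta_exposure^2 / se_outcome^2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

In this setting, `beta_exposure` would hold the published SNP effects on blood pressure or diabetes, and `beta_outcome` and `se_outcome` the SNP effects on CMR-derived LVMI estimated in the UK Biobank sample.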
Statistical analysis
We tested associations between CMR-derived LVM and incident AF, myocardial infarction, HF, ventricular arrhythmias, DCM, HCM, and ICD using Cox proportional hazards regression with adjustment for sex and age at CMR acquisition. We fit analogous models using LVH (defined using the thresholds described above) and the LVMI PRS as the primary exposures. Models including the PRS were additionally adjusted for the first five principal components of genetic ancestry. For the PRS outcomes analyses, we did not exclude individuals with pathogenic or likely pathogenic variants for HCM or DCM for the following reasons: (a) a substantial proportion of individuals with clinically confirmed HCM and DCM have no causal variant identified14, 65, (b) recent evidence suggests that polygenic background may play an important role in disease development even among individuals carrying mutations39, and (c) rare variant information is not available in all individuals in our UKBB or MGB replication samples. To assess the frequency of pathologic rare variants among individuals with incident HCM and DCM events, we tabulated carrier status of high confidence loss-of-function, deleterious missense, and known pathogenic or likely pathogenic variants in HCM and DCM genes as cataloged in ClinVar as of 2/9/2021. We also included high confidence loss-of-function variants using LOFTEE66, a plug-in of VEP67, and deleterious missense variants68 using 30 in silico prediction tools presented in v4.1a of the dbNSFP database69. A full list of variants is shown in Supplementary Table 17.
Validity of the proportionality assumption was assessed using the Grambsch-Therneau test of correlation70 as well as visual inspection of smoothed fits to Schoenfeld residuals versus time. Where present, substantial deviations from proportional hazards (observed only for age, sex, and certain principal components of ancestry) were modeled by including interaction terms with strata of person-time.
Statistical analyses were performed using R v4.0 (packages ‘data.table’ v1.13.6, ‘ggplot2’ v3.3.3, ‘survival’ v3.2-7, ‘prodlim’ v2019.11.13, ‘MendelianRandomization’ v0.5.0)71, 72. Except where otherwise noted, all two-tailed p-values <0.05 were considered statistically significant.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
J.P.P. is supported by a John S. LaDue Memorial Fellowship. L.-C.W. is supported by NIH 1R01HL139731. S.H.C. is supported by the NIH NHLBI BioData Catalyst Fellows program. J.E.H. is supported by NIH (R01HL134893, R01HL140224, K24HL153669). S.A.L. is supported by NIH 1R01HL139731 and American Heart Association 18SFRN34250007. P.T.E. is supported by NIH 1R01HL092577, R01HL128914, K24HL105780, American Heart Association 18SFRN34110082, and Foundation Leducq 14CVD01. V.N. is supported by NIH T32HL007604.
Author contributions
Conceptualization: S.K. and S.A.L.; Methodology: S.K., J.L., J.P.P., L.C.W., S.H.C., A.W.H., X.W., S.F.F., V.N., K.J.B., K.G.A., P.B., and A.A.P.; Supervision: P.T.E. and S.A.L.; Writing – original draft: S.K. and J.L.; Writing – review and editing: J.E.H., P.T.E., and S.A.L.
Peer review
Peer review information
Nature Communications thanks Alistair Young and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
UK Biobank data are publicly available by application (https://www.ukbiobank.ac.uk/enable-your-research/register). LV mass estimates used for the current analysis are accessible to UK Biobank researchers as returned data (return ID #3290). The GWAS summary statistics generated in this study have been deposited in the National Human Genome Research Institute GWAS Catalog61 under accession codes GCST90244710 for LVMI (ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90244001-GCST90245000/GCST90244710/) and GCST90244711 for unindexed LVM (ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90244001-GCST90245000/GCST90244711/), and are also available from the Downloads page of the Cardiovascular Disease Knowledge Portal (broadcvdi.org). The LVMI PRS developed in this study has been deposited to the Polygenic Score (PGS) Catalog73 under accession code PGS003427 (https://www.pgscatalog.org/score/PGS003427/). Mass General Brigham (MGB) data contain identifiable protected health information and participants have not consented to data sharing; therefore, the data cannot be shared publicly or with controlled access. This research has been conducted using the UK Biobank Resource under Application #17488.
Code availability
Data processing scripts used to perform the analyses described herein are available at https://github.com/shaankhurshid/lvmass_gwas74.
Competing interests
J.P.P. has consulted for Maze Therapeutics. S.F.F. receives research support from Bayer AG and IBM. L.-C.W. receives research support from IBM to the Broad Institute. P.B. received research support from Bayer AG and IBM, and consults for Novartis. J.E.H. has received research support from Bayer AG and Gilead Sciences, has received research supplies from EcoNugenics, and is an employee of Flagship Pioneering as of January 2023. A.A.P. receives research support from Bayer AG, IBM, Intel, and Verily, and has consulted for Novartis and Rakuten. P.T.E. receives research support from Bayer AG, and has consulted for Bayer AG, Novartis, MyoKardia and Quest Diagnostics. S.A.L. has received research support from Bristol Myers Squibb/Pfizer, Bayer AG, Boehringer Ingelheim, and Fitbit, has consulted for Bristol Myers Squibb/Pfizer and Bayer AG, participated in research collaborations with IBM, and is an employee of Novartis Institute for Biomedical Research as of July 2022. Remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-023-37173-w.
References
- 1. Bluemke DA, et al. The relationship of left ventricular mass and geometry to incident cardiovascular events: the MESA (Multi-Ethnic Study of Atherosclerosis) study. J. Am. Coll. Cardiol. 2008;52:2148–2155. doi: 10.1016/j.jacc.2008.09.014.
- 2. Kawel-Boehm N, et al. Left Ventricular Mass at MRI and long-term risk of cardiovascular events: the multi-ethnic study of atherosclerosis (MESA). Radiology. 2019;293:107–114. doi: 10.1148/radiol.2019182871.
- 3. Lazzeroni D, Rimoldi O, Camici PG. From left ventricular hypertrophy to dysfunction and failure. Circ. J. 2016;80:555–564. doi: 10.1253/circj.CJ-16-0062.
- 4. Chrispin J, et al. Association of electrocardiographic and imaging surrogates of left ventricular hypertrophy with incident atrial fibrillation: MESA (Multi-Ethnic Study of Atherosclerosis). J. Am. Coll. Cardiol. 2014;63:2007–2013. doi: 10.1016/j.jacc.2014.01.066.
- 5. Haider AW, Larson MG, Benjamin EJ, Levy D. Increased left ventricular mass and hypertrophy are associated with increased risk for sudden death. J. Am. Coll. Cardiol. 1998;32:1454–1459. doi: 10.1016/S0735-1097(98)00407-0.
- 6. Lenstrup M, Kjaergaard J, Petersen CL, Kjaer A, Hassager C. Evaluation of left ventricular mass measured by 3D echocardiography using magnetic resonance imaging as gold standard. Scand. J. Clin. Lab. Investig. 2006;66:647–657. doi: 10.1080/00365510600892233.
- 7. Wild PS, et al. Large-scale genome-wide analysis identifies genetic variants associated with cardiac structure and function. J. Clin. Investig. 2017;127:1798–1812. doi: 10.1172/JCI84840.
- 8. Kanai M, et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 2018;50:390–400. doi: 10.1038/s41588-018-0047-6.
- 9. Mosley JD, et al. The polygenic architecture of left ventricular mass mirrors the clinical epidemiology. Sci. Rep. 2020;10:7561. doi: 10.1038/s41598-020-64525-z.
- 10. Aung N, et al. Genome-wide analysis of left ventricular image-derived phenotypes identifies fourteen loci associated with cardiac morphogenesis and heart failure development. Circulation. 2019;140:1318–1330. doi: 10.1161/CIRCULATIONAHA.119.041161.
- 11. Khurshid S, et al. Deep learning to estimate cardiac magnetic resonance–derived left ventricular mass. Cardiovasc. Digit. Health J. 2021;S2666693621000232. doi: 10.1016/j.cvdhj.2021.03.001.
- 12. Engel DJ, Schwartz A, Homma S. Athletic cardiac remodeling in US professional basketball players. JAMA Cardiol. 2016;1:80. doi: 10.1001/jamacardio.2015.0252.
- 13. Shah S, et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat. Commun. 2020;11:163. doi: 10.1038/s41467-019-13690-5.
- 14. Tadros R, et al. Shared genetic pathways contribute to risk of hypertrophic and dilated cardiomyopathies with opposite directions of effect. Nat. Genet. 2021;53:128–134. doi: 10.1038/s41588-020-00762-2.
- 15. Begay RL, et al. Filamin C truncation mutations are associated with arrhythmogenic dilated cardiomyopathy and changes in the cell-cell adhesion structures. JACC Clin. Electrophysiol. 2018;4:504–514. doi: 10.1016/j.jacep.2017.12.003.
- 16. Valdés-Mas R, et al. Mutations in filamin C cause a new form of familial hypertrophic cardiomyopathy. Nat. Commun. 2014;5:5326. doi: 10.1038/ncomms6326.
- 17. Brodehl A, et al. Mutations in FLNC are associated with familial restrictive cardiomyopathy. Hum. Mutat. 2016;37:269–279. doi: 10.1002/humu.22942.
- 18. Locke AE, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
- 19. Pirruccello JP, et al. Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy. Nat. Commun. 2020;11:2254. doi: 10.1038/s41467-020-15823-7.
- 20. Roselli C, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat. Genet. 2018;50:1225–1233. doi: 10.1038/s41588-018-0133-9.
- 21. Frey N, et al. Calsarcin-2 deficiency increases exercise capacity in mice through calcineurin/NFAT activation. J. Clin. Investig. 2008;118:3598–3608. doi: 10.1172/JCI36277.
- 22. Daly SB, et al. Exome sequencing identifies a dominant TNNT3 mutation in a large family with distal arthrogryposis. Mol. Syndromol. 2014;5:218–228. doi: 10.1159/000365057.
- 23. Weng L-C, et al. Heritability of atrial fibrillation. Circ. Cardiovasc. Genet. 2017;10:e001838. doi: 10.1161/CIRCGENETICS.117.001838.
- 24. Schneider BP, et al. Genome-wide association study for anthracycline-induced congestive heart failure. Clin. Cancer Res. 2017;23:43–51. doi: 10.1158/1078-0432.CCR-16-0908.
- 25. van der Harst P, et al. 52 genetic loci influencing myocardial mass. J. Am. Coll. Cardiol. 2016;68:1435–1448. doi: 10.1016/j.jacc.2016.07.729.
- 26. Goetze JP, et al. Cardiac natriuretic peptides. Nat. Rev. Cardiol. 2020;17:698–717. doi: 10.1038/s41569-020-0381-0.
- 27. Man J, Barnett P, Christoffels VM. Structure and function of the Nppa-Nppb cluster locus during heart development and disease. Cell Mol. Life Sci. 2018;75:1435–1444. doi: 10.1007/s00018-017-2737-0.
- 28. Tamura N, et al. Cardiac fibrosis in mice lacking brain natriuretic peptide. Proc. Natl Acad. Sci. USA. 2000;97:4239–4244. doi: 10.1073/pnas.070371497.
- 29. Segin S, et al. Cardiomyocyte-specific deletion of Orai1 reveals its protective role in angiotensin-II-induced pathological cardiac remodeling. Cells. 2020;9:1092. doi: 10.3390/cells9051092.
- 30. Cubbon RM, Kearney MT, Wheatcroft SB. Endothelial IGF-1 receptor signalling in diabetes and insulin resistance. Trends Endocrinol. Metab. 2016;27:96–104. doi: 10.1016/j.tem.2015.11.009.
- 31. Herman DS, et al. Truncations of titin causing dilated cardiomyopathy. N. Engl. J. Med. 2012;366:619–628. doi: 10.1056/NEJMoa1110186.
- 32. Choi SH, et al. Association between titin loss-of-function variants and early-onset atrial fibrillation. JAMA. 2018;320:2354–2364. doi: 10.1001/jama.2018.18179.
- 33. Chevessier F, et al. Myofibrillar instability exacerbated by acute exercise in filaminopathy. Hum. Mol. Genet. 2015;24:7207–7220. doi: 10.1093/hmg/ddv421.
- 34. Wilkins BJ, et al. Calcineurin/NFAT coupling participates in pathological, but not physiological, cardiac hypertrophy. Circ. Res. 2004;94:110–118. doi: 10.1161/01.RES.0000109415.17511.18.
- 35. Watkins H, et al. Mutations in the cardiac myosin binding protein-C gene on chromosome 11 cause familial hypertrophic cardiomyopathy. Nat. Genet. 1995;11:434–437. doi: 10.1038/ng1295-434.
- 36. Daehmlow S, et al. Novel mutations in sarcomeric protein genes in dilated cardiomyopathy. Biochem Biophys. Res Commun. 2002;298:116–120. doi: 10.1016/S0006-291X(02)02374-4.
- 37. Verweij N, et al. The genetic makeup of the electrocardiogram. Cell Syst. 2020;11:229–238.e5. doi: 10.1016/j.cels.2020.08.005.
- 38. Ntalla I, et al. Multi-ancestry GWAS of the electrocardiographic PR interval identifies 202 loci underlying cardiac conduction. Nat. Commun. 2020;11:2542. doi: 10.1038/s41467-020-15706-x.
- 39. Fahed AC, et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 2020;11:3635. doi: 10.1038/s41467-020-17374-3.
- 40. Hendriks T, et al. Effect of systolic blood pressure on left ventricular structure and function: a Mendelian randomization study. Hypertension. 2019;74:826–832. doi: 10.1161/HYPERTENSIONAHA.119.12679.
- 41. Ai S, et al. Effects of glycemic traits on left ventricular structure and function: a Mendelian randomization study. Cardiovasc. Diabetol. 2022;21:109. doi: 10.1186/s12933-022-01540-6.
- 42. Suinesiaputra A, et al. Fully-automated left ventricular mass and volume MRI analysis in the UK Biobank population cohort: evaluation of initial results. Int. J. Cardiovasc. Imaging. 2018;34:281–291. doi: 10.1007/s10554-017-1225-9.
- 43. Bai W, et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J. Cardiovasc. Magn. Reson. 2018;20:65. doi: 10.1186/s12968-018-0471-x.
- 44. Petersen SE, et al. Reference ranges for cardiac structure and function using cardiovascular magnetic resonance (CMR) in Caucasians from the UK Biobank population cohort. J. Cardiovasc. Magn. Reson. 2017;19:18. doi: 10.1186/s12968-017-0327-9.
- 45. Fry A, et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 2017;186:1026–1034. doi: 10.1093/aje/kwx246.
- 46. Sudlow C, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
- 47. Littlejohns TJ, Sudlow C, Allen NE, Collins R. UK Biobank: opportunities for cardiovascular research. Eur. Heart J. 2019;40:1158–1166. doi: 10.1093/eurheartj/ehx254.
- 48. Petersen SE, et al. UK Biobank’s cardiovascular magnetic resonance protocol. J. Cardiovasc. Magn. Reson. 2016;18:8. doi: 10.1186/s12968-016-0227-4.
- 49. Du Bois D, Du Bois EF. A formula to estimate the approximate surface area if height and weight be known. 1916. Nutrition. 1989;5:303–311.
- 50. Loh P-R, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-model association for biobank-scale datasets. Nat. Genet. 2018;50:906–908. doi: 10.1038/s41588-018-0144-6.
- 51. Wojcik GL, et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.
- 52. Page GP, et al. Multiple-ancestry genome-wide association study identifies 27 loci associated with measures of hemolysis following blood storage. J. Clin. Investig. 2021;131:e146077. doi: 10.1172/JCI146077.
- 53. Lloyd-Jones LR, et al. Inference on the genetic basis of eye and skin color in an admixed population via Bayesian linear mixed models. Genetics. 2017;206:1113–1126. doi: 10.1534/genetics.116.193383.
- 54. Caliebe A, et al. Including diverse and admixed populations in genetic epidemiology research. Genet. Epidemiol. 2022;46:347–371. doi: 10.1002/gepi.22492.
- 55. Bulik-Sullivan BK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 56. Cuspidi C, et al. Improving cardiovascular risk stratification in essential hypertensive patients by indexing left ventricular mass to height(2.7). J. Hypertens. 2009;27:2465–2471. doi: 10.1097/HJH.0b013e32833105a6.
- 57. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
- 58. Ahlberg G, et al. Genome-wide association study identifies 18 novel loci associated with left atrial volume and function. Eur. Heart J. 2021;42:4523–4534. doi: 10.1093/eurheartj/ehab466.
- 59. GTEx Consortium, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 2018;9:1825. doi: 10.1038/s41467-018-03621-1.
- 60. Bianchi V, et al. Detailed regulatory interaction map of the human heart facilitates gene discovery for cardiovascular disease. Preprint at bioRxiv. 2019. doi: 10.1101/705715.
- 61. Buniello A, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. doi: 10.1093/nar/gky1120.
- 62. Palmieri V, et al. Effect of Type 2 Diabetes Mellitus on Left Ventricular Geometry and Systolic Function in Hypertensive Subjects: Hypertension Genetic Epidemiology Network (HyperGEN) Study. Circulation. 2001;103:102–107. doi: 10.1161/01.CIR.103.1.102.
- 63. The Million Veteran Program, et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 2018;50:1412–1425. doi: 10.1038/s41588-018-0205-x.
- 64. Mahajan A, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 2022;54:560–572. doi: 10.1038/s41588-022-01058-3.
- 65. Walsh R, et al. Quantitative approaches to variant classification increase the yield and precision of genetic testing in Mendelian diseases: the case of hypertrophic cardiomyopathy. Genome Med. 2019;11:5. doi: 10.1186/s13073-019-0616-z.
- 66. Karczewski KJ, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
- 67. McLaren W, et al. The ensembl variant effect predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
- 68. Jurgens SJ, et al. Analysis of rare genetic variation underlying cardiometabolic diseases and traits among 200,000 individuals in the UK Biobank. Nat. Genet. 2022;54:240–250. doi: 10.1038/s41588-021-01011-w.
- 69. Liu X, Wu C, Li C, Boerwinkle E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum. Mutat. 2016;37:235–241. doi: 10.1002/humu.22932.
- 70. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81:515–526. doi: 10.1093/biomet/81.3.515.
- 71. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2015).
- 72. Dowle M, et al. data.table: extension of ‘data.frame’. Version 1.12.6. https://CRAN.R-project.org/package=data.table.
- 73. Lambert SA, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 2021;53:420–425. doi: 10.1038/s41588-021-00783-5.
- 74. Shaankhurshid. shaankhurshid/lvmass_gwas: v1.0. 2023. doi: 10.5281/ZENODO.7548696.
- 75. Hurvich CM, Simonoff JS, Tsai C-L. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 1998;60:271–293. doi: 10.1111/1467-9868.00125.