Defining a Contemporary Ischemic Heart Disease Genetic Risk Profile using Historical Data

Jonathan D Mosley; Sara L Van Driest; Quinn S Wells; Christian M Shaffer; Todd L Edwards; Lisa Bastarache; Catherine A McCarty; Will Thompson; Christopher G Chute; Gail P Jarvik; David R Crosslin; Eric B Larson; Iftikhar J Kullo; Jennifer A Pacheco; Peggy L Peissig; Murray H Brilliant; James G Linneman; Josh C Denny; Dan M Roden

doi:10.1161/CIRCGENETICS.116.001530

Author manuscript; available in PMC: 2017 Dec 1.

Published in final edited form as: Circ Cardiovasc Genet. 2016 Oct 25;9(6):521–530. doi: 10.1161/CIRCGENETICS.116.001530

PMCID: PMC5177499 NIHMSID: NIHMS825921 PMID: 27780847

Abstract

Background

Continued reductions in morbidity and mortality attributable to ischemic heart disease (IHD) require an understanding of the changing epidemiology of this disease. We hypothesized that we could use genetic correlations, which quantitate the shared genetic architectures of phenotype pairs, and extant risk factors from a historical prospective study to define the risk profile of a contemporary IHD phenotype.

Methods and Results

We used 37 phenotypes measured in the Atherosclerosis Risk in Communities (ARIC) study (n=7,716 European ancestry subjects) and clinical diagnoses from an electronic health record (EHR) data set (n=19,093). All subjects had genome-wide SNP genotyping. We measured pairwise genetic correlations (rG) between the ARIC and EHR phenotypes using linear mixed models. The genetic correlation estimates between the ARIC risk factors and the EHR IHD were modestly linearly correlated with hazard ratio estimates for incident IHD in ARIC (Pearson’s correlation [r]=0.62), indicating that the two IHD phenotypes had differing risk profiles. For comparison, this correlation was 0.80 when comparing EHR and ARIC type 2 diabetes (T2D) phenotypes. The EHR IHD phenotype was most strongly correlated with ARIC metabolic phenotypes including total-to-HDL cholesterol ratio (rG=−0.44, p=0.005), HDL (rG=−0.48, p=0.005), systolic blood pressure (rG=0.44, p=0.02) and triglycerides (rG=0.38, p=0.02). EHR phenotypes related to T2D, atherosclerotic and hypertensive diseases were also genetically correlated with these ARIC risk factors.

Conclusions

The EHR IHD risk profile differed from ARIC, and indicates that treatment and prevention efforts in this population should target hypertensive and metabolic disease.

Keywords: genetic epidemiology, ischemic heart disease, risk factor, epidemiology, genetics

Journal Subject Terms: Cardiovascular Disease, Diabetes, Type 2, Epidemiology, Genetic, Association Studies

Introduction

There has been a marked decline in mortality and morbidity for ischemic heart disease (IHD) over the last several decades.¹ These gains have come from targeting treatment and prevention strategies toward risk factors identified in landmark longitudinal studies, such as the Framingham Heart Study.² However, since the inception of these cohort studies, there have been changes in the prevalence of IHD risk factors such as smoking, type 2 diabetes (T2D) and obesity which would be expected to alter the epidemiological risk profile of this disease.³ In order to realize continued declines in morbidity, ongoing treatment and prevention efforts must be directed toward contemporary risk profiles.⁴ While initiating new longitudinal studies is one approach, it is hampered by long latencies and high costs.⁵ Data sources such as electronic health records (EHRs) offer a contemporary cohort of subjects with prevalent and incident IHD. However, baseline risk factors are often unavailable or inconsistently measured across subjects in EHR data sets, thereby limiting their utility for epidemiological analyses.⁶ We propose here an alternative study design which overcomes this limitation by using statistical methods that determine the relationship between two phenotypes based on their shared genetic risk.

Genetic variation is an important modulator of many known IHD risk factors and biomarkers.⁷ As this genetic variation constitutes a lifelong exposure of disease risk, modelling disease associations based on underlying genetic variation can link risk factors to clinical outcomes, as demonstrated by Mendelian randomization or genetic risk score analyses.^8–11 An alternative genetic association approach used here is based on genetic correlations measured using generalized linear mixed models (GLMMs). GLMMs use common SNPs to quantitate the phenotypic variation attributable to additive genetics in unrelated individuals.^12,13 This method is typically more sensitive to capturing the overall additive genetic effects of common SNPs than genome wide association studies (GWAS) and related approaches, such as genetic risk score analyses.¹⁴ GLMMs can also analyze pairs of phenotypes to measure a genetic correlation, which quantitates the extent to which two phenotypes share genetic influences.^15,16 Importantly, a genetic correlation can be calculated using a risk factor measured in one population and an outcome measured in a second, unrelated population.

We hypothesized that we could use genetic correlations to define the epidemiology of IHD in an EHR population using risk factors measured in an unrelated population. We used baseline phenotypes measured in the prospective Atherosclerosis Risk in Communities (ARIC) study¹⁷ and diseases ascertained through the Electronic Medical Records and Genomics (eMERGE) network, a consortium of medical centers with EHR-linked DNA biobanks.¹⁸ We demonstrate that genetic correlation and longitudinal analyses identify similar risk factor associations for T2D. However, this was not the case for IHD, indicating that the IHD genetic risk profile in the EHR cohort differs from the ARIC cohort.

Materials and Methods

An overview of the approaches used is shown in Supplementary Figure 1.

Study populations

ARIC

The ARIC population comprises 13,113 genotyped adult subjects participating in the NHLBI-funded Atherosclerosis Risk in Communities longitudinal study designed to investigate the natural history of cardiovascular and atherosclerotic diseases.¹⁷ Study subjects were recruited in 1987 to 1989 from four U.S. communities: Minneapolis, MN, Washington County, MD, Forsyth County, NC, and Jackson, MS. Genetic and phenotypic data were downloaded from dbGaP (phs000280.v3.p1).

EHR subjects

The EHR populations included adult subjects from the eMERGE Phase I Network (n=14,205), a consortium of medical centers using electronic health records as a tool for genomic research, and the Vanderbilt Electronic Systems for Pharmacogenomic Assessment (VESPA) cohort (n=11,639).^19,20 The eMERGE subjects came from five sites [Vanderbilt University (VUMC), Marshfield Clinic, Northwestern University, Mayo Clinic and Group Health Research Institute], while the non-overlapping VESPA cohort included additional subjects from VUMC’s BioVU resource, a de-identified collection of patients whose DNA was extracted from discarded blood and linked to phenotypes through a de-identified EHR.²¹

Both the ARIC and EHR data sets were primarily composed of self-reported whites and so only subjects of European ancestry were included in the final analyses, defined using STRUCTURE²² in conjunction with ancestry informative markers, with European ancestry defined as >90% (ARIC) or >80% (EHR subjects) probability of being in the HapMap CEU cluster. Thresholds were selected based on comparisons of self-reported race with STRUCTURE ancestry assignment using a multi-racial population.

The eMERGE study was approved by the Institutional Review Board (IRB) at each site.^18,21 During the period of study, Vanderbilt’s BioVU resource operated as nonhuman subjects research according to the provisions of 45 Code of Federal Regulations, part 46, with oversight by Vanderbilt’s Institutional Review Board (IRB), as previously described.²¹ This study was approved by Vanderbilt’s IRB.

Genetic Data

SNP genotype data were acquired on the Illumina Human660W-Quadv1_A (eMERGE), Illumina HumanOmni1-Quad (VESPA), HumanOmni5-Quad (VESPA) and Affymetrix 6.0 SNP array (ARIC) platforms. Quality control (QC) steps for the EHR data sets were performed per the published protocols established by the eMERGE Genomics Working Group.²³ For imputation, palindromic alleles were aligned to the positive strand using allele frequency information from the 1000 Genomes Project. SNPs were pre-phased using SHAPEIT,²⁴ and data were imputed using IMPUTE2²⁵ and the 10/2014 release of the 1000 Genomes cosmopolitan reference haplotypes. QC for the ARIC data set followed the guidelines accompanying the dbGaP release including removing SNPs with chromosomal anomalies and with >5 discordant calls in replicate samples, and using a predefined subset of unrelated subjects. QC analyses used PLINK v1.07.²⁶ After filtering for a sample missingness rate<2.0%, a SNP missingness rate<2.0% and a SNP deviation from Hardy-Weinberg<0.001, there were 627,580 SNPs with MAF>1.0%. The merged intersection of the ARIC and imputed EHR data sets contained 488,525 SNPs with MAF>1.0%.
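As an illustration of the SNP-level filters described above, the following is a minimal Python sketch rather than the PLINK procedure itself. The function name `snp_qc` and the genotype encoding (allele counts 0/1/2, `None` for a missing call) are assumptions made for the example. It reads the Hardy-Weinberg criterion as excluding SNPs with a test p-value below 0.001, and it substitutes a 1-d.f. chi-square goodness-of-fit test for whatever test PLINK v1.07 applied.

```python
import math


def snp_qc(genotypes, max_missing=0.02, min_maf=0.01, hwe_alpha=0.001):
    """Return True if one SNP passes the missingness, MAF and HWE filters.

    genotypes: allele counts (0, 1 or 2) per subject, None for a missing call.
    Thresholds default to the values quoted in the text; the function itself
    is a hypothetical illustration, not part of PLINK.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False

    # SNP missingness rate must be below the threshold.
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate >= max_missing:
        return False

    # Minor allele frequency must exceed the threshold.
    n = len(called)
    p = sum(called) / (2 * n)  # frequency of the counted allele
    if min(p, 1 - p) <= min_maf:
        return False

    # Chi-square test of Hardy-Weinberg proportions (assumed stand-in test).
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_hwe = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.
    return p_hwe >= hwe_alpha
```

A SNP whose genotype counts match Hardy-Weinberg proportions passes, while a SNP called heterozygous in every subject fails the HWE filter.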

Phenotype data

EHR clinical phenotypes were based on PheCodes, which are collections of related ICD-9 (International Classification of Diseases, Ninth Revision) diagnosis codes.^27,28 For each phenotype, cases are subjects with two or more instances of the code appearing in their medical record on two separate dates.²⁸ Controls are randomly selected for each phenotype among individuals without any closely related PheCodes. There are ~1,600 defined phenotypes, of which 519 had ≥400 cases in the EHR data set. PheCodes used in the primary analyses were T2D (code 250.2), Ischemic heart disease (411), Atherosclerosis of the extremities (440.2) and Myocardial infarction (411.2). PheCodes and ICD-9 mappings are available at https://phewas.mc.vanderbilt.edu/.
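The case and control rules above can be sketched in a few lines of Python. This is an illustrative reading, not the PheWAS implementation. The function `classify`, the record layout (a mapping from PheCode to the dates on which it was recorded), and the choice of 411.2 as a code "closely related" to 411 are assumptions for the example. Treating a subject with a single dated instance as neither case nor control is also an assumption.

```python
from datetime import date


def classify(subject_codes, target, related):
    """Assign one subject to 'case', 'control_eligible' or 'excluded'.

    subject_codes: dict mapping a PheCode string to a list of dates on which
                   that code appears in the record (hypothetical layout).
    target:        PheCode of the phenotype being defined.
    related:       set of PheCodes treated as closely related to the target
                   (hypothetical exclusion set).
    """
    distinct_dates = set(subject_codes.get(target, []))

    # Case: the code appears on at least two separate dates.
    if len(distinct_dates) >= 2:
        return "case"

    # Assumed rule: one dated instance, or any related code, bars control status.
    if distinct_dates or any(code in subject_codes for code in related):
        return "excluded"

    return "control_eligible"


# Illustrative records for the ischemic heart disease PheCode (411).
ihd_related = {"411.2"}
print(classify({"411": [date(2010, 1, 5), date(2011, 3, 2)]}, "411", ihd_related))
print(classify({"250.2": [date(2010, 1, 5)]}, "411", ihd_related))
```

Controls would then be drawn at random from the subjects labelled `control_eligible`.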

The ARIC phenotypes included 37 baseline (visit 1) measurements. A list of these phenotypes, selection criteria and data transformations are in Supplementary Table 1. Longitudinal outcomes and latencies for incident T2D and coronary heart disease used phenotypes generated for the ARIC-Geneva substudy. For the T2D longitudinal analysis, the variables used were incident cases (phv00080468.v1.p1), follow-up time (phv00080469.v1.p1) and prevalent diabetes at baseline (phv00022845.v1.p1). For coronary heart disease, the analyses used incident cases (phv00066400.v1.p1) of myocardial infarction, fatal coronary event, silent infarction or revascularization procedure by 12/31/2004, follow-up time (phv00022861.v1.p1) and prevalent disease at baseline (phv00022832.v1.p1).

Statistics

Generalized linear mixed models (GLMM) were used to compute genetic liabilities and genetic correlations, as implemented in the Genome-wide Complex Trait Analysis (GCTA) program v1.24.7.²⁹ The GLMM method is described in more detail in the Supplementary Methods. First, a genetic relationship matrix (GRM) was computed using autosomal SNPs with a MAF>1.0%. For each pair of subjects with a relatedness score>0.05, one subject was randomly excluded. After exclusions, there were 7,716 ARIC and 19,093 EHR unrelated subjects. Analyses adjusted for age (ARIC subjects) or birth year (EHR subjects), sex and 20 principal components (PCs). P-values for genetic liability estimates were computed by a likelihood ratio test (LRT) comparing a full model with a model excluding the GRM variance component. PheCodes with a positive liability estimate with p<0.1 and >400 cases were used for genetic correlation analyses (n=158).
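To make the GRM and relatedness steps concrete, here is a small pure-Python sketch. It assumes the commonly used standardized estimator, in which each SNP is centered by twice its allele frequency and scaled by its expected variance before averaging over SNPs. The function names `grm` and `prune_related` are hypothetical. The pruning shown is deterministic (it drops the later-indexed subject of each related pair), whereas the text describes a random exclusion, so it is a simplification rather than a reproduction of GCTA's behavior.

```python
def grm(genotypes):
    """Genetic relationship matrix from allele counts.

    genotypes: list of subjects, each a list of allele counts (0/1/2) per SNP.
    Entry [j][k] is the mean over SNPs of the product of the two subjects'
    standardized genotypes (assumed estimator).
    """
    n_subj = len(genotypes)
    n_snp = len(genotypes[0])
    freqs = [sum(s[i] for s in genotypes) / (2 * n_subj) for i in range(n_snp)]

    # Monomorphic SNPs carry no information and would divide by zero.
    usable = [i for i, p in enumerate(freqs) if 0 < p < 1]

    z = [
        [(s[i] - 2 * freqs[i]) / (2 * freqs[i] * (1 - freqs[i])) ** 0.5 for i in usable]
        for s in genotypes
    ]
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / len(usable) for k in range(n_subj)]
        for j in range(n_subj)
    ]


def prune_related(matrix, cutoff=0.05):
    """Return indices of subjects kept after dropping one of each related pair."""
    removed = set()
    for j in range(len(matrix)):
        if j in removed:
            continue
        for k in range(j + 1, len(matrix)):
            if k not in removed and matrix[j][k] > cutoff:
                removed.add(k)  # simplification: always drop the later subject
    return [i for i in range(len(matrix)) if i not in removed]


# Toy data: subjects 0 and 1 are identical, subject 2 is dissimilar.
toy = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 1, 0, 2]]
kept = prune_related(grm(toy))
```

With real data the matrix is far too large for nested Python lists; the sketch is only meant to show what the relatedness score measures and how the 0.05 cutoff is applied.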

A bivariate GLMM, adjusting for age, sex and 20 PCs, was used to compute genetic correlations between the ARIC risk factors and EHR phenotypes. Bivariate GLMMs were constrained to have a genetic correlation between −1 and 1 and estimates from constrained models are noted. These models were not adjusted for comorbidities, as adjusting for genetic phenotypes can lead to false-positive associations due to collider bias.³⁰ P-values are based on a LRT comparing a full model (log-likelihood L1) and a model where the genetic correlation was constrained to be 0 (log-likelihood L0) [LRT = 2(L1 − L0) ~ χ² (1 d.f.)]. Because of the highly correlated nature of the phenotypes, a Benjamini-Hochberg (B-H) false discovery rate (FDR) adjustment was applied within each experiment to adjust for multiple testing.³¹ While all association data are shown within each figure and table, only phenotype pairs with FDR q-value<0.1 were considered to have a statistically significant genetic correlation.
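The two testing steps in this paragraph can be sketched as follows. The sketch assumes the LRT statistic is twice the difference in log-likelihood between the full and constrained models, referred to a chi-square distribution with 1 d.f., and it uses the standard step-up form of the Benjamini-Hochberg adjustment. The function names and example p-values are illustrative and are not taken from GCTA or from the study output.

```python
import math


def lrt_pvalue(loglik_full, loglik_null):
    """P-value for a likelihood ratio test with 1 degree of freedom."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    # Upper tail of a chi-square with 1 d.f. equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return B-H FDR q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four phenotype pairs within one experiment.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.1 for q in example_q]
```

In this toy example the first three pairs fall below the q<0.1 threshold used in the study and the fourth does not.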

Cox proportional hazards analysis was used to measure hazard ratios between ARIC risk factor phenotypes and incident T2D or coronary heart disease (CHD). For each analysis, subjects with prevalent T2D or CHD at baseline were excluded, respectively, and only self-reported whites were analyzed. Each risk factor was standardized to have a standard deviation of 1, so hazard ratios represent the risk per standard deviation increase. The models were adjusted for gender and age. The proportional hazards assumption was evaluated by examining cumulative Martingale residuals and applying Kolmogorov-type supremum tests. Analyses were performed using SAS v9.3 (SAS Institute, Cary, NC).
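The per-standard-deviation scaling can be illustrated with a short Python sketch. It does not fit a Cox model. It only shows the rescaling of a risk factor to unit standard deviation and the equivalent conversion of a per-unit log hazard ratio into a per-SD hazard ratio. The function names are hypothetical, and centering on the mean is an added convenience that does not affect the hazard ratio.

```python
import math
import statistics


def standardize(values):
    """Rescale a risk factor to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def hazard_ratio_per_sd(beta_per_unit, sd):
    """Convert a per-unit log hazard ratio into a hazard ratio per SD.

    A Cox coefficient scales linearly with the covariate, so a one-SD
    increase multiplies the hazard by exp(beta_per_unit * sd).
    """
    return math.exp(beta_per_unit * sd)


# Hypothetical example: a hazard ratio of 1.02 per unit, with an SD of 10 units.
hr_sd = hazard_ratio_per_sd(math.log(1.02), 10.0)
```

Fitting the model on the standardized variable and exponentiating its coefficient gives the same quantity directly, which is the form reported in the study.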

Results

Study populations and phenotypes

We used genetic correlations to associate ARIC phenotypes with clinical diagnoses measured in an EHR data set (Figure 1 and Supplementary Figure 1). The EHR data set comprised 19,093 unrelated EA subjects, of which 50.1% were males and the median birth year was 1945 (interquartile range 1935–1955) (Supplementary Table 2). There were 158 genetic phenotypes available for analyses in this population (Supplementary Table 3). The ARIC data set comprised 7,716 unrelated EA subjects. We selected 37 genetically modulated phenotypes from this population that have been associated with IHD and represent a range of anthropometric, laboratory and physiological biomarkers (Supplementary Table 4).

Figure 1.

Overview of the study approach. Risk factors and biomarkers were measured from the ARIC population and clinical diseases were curated from an EHR data set. Pair-wise genetic correlations between the diseases and risk factors were measured to identify genetically correlated disease-risk factor pairs. A more detailed overview is presented in Supplementary Figure 1.

Validating genetic correlation associations

In order to assess the specificity and magnitudes of genetic correlations, we computed genetic correlations between five continuous ARIC phenotypes and EHR phenotypes whose clinical case definitions are closely related to the continuous phenotypes (e.g. diabetes is defined by elevated blood glucose levels). For each ARIC phenotype, the largest and most significant genetic correlation was seen with its corresponding EHR phenotype (Figure 2A and Supplementary Table 5). For instance, glucose levels were significantly positively genetically correlated with a clinical diagnosis of T2D (genetic correlation [rG]=0.79, p=1.4×10⁻⁷), indicating that higher genetically determined glucose values are associated with an increased genetic risk of T2D. The genetic correlations for the ARIC phenotypes and the corresponding EHR disease-defining phenotypes ranged from 0.70 (body mass index [BMI] and obesity) to 1.0 (pack-years of smoking and “tobacco use disorder”).

Figure 2.

Validating genetic correlations. (A) Cases and controls for five diseases were extracted from the EHR data set and the continuous analog of each disease was measured in the ARIC population. Genetic correlations for each disease-phenotype pair were computed using a GLMM adjusting for age, sex and 20 PCs. Each bar shows the genetic correlation for a pair. The genetic correlation for the smoking-tobacco user pair was constrained to a value of 1. An asterisk denotes genetic correlations with a nominal p<0.001. (B) Genetic correlations between EHR and ARIC phenotypes for Type 2 diabetes (T2D) and ischemic heart disease (IHD). “All” includes both incident and prevalent cases. (C) Genetic correlations between the EHR T2D phenotype (n=4,367 cases; genetic liability=0.15 [s.e. 0.02]) and 37 ARIC phenotypes. Each point is a pairwise genetic correlation. Color-coding indicates FDR significance levels for genetic correlations, as indicated in the key. (D) Scatter plot comparing the genetic correlations with the EHR T2D phenotype and hazard ratios for incident T2D (n=878 incident cases) computed in the ARIC data set. Hazard ratios were adjusted for age and gender.

We compared cross-sectional phenotype associations based on genetic correlations to longitudinal associations based on hazard ratios for T2D and 37 ARIC phenotypes. The EHR T2D phenotype was significantly genetically correlated with an ARIC phenotype of incident and prevalent T2D (rG=0.76, p=2.3×10⁻⁶; 1,321 cases and 6,395 controls), indicating that the phenotypes have a similar genetic risk profile (Figure 2B). Fourteen ARIC phenotypes were genetically correlated with EHR T2D at FDR q<0.1 (Figure 2C and Supplementary Table 6). Phenotypes with positive correlations included measures of adiposity (e.g. waist circumference rG=0.51, p=8×10⁻⁵), low density lipoprotein (LDL) cholesterol (rG=0.79, p=2.0×10⁻⁴), SBP (rG=0.35, p=0.017), Protein C (rG=0.30, p=0.001) and the inflammatory marker C-reactive protein (CRP) (rG=0.39, p=0.027). The negatively correlated phenotypes were activated partial thromboplastin time (aPTT) (rG=−0.44, p=4.2×10⁻⁴) and HDL cholesterol levels (rG=−0.38, p=0.006). We compared the genetic correlation estimates to hazard ratio estimates for incident T2D within the ARIC data set (Supplementary Table 6). There was a strong linear relationship between hazard ratios and the genetic correlations (Pearson’s r=0.80) (Figure 2D).

Risk factors associated with ischemic heart disease

We next examined an EHR ischemic heart disease (IHD) phenotype, which comprises myocardial infarction, coronary atherosclerosis and cardiac angina. IHD cases (n=5,114) included a larger proportion of males and had higher rates of comorbid diagnoses related to heart failure, hypertension, kidney disease, T2D and obesity, as compared to controls (n=8,789) (Table 1). The genetic correlation between the EHR IHD phenotype and coronary heart disease in ARIC (defined as MI, fatal CAD, silent MI detected by ECG, or coronary revascularization) was 0.58 (p=0.01) using incident ARIC cases (911 cases and 6,272 controls) and 0.70 (p=0.0005) using incident and prevalent cases (n=1,314) (Figure 2B). The EHR IHD phenotype was positively genetically correlated with carotid intimal-medial thickness (CIMT) (rG=0.74, p=0.002), two cardiac electrocardiogram (ECG) phenotypes (QTc [rG=0.59, p=0.003] and Cornell voltage which is used to identify left ventricular hypertrophy³² [rG=0.52, p=0.007]), total-to-HDL cholesterol ratio and triglycerides (Figure 3A and Table 2). HDL (rG=−0.48, p=0.005) and apolipoprotein A (ApoA) (rG=−0.45, p=0.016) were negatively correlated. We compared the genetic correlation estimates to hazard ratio estimates for incident coronary heart disease in ARIC (Table 2). The estimates were correlated (r=0.62), but more weakly than for T2D (Figure 3B). A similar result was observed when comparing odds ratios based on incident and prevalent disease in ARIC (r=0.66) (Supplementary Figure 2). Several risk factors were associated with the ARIC IHD phenotype but not the EHR phenotype, including CRP, fibrinogen and lipid measures related to LDL (Figure 3B). LDL cholesterol had a highly significant association with the ARIC phenotype (hazard ratio [HR]=1.4 [95% CI=1.3–1.5], p=1×10⁻²⁷), but a non-significant genetic correlation (rG=0.07, p=0.7) with the EHR phenotype (Table 2). The genetic correlation between LDL cholesterol and incident coronary heart disease, both measured in ARIC, was larger than with the EHR phenotype, but was not significant (rG=0.33, p=0.16).
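The comparison between genetic correlations and hazard ratios rests on Pearson's correlation across risk factors. The sketch below shows that calculation on a handful of rows copied from Table 2. It is illustrative only: it uses six of the 37 risk factors, and whether the published value was computed on hazard ratios or on their logarithms is not stated here, so the raw hazard ratios are an assumption. The result should not be expected to reproduce r=0.62.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# (risk factor, rG with EHR IHD, hazard ratio for ARIC IHD), taken from Table 2.
subset = [
    ("HDL cholesterol", -0.48, 0.59),
    ("Total-to-HDL cholesterol ratio", 0.44, 1.72),
    ("Systolic BP", 0.44, 1.34),
    ("Serum triglycerides", 0.38, 1.38),
    ("LDL cholesterol", 0.07, 1.38),
    ("Hs c-reactive protein", -0.05, 1.32),
]
r_subset = pearson_r([row[1] for row in subset], [row[2] for row in subset])
```

Risk factors such as LDL cholesterol and CRP, which have hazard ratios well above 1 but genetic correlations near 0, are the points that pull the IHD relationship away from a straight line.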

Table 1.

Characteristics of the EHR ischemic heart disease cases and controls.

Characteristic	Cases (n=5,114)	Controls (n=8,789)	P-value^*
Sex [n (%)]
Males	3,206 (62.7)	4,126 (47.0)	<.0001
Females	1,908 (37.3)	4,663 (53.0)
Birth Decade
median (IQR)	1935 (1925–1945)	1945 (1935–1955)	<.0001
Comorbidity [n (%)]^†
Congestive heart failure	2,013 (39.4)	427 (4.9)	<.0001
Atrial fibrillation	1,738 (34.0)	596 (6.8)	<.0001
Hyperlipidemia	3,881 (75.9)	3,718 (42.3)	<.0001
Essential hypertension	4,276 (83.6)	4,402 (50.1)	<.0001
Chronic renal failure	1,292 (25.3)	800 (9.1)	<.0001
Type 2 diabetes	2,057 (40.2)	1,739 (19.8)	<.0001
Chronic airway obstruction	1,222 (23.9)	689 (7.8)	<.0001
Tobacco use disorder	965 (18.9)	885 (10.1)	<.0001
Gout	591 (11.6)	294 (3.3)	<.0001
Sleep apnea	805 (15.7)	667 (7.6)	<.0001
Obesity	1,168 (22.8)	1,490 (17.0)	<.0001

^* P-values were based on chi-squared analysis (Sex) or logistic regression analyses, adjusting for gender and birth year (comorbidities).

^† Comorbidities were defined based on ICD-9 derived PheCodes.

Figure 3.

ARIC phenotypes genetically correlated with ischemic heart disease. Each point is a pair-wise genetic correlation. Color-coding indicates FDR significance levels for genetic correlations, as indicated in the key at the bottom of the figure. (A) Genetic correlations with the EHR ischemic heart disease (IHD) phenotype (n=5,114 cases, liability=0.09[0.02]). (B) Scatter plot comparing the IHD genetic correlations to hazard ratios for incident coronary heart disease (n=1,314 cases) in the ARIC data set. Risk factors strongly associated with the ARIC phenotype, but not with the EHR phenotype, are labelled. Genetic correlations for: (C) peripheral atherosclerosis (n=1,604 cases, genetic liability=0.15[0.04]) and (D) myocardial infarction (n=1,700 cases, liability=0.06[0.04]).

Table 2.

Associations between ARIC risk factors and the EHR IHD (based on genetic correlations) and ARIC IHD (based on hazard ratios) phenotypes.

	Genetic correlation with EHR IHD^*		Hazard Ratio for ARIC IHD^†
ARIC risk factor	rG (s.e.)	P-value	Hazard Ratio(95% CI)	P-value
Carotid IM thickness	0.73 (0.28)	0.002	1.31 (1.2–1.4)	6.3E-23
Cornell voltage	0.59 (0.23)	0.004	1.17 (1.1–1.2)	2.3E-07
HDL Cholesterol	−0.48 (0.18)	0.005	0.59 (0.5–0.6)	3.1E-37
QTc	0.52 (0.21)	0.007	1.20 (1.1–1.3)	3.6E-09
Total-to-HDL cholesterol ratio	0.44 (0.17)	0.008	1.72 (1.6–1.8)	3.1E-86
Serum protein C	0.28 (0.13)	0.01	1.15 (1.1–1.2)	7.6E-06
Apolipoprotein A	−0.45 (0.20)	0.02	0.73 (0.7–0.8)	6.7E-20
Systolic BP	0.44 (0.20)	0.02	1.34 (1.3–1.4)	1.4E-21
Serum triglycerides	0.38 (0.17)	0.02	1.38 (1.3–1.5)	6.8E-33
Smoking	0.48 (0.25)	0.03	1.21 (1.2–1.3)	5.3E-15
Serum Insulin	0.45 (0.26)	0.06	1.37 (1.3–1.5)	1.4E-25
Serum Glucose	0.35 (0.20)	0.07	1.27 (1.2–1.3)	2.8E-21
Serum uric acid	0.37 (0.23)	0.08	1.14 (1.1–1.2)	3.5E-05
aPTT	0.23 (0.16)	0.15	0.94 (0.9–1.0)	3.7E-02
White blood cell count	0.24 (0.17)	0.16	1.36 (1.3–1.4)	1.4E-24
Waist-hip ratio	0.24 (0.18)	0.18	1.45 (1.3–1.6)	1.4E-16
Eosinophil percent	0.29 (0.23)	0.20	1.05 (1.0–1.1)	1.7E-01
Height	−0.14 (0.12)	0.25	0.87 (0.8–1.0)	1.6E-03
Platelet count	−0.15 (0.14)	0.26	1.02 (1.0–1.1)	4.4E-01
Mean corpuscular volume	−0.17 (0.20)	0.40	0.86 (0.8–0.9)	8.7E-06
Subscapular skinfold	0.13 (0.18)	0.46	1.39 (1.3–1.5)	5.5E-15
Serum Potassium	0.13 (0.20)	0.49	0.93 (0.9–1.0)	2.1E-02
Apolipoprotein B	0.12 (0.20)	0.55	1.42 (1.3–1.5)	5.7E-38
Serum creatinine	0.10 (0.19)	0.59	1.05 (1.0–1.1)	2.3E-01
Diastolic BP	−0.10 (0.19)	0.60	1.11 (1.0–1.2)	9.9E-04
Fibrinogen level	−0.10 (0.19)	0.61	1.33 (1.3–1.4)	1.0E-26
LDL cholesterol	0.07 (0.20)	0.72	1.38 (1.3–1.5)	1.1E-27
Body mass index	0.06 (0.18)	0.75	1.23 (1.1–1.3)	1.6E-08
Total protein	0.06 (0.20)	0.77	1.05 (1.0–1.1)	1.2E-01
Total cholesterol	0.04 (0.17)	0.83	1.35 (1.3–1.4)	2.1E-25
Hs c-reactive protein	−0.05 (0.22)	0.83	1.32 (1.2–1.4)	4.8E-16
Serum calcium	−0.04 (0.23)	0.85	1.00 (0.9–1.1)	9.5E-01
Heart rate	0.04 (0.20)	0.86	1.12 (1.1–1.2)	7.4E-05
Factor VII level	0.02 (0.17)	0.90	1.18 (1.1–1.3)	1.7E-07
Von Willebrand factor	0.01 (0.16)	0.94	1.16 (1.1–1.2)	1.6E-06
Waist circumference	0.00 (0.16)	0.97	1.29 (1.2–1.4)	6.5E-12
Hemoglobin level	−0.01 (0.22)	0.97	1.15 (1.1–1.2)	6.0E-04

^* Genetic correlations between the EHR IHD phenotype and each ARIC risk factor were computed using a bivariate GLMM adjusting for age, sex and 20 PCs.

^† Hazard ratio units for incident IHD in the ARIC set are per standard deviation increase in the risk factor and are adjusted for gender and age.

We examined two additional EHR phenotypes associated with atherosclerotic disease to see whether they had LDL associations. Peripheral vascular disease was positively genetically correlated with LDL (rG=0.79, p=2×10⁻⁴) (Figure 3C and Supplementary Table 7). There were also significant positive correlations with smoking, SBP and glucose levels and negative correlations with HDL and ApoA. Myocardial infarction had a different pattern of associations and was most strongly positively correlated with triglycerides (Tg), total-to-HDL cholesterol ratio, coagulation Factor VII levels, ECG measures and CIMT (Figure 3D and Supplementary Table 8).

Diagnoses associated with risk factors

The genetic correlation approach can test associations between a risk factor and a collection of EHR diagnoses to identify those associated with the risk factor. We used this approach to further define four ARIC phenotypes positively correlated with the EHR IHD phenotype (Figure 3C). For each ARIC phenotype, we measured pair-wise genetic correlations with 158 EHR phenotypes. Non-HDL cholesterols directly contribute to atherosclerotic disease, while CIMT is a measure of accumulated disease burden within the carotid artery.³³ Total-to-HDL cholesterol ratio was significantly associated with 19 EHR phenotypes including atherosclerotic diseases, such as peripheral vascular disease and coronary atherosclerosis (rG=0.42, p=0.008), and T2D (rG=0.45, p=4.9×10⁻⁴) (Figure 4A and Supplementary Table 9). The number and strength of the significant genetic correlations for the Total-to-HDL ratio phenotype were greater than for either the LDL (n=0) or HDL (n=2) cholesterol phenotypes (Supplementary Figure 3). CIMT was significantly correlated with four diagnoses related to hypertension and ischemic heart disease and showed weaker associations (FDR q>0.1) with atherosclerotic diseases (Figure 4B and Supplementary Table 10). We next examined systolic blood pressure and Cornell voltage. SBP was significantly positively genetically correlated with four diagnoses related to hypertension and hypertensive heart and kidney disease (rG=0.75, p=0.0004) (Figure 4C and Supplementary Table 11). There were no significant genetic correlations with Cornell voltage (Figure 4D and Supplementary Table 12). The strongest non-significant correlations were seen with ischemic heart disease and hypertension-related diagnoses. In sum, the pattern of diseases associated with these risk factors points to a contribution of hypertensive, metabolic and atherosclerotic disease processes to the EHR ischemic heart disease risk.

Figure 4.

EHR phenotypes genetically correlated with ARIC risk factors and biomarkers. Pairwise genetic correlations between 158 EHR phenotypes and the ARIC phenotypes of (A) Total-to-HDL cholesterol ratio (n=7,701, genetic heritability [h2]=0.20[0.04]), (B) log carotid intima-medial arterial wall thickness (n=7,315, h2=0.10[0.04]), (C) systolic blood pressure (SBP) (n=7,710, h2=0.16[0.04]) and (D) Cornell voltage from cardiac ECG (n=7,432, h2=0.13[0.04]).

Discussion

We used genetic correlations to associate baseline risk factors in the ARIC prospective study with clinical diagnoses from an EHR data set. We show that, for T2D, the genetic correlations from cross-sectional analyses were linearly related to hazard ratio estimates from a longitudinal analysis. The correlation was weaker when comparing these values for an IHD phenotype. The EHR IHD phenotype was genetically correlated with CIMT, Total/HDL cholesterol ratio, HDL, systolic blood pressure and triglycerides. An analysis of additional EHR phenotypes genetically correlated with the IHD risk factors indicates that a genetic predisposition toward hypertension, atherosclerosis and metabolic syndrome is an important contributor to IHD risk in the EHR population.

The marked reductions in morbidity and mortality for stroke and myocardial infarction are largely attributable to effective treatment and prevention strategies targeting epidemiologically significant risk factors.³⁴ Over the last decades, there have also been changes in societal behaviors and norms that have altered the prevalence of IHD risk factors and, potentially, their contribution to the risk of this disease. We hypothesized that we could directly evaluate whether the epidemiological profile for IHD has changed by comparing associations between a common set of risk factors and both a contemporary and a historical IHD phenotype. Testing this hypothesis was feasible because associations can now be modelled based on underlying genetic risk. Furthermore, by testing associations based on underlying genetics, we demonstrate that it is possible to rapidly delineate a contemporary risk profile for an EHR IHD phenotype. This contemporary risk profile can be used to redirect treatment and prevention strategies toward areas of unmet need.

For T2D, the genetic correlations were strongly linearly correlated with longitudinal hazard ratio estimates. Consistently, the ARIC and EHR T2D phenotypes were also strongly genetically correlated. Together, these results indicate that the T2D cases in the two data sets had similar epidemiological and genetic profiles. This could be expected, as the EHR and ARIC case definitions are based on standardized measures of serum glucose levels.³⁵ Our findings also show that genetic factors which predispose to T2D modulate a wide range of physiological measures beyond serum glucose, including lipid levels, blood pressure, hematological parameters, inflammatory markers and adiposity, consistent with findings from epidemiological and genetic studies.^36,37

The correlation between the genetic correlation and hazard ratio estimates was weaker for the IHD phenotype, consistent with the weaker genetic correlation between the EHR and ARIC incident IHD phenotypes. The EHR IHD phenotype was genetically correlated with several phenotypes comprising the metabolic syndrome, including low HDL levels, elevated triglycerides and elevated systolic blood pressure.³⁸ In ARIC, LDL cholesterol and ApoB were strongly associated with incident IHD, but were not significantly genetically correlated with the EHR IHD or myocardial infarction phenotypes. The genetic correlation between LDL and coronary artery disease has been previously reported to be 0.25,³⁹ which is similar to the genetic correlation between the ARIC LDL and ARIC IHD phenotypes (rG=0.33). While our study was not powered to detect correlations this small (we had only 20% power to detect a genetic correlation of <0.20⁴⁰), our point estimate was considerably smaller than this (rG=0.07), suggesting that the correlation in our data set may be lower than these estimates. One explanation for this discrepancy is that the LDL-associated risk is strongly driven by environmental LDL modulators such as diet.⁴¹ Some ARIC subjects were also taking lipid-lowering medications, which attenuates genetic heritability estimates because their measured LDL levels do not accurately reflect their genetically determined levels. We found, however, that both LDL and ApoB were strongly genetically correlated with an EHR peripheral vascular disease phenotype,⁴² indicating that LDL could be associated with other atherosclerotic EHR phenotypes. Another explanation is that subjects in the EHR data set were effectively treated with LDL-lowering medications, which attenuated the contribution of LDL to IHD risk in this population. CRP and fibrinogen were also more strongly associated with IHD in ARIC. However, a difference between the genetic and epidemiological associations for these factors would be expected, as genetic risk scores for these traits do not predict IHD.⁴³,⁴⁴
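To make the rG values quoted above concrete, recall that a genetic correlation is conventionally defined as the genetic covariance between two phenotypes divided by the square root of the product of their additive genetic variances. The sketch below illustrates that definition only. The function name and the variance components are hypothetical; in practice these components are estimated by fitting a bivariate linear mixed model rather than supplied by hand.

```python
import math


def genetic_correlation(cov_g, var_g1, var_g2):
    """Genetic correlation from additive genetic (co)variance components.

    cov_g  : genetic covariance between phenotype 1 and phenotype 2
    var_g1 : additive genetic variance of phenotype 1
    var_g2 : additive genetic variance of phenotype 2
    """
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g / math.sqrt(var_g1 * var_g2)


# Hypothetical variance components (NOT estimates from this study).
r_g = genetic_correlation(cov_g=0.02, var_g1=0.25, var_g2=0.16)
print(f"rG = {r_g:.2f}")
```

Because the denominator depends on both genetic variances, anything that distorts the measured genetic variance of a trait (for example, medication effects on measured LDL) can also distort the estimated correlation.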

A strength of a prospective study is that it can delineate the extended set of outcomes associated with a biomarker measured at baseline.⁵ To emulate this feature of a prospective study, we used a large set of clinical phenotypes extracted from the EHR data set to generate a library of genetically modulated diseases. We then used genetic correlation analyses to identify diseases from this library associated with an ARIC risk factor. Using this discovery approach, we showed that three IHD risk factors (CIMT, systolic blood pressure and total-to-HDL cholesterol ratio) had significant correlations with atherosclerotic diseases, hypertension-related diseases and T2D. Thus, these analyses highlight the contributions of these disease processes to IHD risk and indicate that interventions and therapies directed toward these processes would benefit this EHR population.

An advantage of measuring associations using genetic correlations is that associations are not confounded by environment, as the association is based solely on underlying genetics.⁴⁵ However, genetic factors can modulate exposures to environmental factors, which can result in genetic correlations with environmental risk factors.⁴⁶ For instance, we found that pack-years of smoking measured in the ARIC study was genetically correlated with tobacco use disorder, peripheral artery disease and ischemic heart disease. While cigarette smoke is an environmental toxin and smoking exposure would not be expected to be an intrinsically heritable biomarker, smoking behaviors are genetically influenced.⁴⁷ Hence, a genetic predisposition toward high cumulative exposure to cigarettes is associated with clinically significant morbidity.

There are several limitations to this study. EHR PheCode phenotypes rely on clinical disease assignments that are often not precisely defined. Hence, even though the ARIC and EHR phenotypes evaluated in these analyses had similar clinical definitions, case status in the ARIC study was assigned based on active surveillance and standardized case definitions, while the EHR phenotypes were based on observational data and represent incident and prevalent cases detected through routine clinical care. These differences in ascertainment can contribute to unexpected differences between the phenotypes that could alter the patterns of risk factor associations between them. Since the EHR data set comprised incident and prevalent disease, factors associated with acute presentation could not be ascertained. Consequently, we could not adjust for medication use at the time of the diagnosis or for the presenting illness, and not factoring in medication use at the time of diagnosis could attenuate associations. The phenotypes were derived from billing code data, which can be incomplete, leading to misclassification in control groups. However, the phenotyping methodologies used in these analyses have been extensively studied and perform well in genetic studies.²⁸,⁴⁸ The cases and controls for the EHR IHD phenotype differed with respect to many baseline characteristics. These differences, such as the younger age of controls, can contribute to phenotype misclassification, which would attenuate associations with risk factors. Furthermore, prevalent IHD, which cannot easily be diagnosed based on routine clinical testing, may not be effectively captured in an EHR data set, which would lead to further misclassification. In the future, as ongoing longitudinal epidemiological studies grow and accumulate sufficiently large numbers of clinical endpoints, it will be informative to apply this analytical approach to determine whether the patterns of associations that we observed in the EHR population are seen with comparisons across other well-defined longitudinal cohorts. For many of the EHR phenotypes, there were relatively few cases, which decreases the power to detect significant genetic correlations and can result in false negative findings. The ARIC phenotypes used in this study were limited to risk factors which have previously been associated with IHD. Hence, we were not able to identify new associations in these analyses. It is also important to note that a genetic correlation does not always indicate that a pair of phenotypes has a shared mechanism, as it is possible that SNPs used to compute the correlations may be simultaneously tagging disparate causative genetic variants.⁴⁹ Finally, these analyses were limited to subjects of European ancestry, as the numbers of subjects of other ancestries were insufficient for analysis. Further validation of this approach in other ancestral groups is needed.
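The attenuating effect of misclassification noted above can be illustrated with simple arithmetic. The sketch below is not an analysis from this study: it assumes a continuous risk factor whose mean differs between true cases and true controls, and it assumes that a random (non-differential) fraction of true cases is missed and therefore counted among the controls. Under those assumptions the expected observed case-control difference shrinks by the factor n_controls / (n_controls + missed cases). The function name and all numbers are hypothetical.

```python
def expected_observed_difference(true_diff, n_cases, n_controls, missed_fraction):
    """Expected case-control mean difference when some cases hide among controls.

    Assumes missed cases are a random subset of all true cases, so the mean
    among detected cases is unchanged and only the control mean is shifted.
    """
    if not 0.0 <= missed_fraction <= 1.0:
        raise ValueError("missed_fraction must be between 0 and 1")
    if n_cases < 0 or n_controls <= 0:
        raise ValueError("n_cases must be >= 0 and n_controls must be > 0")
    hidden_cases = missed_fraction * n_cases
    return true_diff * n_controls / (n_controls + hidden_cases)


# Hypothetical cohort: 2,000 true cases, 8,000 true controls, true difference 1.0.
for fraction in (0.0, 0.25, 0.5):
    observed = expected_observed_difference(1.0, 2000, 8000, fraction)
    print(f"missed fraction {fraction:.2f}: expected observed difference {observed:.3f}")
```

The same logic suggests why undetected prevalent disease among controls would bias associations toward the null rather than create spurious ones.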

In summary, we demonstrate that genetic correlations using cross-sectional data derived from separate data sets can identify clinical phenotypes associated with biomarkers, and can recapitulate longitudinal associations. We also demonstrate the utility of EHR data sets as a source of clinically relevant disease outcomes that can be linked to genetically modulated preclinical markers ascertained in epidemiological cohorts. We anticipate that this analytic paradigm will prove useful as new putative biomarkers are identified through large-scale proteomics studies and other discovery approaches, as it enables rapid identification of clinically relevant phenotypes associated with these markers and overcomes several limitations inherent to longitudinal study designs.

Supplementary Material

001530 - PAP: NIHMS825921-supplement-001530_-_PAP.pdf (635.4 KB, PDF)

001530 - Supplemental Material: NIHMS825921-supplement-001530_-_Supplemental_Material.pdf (317.5 KB, PDF)

Clinical Perspective.

While the gold standard for defining epidemiological risk profiles is the longitudinal study, this approach is hampered by a need for very long observation times. Here, we tested the idea that electronic health record (EHR) systems coupled to dense genomic data can be leveraged to circumvent this time constraint. We describe an approach that uses baseline risk factors from epidemiological studies (here, the Atherosclerosis Risk in Communities [ARIC] cohort) and measures their association with EHR phenotypes. This approach is enabled by genetic methods that measure the extent to which phenotypes are modulated by a common set of genetic variants. We evaluate EHR phenotypes for type 2 diabetes and ischemic heart disease (IHD) using data collected through the Electronic Medical Records and Genomics (eMERGE) network. We show that associations identified using our genetic approach are similar to those identified using longitudinal association approaches. Our analyses validate systolic blood pressure, triglyceride and total-to-HDL cholesterol levels as important measures of IHD risk. This analytic paradigm should prove useful not only to evaluate the contribution of known IHD risk factors to contemporary populations, but also to rapidly evaluate new putative risk factors and biomarkers of disease.

Acknowledgments

The authors thank the staff and participants of the ARIC study for their contributions.

Sources of Funding: This work was supported by a career development award from the Vanderbilt Faculty Research Scholars Fund (JDM), American Heart Association (15MCPRP25620006 and 16FTF30130005) (JDM), PGRN (P50-GM115305) and R01 LM010685. BioVU is supported by institutional funding and by CTSA grant UL1 TR000445 from NCATS/NIH. The eMERGE Network is funded by NHGRI and NIGMS: U01-HG8672 and U01-HG006378 (VUMC); U01-HG-004610 (Group Health Cooperative/University of Washington); U01-HG-004608 (Marshfield Clinic Research Foundation and VUMC); U01-HG-04599 (Mayo Clinic); U01HG004609 (Northwestern University); U01-HG-006378 and U01-HG-04603 (VUMC Coordinating Center); U01HG004438 (CIDR) and U01HG004424 (the Broad Institute) serving as Genotyping Centers. ARIC is supported by NHLBI contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C). Funding for GENEVA was provided by NHGRI grant U01HG004402 (E. Boerwinkle).

Footnotes

Disclosures: None.

References

1. WRITING GROUP MEMBERS. Lloyd-Jones D, Adams RJ, Brown TM, Carnethon M, Dai S, et al. Heart disease and stroke statistics--2010 update: a report from the American Heart Association. Circulation. 2010;121:e46–e215. doi: 10.1161/CIRCULATIONAHA.109.192667.
2. Mahmood SS, Levy D, Vasan RS, Wang TJ. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. Lancet. 2014;383:999–1008. doi: 10.1016/S0140-6736(13)61752-3.
3. Wong ND. Epidemiological studies of CHD and the evolution of preventive cardiology. Nat Rev Cardiol. 2014;11:276–289. doi: 10.1038/nrcardio.2014.26.
4. Ruff CT, Braunwald E. The evolving epidemiology of acute coronary syndromes. Nat Rev Cardiol. 2011;8:140–147. doi: 10.1038/nrcardio.2010.199.
5. Grimes DA, Schulz KF. Cohort studies: marching towards outcomes. Lancet. 2002;359:341–345. doi: 10.1016/S0140-6736(02)07500-1.
6. Denny JC. Chapter 13: Mining electronic health records in the genomics era. PLoS Comput Biol. 2012;8:e1002823. doi: 10.1371/journal.pcbi.1002823.
7. McPherson R, Tybjaerg-Hansen A. Genetics of Coronary Artery Disease. Circ Res. 2016;118:564–578. doi: 10.1161/CIRCRESAHA.115.306566.
8. Maher BS. Polygenic Scores in Epidemiology: Risk Prediction, Etiology, and Clinical Utility. Curr Epidemiol Rep. 2015;2:239–244. doi: 10.1007/s40471-015-0055-3.
9. Smith JA, Ware EB, Middha P, Beacher L, Kardia SLR. Current Applications of Genetic Risk Scores to Cardiovascular Outcomes and Subclinical Phenotypes. Curr Epidemiol Rep. 2015;2:180–190. doi: 10.1007/s40471-015-0046-4.
10. Smith GD, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
11. Holmes MV, Lange LA, Palmer T, Lanktree MB, North KE, Almoguera B, et al. Causal effects of body mass index on cardiometabolic traits and events: a Mendelian randomization analysis. Am J Hum Genet. 2014;94:198–208. doi: 10.1016/j.ajhg.2013.12.014.
12. Yang J, Manolio TA, Pasquale LR, Boerwinkle E, Caporaso N, Cunningham JM, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet. 2011;43:519–525. doi: 10.1038/ng.823.
13. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
14. Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 2014;46:100–106. doi: 10.1038/ng.2876.
15. Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
16. Vattikuti S, Guo J, Chow CC. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 2012;8:e1002637. doi: 10.1371/journal.pgen.1002637.
17. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol. 1989;129:687–702.
18. Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, et al. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med. 2013;15:761–771. doi: 10.1038/gim.2013.72.
19. McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, et al. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4:13. doi: 10.1186/1755-8794-4-13.
20. Bowton E, Field JR, Wang S, Schildcrout JS, Van Driest SL, Delaney JT, et al. Biobanks and electronic medical records: enabling cost-effective research. Sci Transl Med. 2014;6:234cm3. doi: 10.1126/scitranslmed.3008604.
21. Roden DM, Pulley JM, Basford MA, Bernard GR, Clayton EW, Balser JR, et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin Pharmacol Ther. 2008;84:362–369. doi: 10.1038/clpt.2008.89.
22. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
23. Zuvich RL, Armstrong LL, Bielinski SJ, Bradford Y, Carlson CS, Crawford DC, et al. Pitfalls of merging GWAS data: lessons learned in the eMERGE network and quality control procedures to maintain high data quality. Genet Epidemiol. 2011;35:887–898. doi: 10.1002/gepi.20639.
24. Delaneau O, Zagury J-F, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods. 2013;10:5–6. doi: 10.1038/nmeth.2307.
25. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44:955–959. doi: 10.1038/ng.2354.
26. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795.
27. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205–1210. doi: 10.1093/bioinformatics/btq126.
28. Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013;31:1102–1110. doi: 10.1038/nbt.2749.
29. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
30. Aschard H, Vilhjálmsson BJ, Joshi AD, Price AL, Kraft P. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. Am J Hum Genet. 2015;96:329–339. doi: 10.1016/j.ajhg.2014.12.021.
31. Majumdar A, Haldar T, Witte JS. Determining Which Phenotypes Underlie a Pleiotropic Signal. Genet Epidemiol. 2016;40:366–81. doi: 10.1002/gepi.21973.
32. Crow RS, Prineas RJ, Rautaharju P, Hannan P, Liebson PR. Relation between electrocardiography and echocardiography for left ventricular mass in mild systemic hypertension (results from Treatment of Mild Hypertension Study). Am J Cardiol. 1995;75:1233–1238.
33. Weber C, Noels H. Atherosclerosis: current pathogenesis and therapeutic options. Nat Med. 2011;17:1410–1422. doi: 10.1038/nm.2538.
34. Mozaffarian D, Benjamin EJ, Go AS, Arnett DK, Blaha MJ, Cushman M, et al. Heart Disease and Stroke Statistics-2016 Update: A Report From the American Heart Association. Circulation. 2016;133:e38–e360. doi: 10.1161/CIR.0000000000000350.
35. Inzucchi SE. Clinical practice. Diagnosis of diabetes. N Engl J Med. 2012;367:542–550. doi: 10.1056/NEJMcp1103643.
36. Raynor LA, Pankow JS, Duncan BB, Schmidt MI, Hoogeveen RC, Pereira MA, et al. Novel risk factors and the prediction of type 2 diabetes in the Atherosclerosis Risk in Communities (ARIC) study. Diabetes Care. 2013;36:70–76. doi: 10.2337/dc12-0609.
37. Long MT, Fox CS. The Framingham Heart Study - 67 years of discovery in metabolic disease. Nat Rev Endocrinol. 2016;12:177–183. doi: 10.1038/nrendo.2015.226.
38. Eckel RH, Grundy SM, Zimmet PZ. The metabolic syndrome. Lancet. 2005;365:1415–1428. doi: 10.1016/S0140-6736(05)66378-7.
39. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
40. Visscher PM, Hemani G, Vinkhuyzen AAE, Chen G-B, Lee SH, Wray NR, et al. Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet. 2014;10:e1004269. doi: 10.1371/journal.pgen.1004269.
41. Hu FB, Willett WC. Optimal diets for prevention of coronary heart disease. JAMA. 2002;288:2569–2578. doi: 10.1001/jama.288.20.2569.
42. Smith SC, Milani RV, Arnett DK, Crouse JR, McDermott MM, Ridker PM, et al. Atherosclerotic Vascular Disease Conference: Writing Group II: risk factors. Circulation. 2004;109:2613–2616. doi: 10.1161/01.CIR.0000128519.60762.84.
43. Sabater-Lleal M, Huang J, Chasman D, Naitza S, Dehghan A, Johnson AD, et al. Multiethnic meta-analysis of genome-wide association studies in >100 000 subjects identifies 23 fibrinogen-associated Loci but no strong evidence of a causal association between circulating fibrinogen and cardiovascular disease. Circulation. 2013;128:1310–1324. doi: 10.1161/CIRCULATIONAHA.113.002251.
44. Zacho J, Tybjaerg-Hansen A, Jensen JS, Grande P, Sillesen H, Nordestgaard BG. Genetically elevated C-reactive protein and ischemic vascular disease. N Engl J Med. 2008;359:1897–1908. doi: 10.1056/NEJMoa0707402.
45. Ebrahim S, Davey Smith G. Mendelian randomization: can genetic epidemiology help redress the failures of observational epidemiology? Hum Genet. 2008;123:15–33. doi: 10.1007/s00439-007-0448-6.
46. Gage SH, Davey Smith G, Ware JJ, Flint J, Munafò MR. G = E: What GWAS Can Tell Us about the Environment. PLoS Genet. 2016;12:e1005765. doi: 10.1371/journal.pgen.1005765.
47. Amos CI, Spitz MR, Cinciripini P. Chipping away at the genetics of smoking behavior. Nat Genet. 2010;42:366–368. doi: 10.1038/ng0510-366.
48. Mosley JD, Witte JS, Larkin EK, Bastarache L, Shaffer CM, Karnes JH, et al. Identifying genetically driven clinical phenotypes using linear mixed models. Nat Commun. 2016;7:11433. doi: 10.1038/ncomms11433.
49. Gianola D, de los Campos G, Toro MA, Naya H, Schön C-C, Sorensen D. Do Molecular Markers Inform About Pleiotropy? Genetics. 2015;201:23–29. doi: 10.1534/genetics.115.179978.


Jonathan D Mosley, MD, PhD

Sara L Van Driest, MD, PhD

Quinn S Wells, MD, PharmD, MSc

Christian M Shaffer, BS

Todd L Edwards, PhD

Lisa Bastarache, MS

Catherine A McCarty, PhD, MPH

Will Thompson, PhD

Christopher G Chute, MD, DrPH

Gail P Jarvik, MD, PhD

David R Crosslin, PhD

Eric B Larson, MD, MPH

Iftikhar J Kullo, MD

Jennifer A Pacheco, BA

Peggy L Peissig, PhD, MBA

Murray H Brilliant, PhD

James G Linneman, BA

Josh C Denny, MD, MS

Dan M Roden, MD
