Author manuscript; available in PMC: 2017 Jul 1.
Published in final edited form as: Atherosclerosis. 2016 Apr 23;250:63–68. doi: 10.1016/j.atherosclerosis.2016.04.011

Targeted exonic sequencing of GWAS loci in the high extremes of the plasma lipids distribution

Aniruddh P Patel 1,2,3, Gina M Peloso 1,2, James P Pirruccello 1,2,3, Christopher T Johansen 4, Joseph B Dubé 4, Daniel B Larach 5, Matthew R Ban 4, Geesje M Dallinge-Thie 6, Namrata Gupta 2, Michael Boehnke 7, Gonçalo R Abecasis 7, John JP Kastelein 6, G Kees Hovingh 6, Robert A Hegele 4, Daniel J Rader 8, Sekar Kathiresan 1,2,3
PMCID: PMC4907838  NIHMSID: NIHMS787201  PMID: 27182959

Abstract

Objective

Genome-wide association studies (GWAS) for plasma lipid levels have mapped numerous genomic loci, with each region often containing many protein-coding genes. Targeted re-sequencing of exons is a strategy to pinpoint causal variants and genes.

Methods

We performed solution-based hybrid selection of 9,008 exons at 939 genes within 95 GWAS loci for plasma lipid levels and used next-generation sequencing to sequence individuals with extremely high as well as low-to-normal levels of low-density lipoprotein cholesterol (LDL-C, n=311; mean low=71 mg/dl versus high=241 mg/dl), triglycerides (TG, n=308; mean low=75 mg/dl versus high=1938 mg/dl), and high-density lipoprotein cholesterol (HDL-C, n=684; mean low=32 mg/dl versus high=102 mg/dl). We identified 15,002 missense, nonsense, or splice site variants with a frequency <5%. We tested whether coding sequence variants, individually or aggregated within a gene, were associated with plasma lipid levels. To replicate findings, we performed sequencing in independent participants (n=6,424).

Results

Across discovery and replication sequencing, we found 6 variants with significant associations with plasma lipids. Of these, one was a novel association: p.Ser147Asn variant in APOA4 (14.3% frequency, TG OR=0.49, P=7.1×10−4) with TG. In gene-level association analyses where rare variants within each gene are collapsed, APOC3 (P=2.1×10−5) and LDLR (P=5.0×10−12) were associated with plasma lipids.

Conclusions

After sequencing genes from 95 GWAS loci in participants with extremely high plasma lipid levels, we identified one new coding variant associated with TG. These results provide insight regarding design of similar sequencing studies with respect to sample size, follow-up, and analysis methodology.

Introduction

Low-density lipoprotein cholesterol (LDL-C), triglycerides (TG), and high-density lipoprotein cholesterol (HDL-C) are highly heritable risk factors for coronary heart disease (CHD).1 Genome-wide association studies (GWAS) have identified many new single nucleotide polymorphisms (SNPs) related to plasma lipid levels in the population.2-9 Most associated SNPs are non-coding (intergenic or intronic) and fall in regions containing many protein-coding genes. It has been a major challenge to identify the causal gene and variant responsible for the observed associations.

One approach to pinpoint causal genes and variants at GWAS loci is to perform fine mapping through targeted sequencing. Sequencing may identify a protein-altering variant in a gene, which if associated with LDL-C, TG, or HDL-C might suggest that the gene is influencing plasma lipid variation. The discovery of rare, nonsense alleles that affect a trait may be particularly informative. Targeted sequencing of GWAS loci has been used to pinpoint independent rare variants and causal genes for diabetes mellitus,10 fetal hemoglobin,11 age-related macular degeneration,12-16 and Crohn's disease, among others.17

To search for causal genes and variants at genomic regions implicated for plasma lipids, we focused on the first 95 GWAS loci reported for plasma lipid levels5 and targeted 9,008 exons at 939 genes within these genomic regions. We sequenced individuals with extremely high plasma lipid levels and healthy controls, and performed replication sequencing in independent samples. Our two major goals were: 1) to discover novel coding variants of large effect in GWAS loci associated with plasma lipids; 2) to determine at least one causal gene influencing plasma lipids at each GWAS locus.

Materials and Methods

Ethics statement

All study procedures involving participants and all analyses of their samples were performed in accordance with the Declaration of Helsinki and were approved by the local medical ethics and institutional review committees at the Broad Institute.

Discovery Cohort Selection

Individuals of European ancestry with an extremely high LDL-C, TG, or HDL-C level were recruited from lipid specialty clinics at the University of Pennsylvania, Amsterdam Medical Centre, and the University of Western Ontario. Healthy age- and sex-matched controls were recruited from the same medical centers independent of the lipid clinics. Individuals without a history of liver disease or HIV and who were not pregnant, nursing, or taking hormone replacement therapy or niacin had ~40cc of blood drawn. Plasma lipid levels were measured directly by standard protocols in clinical labs. LDL-C was calculated using the Friedewald equation (LDL-C=TC – HDL-C – (TG/5)) for those with TG<400 mg/dl. If TG>400 mg/dl, LDL-C was not calculated. Whole genomic DNA was extracted from the blood of these individuals. The cut point percentiles were calculated based on the individuals of European descent in the Framingham Heart Study Offspring cohort stratified by age and sex. Individuals with plasma lipid levels greater than the 95th percentile were selected for targeted sequencing (LDL: n=145, mean LDL-C=241 mg/dl; TG: n=143, mean TG=1,938 mg/dl; HDL: n=353, mean HDL-C=101 mg/dl). Healthy controls with plasma lipid levels less than the 25th percentile for LDL and HDL and less than the 50th percentile for TG were also sequenced (LDL: n=166, mean LDL-C=71 mg/dl; TG: n=165, mean TG=75 mg/dl; HDL: n=331, mean HDL-C=32 mg/dl) (Table 1). The paucity of individuals with very low TG led to the altered cutoff for the control group to maintain power.
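The Friedewald calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name and the example values are hypothetical, and treating a TG of exactly 400 mg/dl as missing is an assumption, since the text only specifies TG<400 and TG>400.

```python
from typing import Optional


def friedewald_ldl(tc: float, hdl: float, tg: float) -> Optional[float]:
    """Estimate LDL-C (mg/dl) as TC - HDL-C - TG/5; return None when TG is too high."""
    # The text applies the formula only for TG < 400 mg/dl. Treating a TG of
    # exactly 400 mg/dl as missing is an assumption made for this sketch.
    if tg >= 400:
        return None
    return tc - hdl - tg / 5.0


# Hypothetical values in mg/dl, not taken from the study data.
print(friedewald_ldl(tc=200.0, hdl=50.0, tg=150.0))   # 120.0
print(friedewald_ldl(tc=300.0, hdl=30.0, tg=1900.0))  # None
```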

Table 1.

Characteristics of participants who underwent targeted sequencing with extremely high or low plasma lipid levels

| Characteristic | Low LDL (n=166) | High LDL (n=145) | Low TG (n=165) | High TG (n=143) | Low HDL (n=331) | High HDL (n=353) |
|---|---|---|---|---|---|---|
| LDL-C (mg/dl) | 70.8 (15.9) | 241.3 (41.2) | 116.7 (32.7) | 118.0 (83.0) | 103.1 (38.3) | 123.1 (36.0) |
| TG (mg/dl) | 90.5 (36.5) | 128.8 (45.7) | 74.8 (20.4) | 1937.8 (1907.5) | 155.4 (86.4) | 74.4 (30.4) |
| HDL-C (mg/dl) | 50.7 (12.7) | 53.5 (12.2) | 53.9 (14.9) | 32.2 (13.6) | 31.5 (4.9) | 101.3 (18.9) |
| Age | 41.7 (17.5) | 42.8 (15.1) | 48.7 (15.2) | 49.6 (12.2) | 62.9 (13.6) | 59.7 (12.2) |
| Female (%) | 47.4% | 48.0% | 58.2% | 28.7% | 45.9% | 49.6% |
| BMI | 18.3 (2.7) | 23.9 (3.4) | 25.6 (4.3) | 30.5 (4.8) | 28.9 (5.0) | 23.5 (3.2) |
| T2D (%) | 0.0% | 1.4% | 1.8% | 37.9% | 6.9% | 5.9% |
| Smoking (%) | 20.0% | 45.0% | 10.8% | 34.1% | 35.1% | 5.6% |
| HTN (%) | 0.0% | 40.7% | 10.4% | 46.4% | 60.4% | 27.4% |
| Statin use (%) | 0.0% | 0.0% | 3.0% | 0.0% | 28.2% | 18.9% |
| CAD (%) | 0.0% | 30.2% | N/A | N/A | 82.1% | 4.6% |

Mean phenotypic characteristics of individuals with low TG (<50th percentile), high TG (>95th percentile), low LDL-C (<25th percentile), high LDL-C (>95th percentile), low HDL-C (<25th percentile), high HDL-C (>95th percentile) who underwent targeted sequencing. All individuals who underwent targeted sequencing were of European ancestry. Parentheses denote standard deviations.

Targeted Sequencing

We sequenced the exons of all genes within 300 kb from the lead GWAS SNP identified in a GWAS meta-analysis involving >100,000 individuals (Supplementary Table 1).5, 9 The 95 loci previously mapped for plasma lipids (P<5×10−8) represented a total of 9,008 exons at 939 genes. Exons were captured by solution-based hybridization.18 To amplify exons, target-specific oligonucleotides 170 bases in length were designed to cover the entire coding sequence (hybrid selection bait size: 262,873 bases). These 170-mers were flanked with universal primer sequence to allow for PCR amplification. A T7 promoter was added in a second round of PCR, and in vitro transcription in the presence of biotin-UTP was performed to generate single-stranded hybridization bait to capture targets of interest from the DNA sample. Genomic DNA from individuals was randomly sheared and ligated to Illumina sequencing adapters. The fragments of this sheared and ligated genomic DNA were PCR amplified for 12 cycles and hybridized with biotinylated RNA bait. The hybridized DNA was extracted and PCR amplified to generate 36-base sequencing reads off of the Illumina adaptor sequence at the ends of each fragment.

Next-generation sequencing reactions were performed using Illumina Genome Analyzers. Bases were called and sequencing reads were aligned to the human genome reference GRCh37 (hg19). Sequencing metrics were calculated using the Picard data-processing pipeline with an output of Binary Alignment Map (BAM) files. The Genome Analysis Toolkit19 suite was used to genotype all variants, calculate initial quality control metrics, and filter variants based on these values to result in an output of Variant Call Format (VCF) files, which were used for further quality control and analysis. Variants were annotated using SnpEff.20

Discovery Cohort Quality Control

Samples that failed in any step of the solution hybrid selection component of the targeted sequencing process were excluded. Population clustering was assessed through multidimensional scaling using pruned common variants (>5% minor allele frequency) with high call rates and that were not in linkage disequilibrium. Outliers on a plot of the first two principal components generated from multidimensional scaling were excluded to ensure population stratification did not confound the results. Samples with high heterozygosity rates (number of heterozygote sites/number of variants per sample) were excluded as presumptively contaminated, and those with high singleton counts (more than three interquartile ranges above the median) were excluded due to presumptive sequencing error. Variants with low mean depth (<8) and low call rate (<95%) were excluded.
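As a rough illustration of the threshold-based filters in this section, the sketch below flags samples whose singleton count lies more than three interquartile ranges above the median and flags variants on depth and call rate. The function names and example counts are hypothetical. Two assumptions are made that the text does not settle: the quartile convention (the Python `statistics.quantiles` default is used), and treating a variant as failing when either the depth or the call-rate criterion is violated.

```python
from statistics import median, quantiles


def singleton_outliers(singleton_counts):
    """Return sample IDs whose singleton count exceeds median + 3 * IQR."""
    values = sorted(singleton_counts.values())
    # Quartile convention is an assumption (statistics.quantiles default).
    q1, _, q3 = quantiles(values, n=4)
    cutoff = median(values) + 3 * (q3 - q1)
    return {sample for sample, count in singleton_counts.items() if count > cutoff}


def variant_fails_qc(mean_depth, call_rate):
    """Flag a variant with mean depth < 8 or call rate < 95%.

    Treating the two criteria as alternatives is an assumption of this sketch.
    """
    return mean_depth < 8 or call_rate < 0.95


# Hypothetical singleton counts per sample; "s8" is an obvious outlier.
counts = {"s1": 10, "s2": 11, "s3": 11, "s4": 12,
          "s5": 12, "s6": 12, "s7": 13, "s8": 90}
print(singleton_outliers(counts))    # {'s8'}
print(variant_fails_qc(30.0, 0.99))  # False
print(variant_fails_qc(7.5, 0.99))   # True
```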

Discovery Cohort Statistical Analysis

Single variant association results for the discovery phase targeted sequencing analysis were computed using adaptive permutations on a dichotomous phenotype of high and low levels of plasma lipids using Fisher’s exact test. Using a minor allele frequency cutoff of 5%, the variable threshold and C-alpha gene burden tests were used to identify significantly associated genes in each locus, with a Bonferroni-corrected P value based on the total number of genes sequenced at the locus.21, 22 We filtered results to have a minor allele count of at least 5. All single variant associations and gene-based associations with a P value <0.05 were compared with respective association results in the replication sequencing population. Multiple marginally significant associations within a GWAS locus underwent conditional analysis with respect to the strongest known association in the locus to confirm independent association results. All analyses were performed using R, GATK, PLINK, and PLINK/SEQ.23-26 Power estimates were recalculated after the quality control phase.
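To make the single-variant test concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2×2 table, together with a per-locus Bonferroni threshold. It is not the PLINK/SEQ implementation used in the study and does not include the adaptive permutation procedure. The function names, the layout of the table, and the example counts are assumptions for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))  # small tolerance for float ties
    return min(1.0, total)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold, e.g. n_tests = genes sequenced at a locus."""
    return alpha / n_tests


# Hypothetical table: rows = high / low lipid group, columns = alt / ref alleles.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
# Hypothetical locus with 10 sequenced genes.
print(bonferroni_threshold(0.05, 10))
```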

Replication Studies

To replicate our findings, samples were obtained from 3 studies of participants of European descent: the Ottawa Heart Study,27 PROCARDIS,28 and ATVB29 (Supplementary Table 2). Each study was designed to understand the inherited basis for coronary heart disease and ascertained cases with either myocardial infarction or coronary revascularization at an early age and controls free of coronary heart disease. Plasma lipid levels were measured by standard protocols in clinical labs in these individuals. For participants known to be on lipid-lowering therapy, we estimated the untreated LDL-C value by dividing the individual's total cholesterol (TC) value by 0.8. Such an approach has been demonstrated to perform well in accounting for treatment effects in studies of quantitative traits.30 Statins are the most widely used treatment to lower plasma lipids and a statin at average dose reduces total cholesterol by 20%.31 LDL-C was calculated using the Friedewald equation (LDL-C=TC – HDL-C – (TG/5)) for those with TG<400 mg/dl. If TG>400 mg/dl, calculated LDL-C was set to missing. If TC was modified for medication use, the modified total cholesterol was used to calculate LDL-C. Whole genomic DNA was extracted from the blood of these individuals for sequencing, which was performed at the Broad Institute as previously described.32 Briefly, we sequenced the exomes of individuals within these cohorts to high coverage by performing solution-based hybrid selection of exons (Agilent) followed by massively parallel sequencing (Illumina Genome Analyzer II and HiSeq). The hybrid selection targeted 32.7 million bases spanning 188,260 exons from 18,560 genes. We used the Burrows-Wheeler Aligner to map 76-base-pair reads. Using the Genome Analysis Toolkit, we identified and genotyped autosomal single nucleotide variants (SNVs) and short insertion-deletion variants (indels) occurring in exons and canonical splice sites up to 2 bases from each intron-exon boundary.
We performed several quality control steps to identify and remove outlier samples (based on missingness, total number of variants, singletons, doubletons, and Ti/Tv ratio) and variants (based on VQSR). Analysis was performed using the seqMeta package (http://cran.r-project.org/web/packages/seqMeta/index.html) in R. Two analyses were performed: single variant association using linear regression analysis of plasma lipid levels on a continuous distribution and gene-based burden testing. For the gene-based test, a 1% MAF threshold was used with only nonsynonymous and nonsense variants. TG levels were log transformed due to skewness, and all traits were inverse normalized before analysis. Analyses were performed separately by study and myocardial infarction case-control status, adjusted for age, sex, and PCs of ancestry. Replication results were obtained through meta-analysis using the seqMeta package.
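The phenotype preparation steps described for the replication cohorts (treatment adjustment of TC, log transformation of TG, inverse normalization) can be sketched as follows. This is an illustrative sketch rather than the study's R code: the function names and example values are hypothetical, and the rank offset used in the inverse normal transform, (rank − 0.5)/n, is one common convention assumed here because the text does not specify one.

```python
import math
from statistics import NormalDist


def untreated_total_cholesterol(tc, on_lipid_lowering_therapy):
    """Divide measured TC by 0.8 for treated participants, as described in the text."""
    return tc / 0.8 if on_lipid_lowering_therapy else tc


def inverse_normal_transform(values):
    """Rank-based inverse normal transform; ties receive their average rank.

    The (rank - 0.5) / n offset is an assumed convention, not taken from the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical TG values (mg/dl): log transform first, then inverse normalize.
tg = [75.0, 150.0, 1900.0]
log_tg = [math.log(x) for x in tg]
print(inverse_normal_transform(log_tg))
print(untreated_total_cholesterol(160.0, True))
```

Because the transform depends only on ranks, the log transformation does not change the inverse-normalized TG values; it matters only for analyses carried out on the untransformed scale.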

Results

Discovery Sequencing

Of the 1,434 individuals of European descent who underwent targeted sequencing, 1,303 individuals remained after quality control measures and phenotype modeling. The final targeted sequencing association analysis was performed on 311 individuals for LDL-C levels (166 low LDL-C individuals with mean LDL-C=70.8 mg/dl and 145 high LDL-C individuals with mean LDL-C=241.3 mg/dl), 308 individuals for TG levels (165 low TG individuals with mean TG=74.8 mg/dl and 143 high TG individuals with mean TG=1937.8 mg/dl), and 684 individuals for HDL-C levels (331 low HDL-C individuals with mean HDL-C=31.6 mg/dl and 353 high HDL-C individuals with mean HDL-C=101.8 mg/dl) (Table 1).

Of the 262,873 targeted bases, 76% were covered at greater than 30-fold coverage whereas 81% were covered at greater than 20-fold coverage. Across the 1,303 individuals, we identified 16,199 missense, nonsense, or essential splice site DNA sequence variants and of these, 15,002 were ‘rare’ (defined in this report as minor allele frequency <5%).

Single Variant Association Analysis from Targeted Sequence Data

First, we tested the association of individual variants discovered through targeted sequencing with plasma lipid levels using a case-control design. Quantile-quantile plots of the single variant association results from targeted sequencing show that most of the variants fall along the expected null distribution for each trait, indicating that each study is well calibrated (Supplementary Figures 1-3). A small fraction of nonsynonymous variants within the vicinity of the tested 95 GWAS loci (n=114 coding variants) displayed nominal evidence for association (P<0.05) (Supplementary Table 3). Of these, 7 were loss-of-function mutations (stop gained, frame-shift, splice site) and 107 were missense mutations.

The variants with the lowest P values for each trait were in genes with well-characterized roles in plasma lipid metabolism including APOB with LDL-C, LPL with TG, and CETP with HDL-C. The rare missense, nonsense, or splice-site variant with the strongest association evidence was in the TM6SF2 gene with LDL-C (p.Leu156Pro, 1.60% frequency, OR for high LDL-C of 0.05, P=1.1×10−3), C6orf10 gene with TG (p.Gly463Val, 1.30% frequency, OR for high TG of 20.17, P=8.6×10−4), and CETP gene with HDL-C (p.Arg408Gln, 3.2% frequency, OR for high HDL-C of 0.22, P=1.0×10−5).

Replication of Single Variant Results with Additional Sequencing

We sought to replicate single variants with nominal significance in a set of independent participants. Towards this end, we used sequences from 6,424 individuals of European descent. These individuals had plasma lipid levels reflective of the general European population (mean LDL-C=140.4 mg/dl, mean TG=162.8 mg/dl, and mean HDL-C=48.1 mg/dl) (Supplementary Table 2). After performing stringent quality control measures, we performed single variant analyses using linear regression for LDL-C, TG, and HDL-C levels in this cohort. Quantile-quantile plots were well calibrated (Supplementary Figure 4) for each trait.

Across discovery and replication, 6 variants were found to be associated with plasma lipid levels (Table 2) after accounting for the number of variants tested. Of these, 1 was found to be associated with LDL-C (p.Thr98Ile in APOB), 3 with HDL-C (p.Arg408Gln in CETP, p.Ser460X in LPL, and p.Ser19Trp in APOA5), and 4 with triglycerides (p.Ser460X in LPL, p.Leu256Pro in GCKR, p.Ser147Asn in APOA4, and p.Ser19Trp in APOA5). The only nonsense variant among these, p.Ser460X in LPL, was significantly associated with both HDL-C and triglycerides, and only p.Arg408Gln in CETP had a MAF < 5%.

Table 2.

Single variant association results including discovery and replication phases

| Trait | Gene | Chromosome:Position | rsID | Protein Change | Ref/Alt Allele | Alternate AF in Low Group | Alternate AF in High Group | HWE P value | Targeted sequencing OR | Targeted sequencing P value | Replication sequencing effect size (mg/dl) | Replication sequencing P value | Locus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HDL | CETP | 16:57017319 | rs1800777 | R408Q | G/A | 0.05 | 0.01 | 0.53 | 0.22 | 1.0×10−5 | −3.9+/−0.7 | 4.7×10−8 | CETP |
| HDL | LPL | 8:19819724 | rs328 | S460X | C/G | 0.08 | 0.13 | 0.85 | 1.66 | 3.6×10−3 | +2.9+/−0.4 | 4.3×10−13 | LPL |
| HDL | APOA5 | 11:116662407 | rs3135506 | S19W | G/C | 0.07 | 0.04 | 0.15 | 0.61 | 4.2×10−2 | −2.13+/−0.5* | 8.9×10−5* | APOA1 |
| LDL | APOB | 2:21263900 | rs1367117 | T98I | G/A | 0.26 | 0.39 | 0.14 | 1.83 | 2.7×10−4 | +4.5+/−0.8 | 8.2×10−8 | APOB |
| TG | APOA5 | 11:116662407 | rs3135506 | S19W | G/C | 0.03 | 0.18 | 0.21 | 7.11 | 1.0×10−6 | +25.2+/−4.2* | 2.3×10−9* | APOA1 |
| TG | GCKR | 2:27730940 | rs1260326 | L256P | T/C | 0.63 | 0.45 | 0.91 | 0.48 | 4.0×10−6 | −10.7+/−2.1 | 1.8×10−7 | GCKR |
| TG | APOA4 | 11:116692334 | rs5104 | S147N | C/T | 0.88 | 0.78 | 0.42 | 0.49 | 1.5×10−3 | −9.8+/−2.9 | 7.1×10−4** | APOA1 |
| TG | LPL | 8:19819724 | rs328 | S460X | C/G | 0.10 | 0.03 | 0.65 | 0.33 | 2.6×10−3 | −24.3+/−3.2 | 2.1×10−14 | LPL |

Association results for variants found to have P value < 0.05 in targeted sequencing single variant analysis and significant association in replication sequencing single variant analysis after Bonferroni correction for number of variants tested; REF=Reference allele, ALT=Alternate allele, AF=allele frequency, HWE=Hardy-Weinberg Equilibrium, OR=odds ratio, Locus=gene name assigned to plasma lipid GWA SNP from Teslovich et al., Nature 2010

* Result provided for rs35120633, which is a perfect proxy (r2=1) for rs3135506.

** Conditional analysis performed for SNP rs5104 taking into account nearby SNP rs35120633 with stronger association, and P value remained significant at P=1.5×10−4.

One of the results was novel whereas seven of the associations have been previously reported. We found a novel association between APOA4 p.Ser147Asn (MAF=14.3%) and TG. Carriers of APOA4 p.Ser147Asn have 51% lower odds of high TG compared with non-carriers in the discovery set (OR=0.49, P=1.5×10−3), and on average have 9.8 mg/dl lower TG (P=7.1×10−4) in the replication set.33-36
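The relationship between the allele frequencies and odds ratios reported in Table 2 can be checked with a short calculation. The sketch below assumes the reported ORs are allelic odds ratios comparing the high group with the low group, which the text does not state explicitly. Because the tabulated frequencies are rounded, the recomputed values are only approximate. The function name is hypothetical.

```python
def allelic_odds_ratio(af_low, af_high):
    """Odds of the alternate allele in the high group divided by the odds in the low group."""
    odds_high = af_high / (1 - af_high)
    odds_low = af_low / (1 - af_low)
    return odds_high / odds_low


# GCKR p.Leu256Pro and TG, using the rounded frequencies from Table 2.
print(round(allelic_odds_ratio(af_low=0.63, af_high=0.45), 2))  # 0.48

# An OR of 0.49 corresponds to 51% lower odds of being in the high TG group.
print(round((1 - 0.49) * 100))  # 51
```

For the GCKR row this recovers the tabulated OR of 0.48. For other rows the rounded frequencies give values close to, but not always identical with, the reported ORs.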

Gene-level Association Analysis Using Sequence Data

Our initial goal was to find and validate low-frequency variants in genes in the vicinity of known GWAS loci which may be contributing to the local GWAS signal. Power for detecting these associations is limited by the rarity of variants under study and the limited number of individuals that carry a particular variant. With the gene burden analysis approach, we aggregated putatively functional variation within a gene to test the proportion of variant carriers in the low lipid group versus the proportion of carriers in the high lipid group using the variable threshold (VT) and C-alpha gene-based tests. The C-alpha test is more powerful than VT when there are both protective and deleterious variants with different magnitudes in the same gene. The VT test is more powerful when the magnitudes and directions of effect for the individual variants in a gene are consistent. Using burden testing collapsing missense, nonsense, and splice site mutations in each gene, we found nominally significant associations for 18 genes with LDL-C, 11 genes with TG, and 25 genes with HDL-C (P<0.05, see Supplementary Table 4). Of these, only the APOC3 association with HDL-C (P=2.1×10−5) and the LDLR association with LDL-C (P=5.0×10−12) were found to replicate in independent participants after adjusting for the number of genes tested (Table 3). No replicating significant associations were identified using the C-alpha test.
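The collapsing step that underlies gene-based burden testing can be illustrated with a small sketch that counts carriers of any qualifying rare variant in a gene, separately for the low and high lipid groups. This shows only the simplest carrier-collapsing idea. It is not the variable threshold or C-alpha test used in the study, and the data structures, names, and genotypes are hypothetical.

```python
def carrier_counts(genotypes, group, rare_variants):
    """Count carriers and non-carriers of any rare variant per phenotype group.

    genotypes: {sample_id: {variant_id: alternate allele count}}
    group: {sample_id: "low" or "high"}
    rare_variants: set of variant IDs in the gene that pass the frequency filter
    """
    counts = {
        "low": {"carriers": 0, "non_carriers": 0},
        "high": {"carriers": 0, "non_carriers": 0},
    }
    for sample, calls in genotypes.items():
        is_carrier = any(calls.get(variant, 0) > 0 for variant in rare_variants)
        key = "carriers" if is_carrier else "non_carriers"
        counts[group[sample]][key] += 1
    return counts


# Hypothetical genotypes for one gene; v3 is common and therefore not collapsed.
genotypes = {
    "s1": {"v1": 1},
    "s2": {},
    "s3": {"v2": 1},
    "s4": {},
    "s5": {"v3": 1},
}
group = {"s1": "high", "s2": "high", "s3": "high", "s4": "low", "s5": "low"}
print(carrier_counts(genotypes, group, rare_variants={"v1", "v2"}))
```

The resulting 2×2 table of carriers by group is what a simple burden test compares. The VT test additionally searches over allele frequency thresholds, and C-alpha tests for excess variance in how variants are distributed between groups.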

Table 3.

Top gene-level association results after discovery and replication phases

| Trait | Position | Gene | Test | P value from targeted sequencing | P value from replication sequencing |
|---|---|---|---|---|---|
| HDL-C | chr11:116701353-116701613 | APOC3 | VT | 1.8×10−3 | 2.1×10−5 |
| LDL-C | chr19:11200282-11241961 | LDLR | VT | 4.1×10−3 | 5.0×10−12 |

Association results with P <0.05 in targeted sequencing gene based analysis using variable threshold tests (VT) and replication sequencing gene based analysis using burden tests.

Conclusion

In individuals with extremely high LDL-C, TG, or HDL-C and healthy controls, we sequenced the coding regions of 939 genes located at 95 GWAS loci for plasma lipids. We subsequently performed a replication analysis in independent samples. After performing single variant and gene-burden analyses across discovery and replication cohorts, we identified a coding variant at APOA4 as associated with plasma TG levels.

This study was successful in replicating several known genes with previously defined associations with plasma lipid levels. The functions of LPL, GCKR, APOA5, APOB, and CETP in plasma lipid metabolism have been well established.33-37 The study also identified a novel genetic association between APOA4 and triglycerides. Although its precise function remains unknown, human apolipoprotein A4 is synthesized in the intestines and is secreted in chylomicrons.38-40 Synthesis of APOA4 is increased during fat absorption, and it is thought to activate lecithin-cholesterol acyltransferase in vitro.41, 42 Further functional studies will be needed to fully elucidate the role of this protein in regulating plasma triglyceride levels.

The study permits several conclusions. First, since we were unable to pinpoint specific coding variants responsible for the genome-wide association signals for many plasma lipid loci, we can speculate that the GWAS association signals may truly be due to non-coding, regulatory variants. Of the 95 total plasma lipid GWAS loci that were fine mapped, 88 loci (93%) remain without any replicating significant single coding variant or gene-based association. Only the p.Ser460X variant in LPL, the p.Thr98Ile variant in APOB, and the p.Ser147Asn variant in APOA4 were found to have similar minor allele frequencies and similar effect size estimates as the non-coding variants in their respective GWAS loci (LPL locus: rs12678919, frequency=12%, HDL effect= +2.25 mg/dl, HDL P=9.71×10−98, TG effect= −13.64 mg/dl, TG P=1.5×10−115; APOB locus: rs1367117, frequency=30%, LDL effect= +4.16 mg/dl, LDL P=4.08×10−96; and APOA1 locus: rs964184, frequency=13%, TG effect= +16.95 mg/dl, TG P=6.71×10−240).5 This suggests that for these three loci, the identified coding variants may be responsible for the signal derived by the common noncoding variant in the GWAS association, but further functional studies need to be performed. For the remaining loci, intronic or intergenic SNPs in the vicinity of the plasma lipid GWAS loci may be involved in regulation and expression of coding genes involved in lipid metabolism.43

Second, targeted sequencing may have limited utility in discovering causal variants at GWAS regions. The absence of rare coding variants of large effect in GWAS loci seen in this study is consistent with a recent targeted sequencing study for autoimmune diseases with even larger sample sizes.44 Although targeted sequencing has previously been used to identify a few genes implicated in various diseases, hundreds of GWAS loci have collectively been fine mapped and the functional significance of the association signal has not been fully resolved for the vast majority of the loci.10-12, 17 Therefore, revisiting and systematically studying the initially discovered non-coding variants in the implicated loci will be necessary to better understand the biologic underpinnings of these associations.

Some key limitations of the present study need to be considered. Since the discovery cohort is ascertained from the high extremes of the population while the validation cohort is ascertained from the general population, we could have missed replicating associations that only drive someone to an extreme phenotype and do not have an effect in the general population. Although sampling in the high extremes was performed to increase the power of finding rare functional variants,45 dichotomizing continuously varying plasma lipid levels may have led to loss of information. Additionally, the collective sample size of 1,303 individuals may be too small to provide sufficient power to detect associations of rare alleles with more modest effects on these three traits. Based on the respective cohort sizes, this final targeted sequencing analysis has a calculated 80% power to identify 1% frequency variants with OR greater than 2.4 or less than 0.42 in LDL-C, 1% frequency variants with OR greater than 2.58 or less than 0.39 in TG, and 1% frequency variants with OR greater than 1.85 or less than 0.54 in HDL-C (Supplementary Table 5).46

This study successfully identified common variants and genes previously implicated in plasma lipid metabolism as well as one new association with plasma triglyceride levels. However, we did not identify any new coding variants or genes associated with plasma lipids at the remaining 94 GWAS loci. Fine mapping of coding regions surrounding GWAS loci may have limited utility in the investigation of the cause of these association signals. These results provide insight regarding the design of similar sequencing studies for cardiovascular traits.

Acknowledgements

APP and JPP are recipients of research fellowships from the Stanley J. Sarnoff Cardiovascular Research Foundation. Funding for this study was provided by NHLBI grant 5RC1HL099793 to DJR. GMP is supported by award number T32HL007208 from the NHLBI. NJS holds a Chair supported by the British Heart Foundation. SK is supported by a Research Scholar award from the Massachusetts General Hospital, R01HL107816, and a grant from Fondation Leducq. JJPK is holder of a lifetime achievement award from the Dutch Heart Foundation. GKH is a recipient of a Veni grant (project number 91612122) from the Netherlands Organisation for Scientific Research (NWO). This work is supported by the CardioVascular Research Initiative (CVON2011-19; Genius), European Union (TransCard: FP7-603091-2), and Fondation Leducq (Transatlantic Network, 2009-2014).


References

  • 1. Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J 3rd. Factors of risk in the development of coronary heart disease--six year follow-up experience. The Framingham study. Annals of internal medicine. 1961;55:33–50. doi: 10.7326/0003-4819-55-1-33.
  • 2. Kathiresan S, Melander O, Guiducci C, et al. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nature genetics. 2008;40:189–197. doi: 10.1038/ng.75.
  • 3. Kathiresan S, Willer CJ, Peloso GM, et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nature genetics. 2009;41:56–65. doi: 10.1038/ng.291.
  • 4. Saxena R, Voight BF, Lyssenko V, Burtt NP, De Bakker PIW, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316:1331–1336. doi: 10.1126/science.1142358.
  • 5. Teslovich TM, Musunuru K, Smith AV, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466:707–713. doi: 10.1038/nature09270.
  • 6. Willer CJ, Sanna S, Jackson AU, et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nature genetics. 2008;40:161–169. doi: 10.1038/ng.76.
  • 7. Aulchenko YS, Ripatti S, Lindqvist I, et al. Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nature genetics. 2009;41:47–55. doi: 10.1038/ng.269.
  • 8. Sabatti C, Service SK, Hartikainen AL, et al. Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nature genetics. 2009;41:35–46. doi: 10.1038/ng.271.
  • 9. Global Lipids Genetics Consortium, Willer CJ, Schmidt EM, et al. Discovery and refinement of loci associated with lipid levels. Nature genetics. 2013;45:1274–1283. doi: 10.1038/ng.2797.
  • 10. Nejentsev S, Walker N, Riches D, Egholm M, Todd JA. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science. 2009;324:387–389. doi: 10.1126/science.1167728.
  • 11. Galarneau G, Palmer CD, Sankaran VG, Orkin SH, Hirschhorn JN, Lettre G. Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation. Nature genetics. 2010;42:1049–1051. doi: 10.1038/ng.707.
  • 12. Raychaudhuri S, Iartchouk O, Chin K, et al. A rare penetrant mutation in CFH confers high risk of age-related macular degeneration. Nature genetics. 2011;43:1232–1236. doi: 10.1038/ng.976.
  • 13. Helgason H, Sulem P, Duvvari MR, et al. A rare nonsynonymous sequence variant in C3 is associated with high risk of age-related macular degeneration. Nature genetics. 2013;45:1371–1376. doi: 10.1038/ng.2740.
  • 14. Seddon JM, Yu Y, Miller EC, et al. Rare variants in CFI, C3 and C9 are associated with high risk of advanced age-related macular degeneration. Nature genetics. 2013;45:1366–1373. doi: 10.1038/ng.2741.
  • 15. Van De Ven JPH, Nilsson SC, Tan PL, et al. A functional variant in the CFI gene confers a high risk of age-related macular degeneration. Nature genetics. 2013;45:813–817. doi: 10.1038/ng.2640.
  • 16. Zhan X, Larson DE, Wang C, et al. Identification of a rare coding variant in complement 3 associated with age-related macular degeneration. Nature genetics. 2013;45:1375–1381. doi: 10.1038/ng.2758.
  • 17. Rivas MA, Beaudoin M, Gardet A, et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nature genetics. 2011;43:1066–1073. doi: 10.1038/ng.952.
  • 18. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology. 2009;27:182–189. doi: 10.1038/nbt.1523.
  • 19. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics. 2011;43:491–501. doi: 10.1038/ng.806.
  • 20. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. doi: 10.4161/fly.19695.
  • 21. Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled association tests for rare variants in exon-resequencing studies. American Journal of Human Genetics. 2010;86:832–838. doi: 10.1016/j.ajhg.2010.04.005.
  • 22.Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ. Testing for an unusual distribution of rare variants. PLoS genetics. 2011;7:e1001322. doi: 10.1371/journal.pgen.1001322. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis toolkit: A mapreduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, De Bakker PIW, Daly MJ, Sham PC. Plink: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Plink/seq: A library for the analysis of genetic variation data. 2012 [Google Scholar]
  • 26.R Development Core Team. R: A language and environment for statistical computing. 2010 [Google Scholar]
  • 27.McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E, Hobbs HH, Cohen JC. A common allele on chromosome 9 associated with coronary heart disease. Science. 2007;316:1488–1491. doi: 10.1126/science.1142447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Clarke R, Peden JF, Hopewell JC, et al. Genetic variants associated with lp(a) lipoprotein level and coronary disease. The New England journal of medicine. 2009;361:2518–2528. doi: 10.1056/NEJMoa0902604. [DOI] [PubMed] [Google Scholar]
  • 29.Kathiresan S, Voight BF, Purcell S, et al. Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nature genetics. 2009;41:334–341. doi: 10.1038/ng.327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Tobin MD, Sheehan NA, Scurrah KJ, Burton PR. Adjusting for treatment effects in studies of quantitative traits: Antihypertensive therapy and systolic blood pressure. Statistics in medicine. 2005;24:2911–2935. doi: 10.1002/sim.2165. [DOI] [PubMed] [Google Scholar]
  • 31.Baigent C, Keech A, Kearney PM, Blackwell L, Buck G, Pollicino C, Kirby A, Sourjina T, Peto R, Collins R, Simes R. Cholesterol Treatment Trialists' (CTT) Collaborators. Efficacy and safety of cholesterol-lowering treatment: Prospective meta-analysis of data from 90,056 participants in 14 randomised trials of statins. Lancet. 2005;366:1267–1278. doi: 10.1016/S0140-6736(05)67394-1. [DOI] [PubMed] [Google Scholar]
  • 32.Crosby J, Peloso GM, Auer PL, et al. Loss-of-function mutations in apoc3, triglycerides, and coronary disease. New England Journal of Medicine. 2014;371:22–31. doi: 10.1056/NEJMoa1307095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Kondo I, Berg K, Drayna D, Lawn R. DNA polymorphism at the locus for human cholesteryl ester transfer protein (cetp) is associated with high density lipoprotein cholesterol and apolipoprotein levels. Clinical Genetics. 1989;35:49–56. doi: 10.1111/j.1399-0004.1989.tb02904.x. [DOI] [PubMed] [Google Scholar]
  • 34.Patsch JR, Prasad S, Gotto AM, Patsch W. High density lipoprotein2. Relationship of the plasma levels of this lipoprotein species to its composition, to the magnitude of postprandial lipemia, and to the activities of lipoprotein lipase and hepatic lipase. Journal of Clinical Investigation. 1987;80:341–347. doi: 10.1172/JCI113078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Farrelly D, Brown KS, Tieman A, Ren J, Lira SA, Hagan D, Gregg R, Mookhtiar KA, Hariharan N. Mice mutant for glucokinase regulatory protein exhibit decreased liver glucokinase: A sequestration mechanism in metabolic regulation. Proceedings of the National Academy of Sciences of the United States of America. 1999;96:14511–14516. doi: 10.1073/pnas.96.25.14511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Soria LF, Ludwig EH, Clarke HRG, Vega GL, Grundy SM, McCarthy BJ. Association between a specific apolipoprotein b mutation and familial defective apolipoprotein b-100. Proceedings of the National Academy of Sciences of the United States of America. 1989;86:587–591. doi: 10.1073/pnas.86.2.587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Pennacchio LA, Olivier M, Hubacek JA, Cohen JC, Cox DR, Fruchart JC, Krauss RM, Rubin EM. An apolipoprotein influencing triglycerides in humans and mice revealed by comparative sequencing. Science. 2001;294:169–173. doi: 10.1126/science.1064852. [DOI] [PubMed] [Google Scholar]
  • 38.Green PHR, Glickman RM, Riley JW, Quinet E. Human apolipoprotein a-iv. Intestinal origin and distribution in plasma. Journal of Clinical Investigation. 1980;65:911–919. doi: 10.1172/JCI109745. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Green PHR, Glickman RM, Saudek CD, Blum CB, Tall AR. Human intestinal lipoproteins. Studies in chyluric subjects. Journal of Clinical Investigation. 1979;64:233–242. doi: 10.1172/JCI109444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Karathanasis SK. Apolipoprotein multigene family: Tandem organization of human apolipoprotein ai, ciii, and aiv genes. Proceedings of the National Academy of Sciences of the United States of America. 1985;82:6374–6378. doi: 10.1073/pnas.82.19.6374. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Karathanasis SK, Oettgen P, Haddad IA, Antonarakis SE. Structure, evolution, and polymorphisms of the human apolipoprotein a4 gene (apoa4). Proceedings of the National Academy of Sciences of the United States of America. 1986;83:8457–8461. doi: 10.1073/pnas.83.22.8457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Steinmetz A, Utermann G. Activation of lecithin:Cholesterol acyltransferase by human apolipoprotein a-iv. Journal of Biological Chemistry. 1985;260:2258–2264. [PubMed] [Google Scholar]
  • 43.ENCODE Project Consortium. Dunham I, Kundaje A, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Hunt KA, Mistry V, Bockett NA, et al. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature. 2013;498:232–235. doi: 10.1038/nature12170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Peloso GM, Rader DJ, Gabriel S, Kathiresan S, Daly MJ, Neale BM. Phenotypic extremes in rare variant study designs. Eur J Hum Genet. 2015 doi: 10.1038/ejhg.2015.197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Purcell S, Cherny SS, Sham PC. Genetic power calculator: Design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19:149–150. doi: 10.1093/bioinformatics/19.1.149. [DOI] [PubMed] [Google Scholar]
